Atrebas / atrebas.github.io

For communication #1

Open hope-data-science opened 4 years ago

hope-data-science commented 4 years ago

I've just read your post at https://atrebas.github.io/post/2019-03-03-datatable-dplyr/, and it is such a good piece! I have just finished a package on this topic at https://github.com/hope-data-science/tidyfst. I hope you like it, and perhaps you could help improve it in the future.

Thanks.

Atrebas commented 4 years ago

Hi, thank you for your message and kudos for your work on tidyfst. It looks very interesting. What are the main differences from other packages like dtplyr or tidyfast? Cheers.

hope-data-science commented 4 years ago

Good question. I'll try to answer:

  1. More flexible verbs. dtplyr can only carry out the intersection of dplyr and data.table (actually, much less than the intersection in its current state). tidyfast, on the other hand, mainly covers tidyr functions. tidyfst covers many tidy verbs, such as dummy_dt and mutate_when, that you might not see anywhere else but that are useful.

  2. Different APIs. The API differs not only from dtplyr and tidyfast, but from dplyr too. I don't implement select helpers like starts_with; you can use regex directly in select_dt and select_mix. While they could still be improved, they are already rather powerful for flexible selection. As for nesting and unnesting, tidyfst does not have hoist; unnest_dt itself can unnest any number of columns. longer_dt and wider_dt do not use the APIs of pivot_longer and pivot_wider (and will not); I've found a new way to guide learners, and I hope it will be simpler for more people to understand.

  3. Support for fst. This is an amazing package for big data analysis. See https://hope-data-science.github.io/tidyfst/articles/example5_fst.html.

This list could go on forever, but I think the point is that tidyfst keeps things simple without leading users to be lazy (as dplyr does), and still runs pretty fast (just like data.table). Respecting all its predecessors and companions, it has its own style of leading a tidy way to fast data manipulation. Hope it helps.

Thanks.

hope-data-science commented 4 years ago

I've added a vignette using your examples, FYI. Link: https://hope-data-science.github.io/tidyfst/articles/english_tutorial.html

Thanks.

Atrebas commented 4 years ago

Great, thanks for letting me know, and thanks for citing your source ;-) I plan to update the post in the future but haven't taken the time yet. The new dplyr version (1.0.0, I think) will be out soon with some noticeable changes (e.g. across), and there are new data.table features. It may be a good way to keep busy during the covid-19 lockdown... Best.

hope-data-science commented 4 years ago

My package has already taken the dplyr (v1.0.0) updates into account, because data.table is flexible enough to do so. Looking forward to your updates so I can update this tutorial too. Thank you so much for your post; it helped debug and improve the package. Waiting for data.table to release fcase and other features too. Keep moving~

Thanks.