cardat / air-health-sws-r-targets-technique-tweaking-tinkering

https://cardat.github.io/air-health-sws-r-targets-technique-tweaking-tinkering/
MIT License

Recommendation: use data.table NOT dplyr and tidyverse #2

Closed ivanhanigan closed 1 year ago

ivanhanigan commented 1 year ago

@lucashertzog

I make this strong recommendation.

Readings: https://r4ds.had.co.nz/introduction.html?q=data.table#big-data

dplyr and the tidyverse focus on small, in-memory datasets. This is the right place to start because you can’t tackle big data unless you have experience with small data. The tools you learn in this book will easily handle hundreds of megabytes of data, and with a little care you can typically use them to work with 1-2 Gb of data. If you’re routinely working with larger data (10-100 Gb, say), you should learn more about data.table. This book doesn’t teach data.table because it has a very concise interface which makes it harder to learn since it offers fewer linguistic cues. But if you’re working with large data, the performance payoff is worth the extra effort required to learn it.
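
To make the "very concise interface" point concrete, here is a minimal sketch of the general `dt[i, j, by]` form in data.table. The table, the column names `site` and `pm25`, and the values are all made up for illustration and are not from any project data. The equivalent dplyr pipeline is shown only as a comment for comparison.

```r
library(data.table)

# Hypothetical toy data: column names and values are illustrative only.
dt <- data.table(
  site = c("A", "A", "B", "B"),
  pm25 = c(8, 12, 20, 10)
)

# data.table general form: dt[i, j, by]
#   i  = which rows to keep
#   j  = what to compute
#   by = grouping columns
res <- dt[pm25 > 9, .(mean_pm25 = mean(pm25)), by = site]
print(res)

# The same query written with dplyr verbs, for comparison (not run here):
# dt %>%
#   filter(pm25 > 9) %>%
#   group_by(site) %>%
#   summarise(mean_pm25 = mean(pm25))
```

The data.table call packs filter, summary and grouping into one bracket expression, which is where the "fewer linguistic cues" trade-off comes from: there are no named verbs to read, only positions.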

https://github.com/matloff/TidyverseSkeptic

Everything in the Skeptic is spot on. Agree agree agree

ivanhanigan commented 1 year ago

Closing this as no further action is required. I hope you have bookmarked these readings @lucashertzog