Recommendation: use data.table NOT dplyr and tidyverse

@lucashertzog

I strongly recommend this.

Reading (R for Data Science, "Big data" section): https://r4ds.had.co.nz/introduction.html?q=data.table#big-data

dplyr and the tidyverse focus on small, in-memory datasets. This is the right place to start because you can’t tackle big data unless you have experience with small data. The tools you learn in this book will easily handle hundreds of megabytes of data, and with a little care you can typically use them to work with 1-2 GB of data. If you’re routinely working with larger data (10-100 GB, say), you should learn more about data.table. This book doesn’t teach data.table because it has a very concise interface which makes it harder to learn since it offers fewer linguistic cues. But if you’re working with large data, the performance payoff is worth the extra effort required to learn it.
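To make the "concise interface" point concrete, here is a minimal sketch of the same filter, group and summarise written both ways. The data frame and its columns (`station`, `pm25`) are hypothetical, and it assumes both the data.table and dplyr packages are installed. dplyr spells out one verb per step; data.table packs the row filter (`i`), the computation (`j`) and the grouping (`by`) into a single `DT[i, j, by]` call.

```r
library(data.table)
library(dplyr)

# Hypothetical example data: readings per monitoring station
df <- data.frame(
  station = c("A", "A", "B", "B", "B"),
  pm25    = c(10, 14, 8, 12, 16),
  stringsAsFactors = FALSE
)

# dplyr: explicit verbs chained with the pipe
res_dplyr <- df %>%
  filter(pm25 > 9) %>%
  group_by(station) %>%
  summarise(mean_pm25 = mean(pm25), .groups = "drop")

# data.table: DT[i, j, by] does filter (i), aggregate (j) and group (by) in one call
dt <- as.data.table(df)
res_dt <- dt[pm25 > 9, .(mean_pm25 = mean(pm25)), by = station]

print(res_dplyr)
print(res_dt)
```

Both produce one row per station (A: 12, B: 14 after dropping the reading of 8). The data.table line is shorter, but a newcomer has to know what each slot inside the brackets means, which is the "fewer linguistic cues" trade-off the quote describes.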

https://github.com/matloff/TidyverseSkeptic

Everything in the Tidyverse Skeptic is spot on. I agree wholeheartedly.

cardat / air-health-sws-r-targets-technique-tweaking-tinkering
