anyone-can-cook / rclass1

EDUC 260A: Introduction to Programming and Data Management Using R
https://anyone-can-cook.github.io/rclass1/

Best Practices for Importing and Managing Large Datasets in R #108

Open imDiegoCasillas opened 1 month ago

imDiegoCasillas commented 1 month ago

Hi everyone, I’ve recently started working with large datasets in R, and I’ve noticed that importing them using read_csv() or read.table() can be quite slow. Also, memory usage seems to spike when I’m working with these datasets.

I wanted to ask for advice or best practices on how to:

1.  Efficiently import large datasets without hitting memory limits.
2.  Handle data more effectively to avoid R crashing or slowing down.
3.  Choose packages or techniques that optimize performance.

So far, I’ve looked into data.table::fread() as a faster alternative to read_csv(). Are there any other tools or strategies that you recommend?
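
As a minimal sketch of what the data.table::fread() approach could look like: the temporary file, the column names (`id`, `score`, `note`), and the column types below are made up for illustration and are not from a real dataset. It previews a few rows first, then reads only the columns that are needed.

```r
library(data.table)

# Hypothetical example data: write a small CSV to a temporary file
# so the sketch runs on its own.
csv_path <- tempfile(fileext = ".csv")
fwrite(
  data.table(id = 1:1000, score = rep(c(1.5, 2.5), 500), note = "x"),
  csv_path
)

# Peek at the first rows to check column names and types
# without loading the whole file.
preview <- fread(csv_path, nrows = 5)

# Read only the columns that are needed, with explicit types
# so fread() does not have to guess them.
dt <- fread(
  csv_path,
  select = c("id", "score"),
  colClasses = c(id = "integer", score = "numeric")
)

# Rough in-memory size of the result.
print(object.size(dt), units = "Kb")
```

Here `nrows`, `select`, and `colClasses` are existing fread() arguments. Whether dropping columns helps enough will depend on how wide the real file is.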