csgillespie / efficientR

Efficient R programming: a book
https://csgillespie.github.io/efficientR/

Add code for #292 #293

Open Robinlovelace opened 3 years ago

csgillespie commented 3 years ago

@Robinlovelace Nice addition; however, the code doesn't run for me. Also, what do you suggest about timing for vroom and lazy loading?

Robinlovelace commented 3 years ago

Was just a starter for 10. What do you mean by 'timing for lazy loading'? Happy to iterate, just trying to get things up-to-date. Another idea: should we print package versions? I think the current benchmark results are pretty out-of-date...

csgillespie commented 3 years ago

The main reason vroom can be faster is because character data is read from the file lazily; you only pay for the data you use. This lazy access is done automatically, so no changes to your R data-manipulation code are needed.

Source: https://www.tidyverse.org/blog/2019/05/vroom-1-0-0/
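To make the timing point concrete, here is a rough sketch of how the read cost and the first-use cost could be measured separately. It assumes the vroom package is installed; the temporary file, the column names (`id`, `label`) and the row count are made up for illustration and are not taken from the book's benchmark.

```r
# Sketch only: time the initial read and the first use of a character column
# separately. File contents and column names are hypothetical.
if (requireNamespace("vroom", quietly = TRUE)) {
  path <- tempfile(fileext = ".csv")
  n <- 1e5
  df <- data.frame(id = seq_len(n), label = paste0("row_", seq_len(n)))
  write.csv(df, path, row.names = FALSE)

  # Step 1: the read itself; character data should not be fully parsed yet
  t_read <- system.time(
    x <- vroom::vroom(path, delim = ",", progress = FALSE)
  )

  # Step 2: touching the character column forces it to be materialised
  t_touch <- system.time(
    n_chars <- sum(nchar(x$label))
  )

  cat("read elapsed: ", t_read[["elapsed"]], "\n")
  cat("touch elapsed:", t_touch[["elapsed"]], "\n")
}
```

A fair comparison with other readers would probably need to report both numbers, or only time code that actually uses the columns.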

Cheers

Robinlovelace commented 3 years ago

Makes sense. We could add that caveat to the text. That's a good starting point, especially as some character string variables are not used. Re the implementation, that's amazing. Does it mean that an object created by vroom knows the file that generated it and will convert the text to character representation only if that column is used?

Robinlovelace commented 3 years ago

Worth documenting that and adding a link to the book, I think: a very interesting and fast implementation.

engineerchange commented 3 years ago

I'm hoping to take a look at this a bit later in the week. I do agree with printing package versions, though; I think it's a quick way to know if the benchmark is wildly out of date.
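As a starting point, something like the following could be printed next to the benchmark results. It is only a sketch: `report_versions()` is a hypothetical helper, and the package list (readr, data.table, vroom) is an assumption about which readers the benchmark compares.

```r
# Hypothetical helper: installed version of each package, NA if not installed
report_versions <- function(pkgs) {
  vapply(
    pkgs,
    function(p) {
      if (requireNamespace(p, quietly = TRUE)) {
        as.character(utils::packageVersion(p))
      } else {
        NA_character_
      }
    },
    character(1)
  )
}

# Assumed package list; adjust to whatever the benchmark actually uses
versions <- report_versions(c("readr", "data.table", "vroom"))
print(versions)
cat("R:", R.version.string, "\n")
```

Returning `NA` for missing packages keeps the chunk from failing when a reader builds the book without every package installed.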

Robinlovelace commented 3 years ago

Great, thanks @engineerchange. Any additions on top of this branch very welcome.