ddotta / cookbook-rpolars

Cookbook to provide solutions to common tasks and problems in using Polars with R
https://ddotta.github.io/cookbook-rpolars/
Creative Commons Attribution 4.0 International

Benchmark comment "from a CSV" #7

Closed by phgrosjean 1 year ago

phgrosjean commented 1 year ago

Hello,

In your CSV benchmark, you use read.csv() (the base function, which is very slow) for all three versions (base, dplyr, data.table), while comparing them to the Polars-specific CSV read function. Most of the time is spent in this function, so all three show the same timings, which is not representative of each implementation: the tidyverse would use readr::read_csv() and data.table would use fread() instead.
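For illustration, here is a minimal sketch of what a fairer comparison might look like, with each approach using its own CSV reader. The temporary file, the iris test data, and the helper names csv_base, csv_readr and csv_dt are assumptions made up for this example, not the cookbook's actual code, and it assumes the readr, data.table and microbenchmark packages are installed:

```r
# Sketch only: assumes readr (>= 2.0), data.table and microbenchmark are installed.
library(microbenchmark)

# Hypothetical test file: write a small built-in dataset to a temporary CSV
csv_path <- tempfile(fileext = ".csv")
write.csv(iris, csv_path, row.names = FALSE)

# One reader per ecosystem (helper names are illustrative)
csv_base  <- function() read.csv(csv_path)                                  # base R
csv_readr <- function() readr::read_csv(csv_path, show_col_types = FALSE)   # tidyverse
csv_dt    <- function() data.table::fread(csv_path)                         # data.table

# Time each reader on the same file
res <- microbenchmark(
  "base"       = csv_base(),
  "readr"      = csv_readr(),
  "data.table" = csv_dt(),
  times = 5
)
print(res)
```

On a file this small the differences will be negligible; the gap between readers should only become meaningful on a dataset of realistic size.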

Also, in your comparison between eager and lazy polars, you forgot to collect() the lazy version. It should be:

microbenchmark(
  "eager mode" = csv_eager_polars(),
  "lazy mode" = csv_lazy_polars()$collect(),
  times = 5
)

Otherwise, excellent work!

PhG

ddotta commented 1 year ago

Hi @phgrosjean,
You're right, I'll correct that and post the new benchmark results in this issue.
Thanks for the feedback!

ddotta commented 1 year ago

I've made the corrections and while we wait for the book to become available again, here are the results on my computer:

image

image