Benchmark comment "from a CSV"

phgrosjean commented 1 year ago

Hello,

In your CSV benchmark, you use read.csv() (a base function, very slow) for all three versions (base, dplyr and data.table), while comparing them to the Polars-specific CSV read function. Most of the time is spent in read.csv(), so all three get nearly the same timings, which is not representative of each implementation: tidyverse users would use readr::read_csv() and data.table users would use fread() instead.
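To illustrate what I mean, here is a minimal sketch that times each ecosystem with its own reader. The temporary file, its columns and the `readers` list are made up for the example (they are not from your benchmark), and readr and data.table are only timed if they are installed:

```r
# Hypothetical data: a small temporary CSV standing in for the benchmark file.
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:1000, value = rnorm(1000)), csv_path, row.names = FALSE)

# One reader per ecosystem; optional packages are skipped when missing.
readers <- list(base = function() utils::read.csv(csv_path))
if (requireNamespace("readr", quietly = TRUE)) {
  readers$readr <- function() readr::read_csv(csv_path, show_col_types = FALSE)
}
if (requireNamespace("data.table", quietly = TRUE)) {
  readers$data.table <- function() data.table::fread(csv_path)
}

# Median elapsed time (in seconds) over 5 runs for each reader.
timings <- sapply(readers, function(read) {
  median(replicate(5, system.time(read())[["elapsed"]]))
})
print(timings)
```

On a file this small the differences will be tiny; the point is only that each implementation gets timed with the reader its users would actually pick.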

Also, in your comparison between eager and lazy polars, you forgot to collect() the lazy version, so its timing does not include actually running the query. It should be:

microbenchmark(
  "eager mode" = csv_eager_polars(),
  "lazy mode" = csv_lazy_polars()$collect(),
  times = 5
)
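For context, a rough sketch of the difference, assuming the polars R package is installed and that `pl$read_csv()` and `pl$scan_csv()` behave as I remember. The file and its columns are hypothetical, not the ones from your book:

```r
# Hypothetical stand-in for the benchmark file.
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:100, value = seq_len(100) / 10), csv_path, row.names = FALSE)

if (requireNamespace("polars", quietly = TRUE)) {
  pl <- polars::pl

  # Eager: the file is read immediately and a DataFrame comes back.
  eager <- pl$read_csv(csv_path)

  # Lazy: only a query plan is built here; nothing is executed yet.
  lazy <- pl$scan_csv(csv_path)

  # The work happens when the plan is collected.
  result <- lazy$collect()

  print(class(eager))
  print(class(lazy))
  print(class(result))
}
```

So if only the lazy step is timed, the benchmark measures building the plan rather than reading the CSV.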

Otherwise, excellent work!

PhG

ddotta commented 1 year ago

Hi @phgrosjean,
You're right, I'll correct that and post the new benchmark results in this issue.
Thanks for the feedback!

ddotta commented 1 year ago

I've made the corrections and while we wait for the book to become available again, here are the results on my computer:

ddotta / cookbook-rpolars