JosiahParry opened 4 months ago
Thanks for creating this cool project. Will this also include duckplyr?
Thanks @durraniu! It isn't finished; I'm working on it in my free time.
It will not focus on duckplyr, but duckplyr will get an honorable mention in two places (from zero to prod with dplyr and here)!
From my exploration and understanding, duckplyr is limited to in-memory data frames. My focus with flrsh.dev (flourish dot dev) is to show how to use R to write truly scalable code. It is my (perhaps unfounded) belief that duckdb + dbplyr is a more robust and scalable toolkit than duckplyr because it can work against an on-disk database (a `database.duckdb` path), which permits out-of-core processing. I've also found that duckplyr doesn't support stringr and lubridate functions the way dbplyr does!
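A minimal sketch of the duckdb + dbplyr workflow described above, assuming the DBI, duckdb, dbplyr, and dplyr packages are installed. A temporary file stands in for `database.duckdb`, and the `trips` table is made-up toy data. The `str_detect()` (stringr) and `year()` (lubridate) calls are not run in R: dbplyr translates them to SQL (duckdb supplies the backend-specific translations), and `show_query()` lets you inspect the result.

```r
library(DBI)
library(dplyr)

# File-backed DuckDB database: data lives on disk, not in R's memory.
# tempfile() is a stand-in for a real path such as "database.duckdb".
db_path <- tempfile(fileext = ".duckdb")
con <- dbConnect(duckdb::duckdb(), dbdir = db_path)

# Hypothetical toy table standing in for a real dataset
trips <- data.frame(
  station = c("Main St", "Oak Ave", "Main Sq"),
  started = as.Date(c("2023-01-15", "2023-06-01", "2024-03-10"))
)
dbWriteTable(con, "trips", trips)

# Lazy query: stringr::str_detect() and lubridate::year() are translated
# to SQL and executed by DuckDB; nothing is pulled into R until collect().
q <- tbl(con, "trips") |>
  filter(str_detect(station, "^Main")) |>
  mutate(yr = year(started)) |>
  arrange(started)

show_query(q)      # print the generated SQL
res <- collect(q)  # run the query and bring the result into R

dbDisconnect(con)
```

Swapping `tempfile()` for a persistent path is what lets later sessions reuse the same on-disk tables.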
Good to know about the limitations of duckplyr.
I have not used duckdb yet, but I have used arrow to partition and import vehicle trajectory data (11 million rows). I'm thinking of expanding on that by using duckdb & dbplyr for the analysis. Would that be of interest to you in this project?
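A rough sketch of how the arrow partitioning step could feed into duckdb + dbplyr analysis, assuming the arrow, duckdb, dbplyr, and dplyr packages are installed. The `traj` data frame and its `vehicle_id`, `lane`, and `speed` columns are hypothetical stand-ins for the real trajectory data.

```r
library(arrow)
library(dplyr)

# Hypothetical stand-in for the vehicle trajectory data
traj <- data.frame(
  vehicle_id = c(1, 1, 2, 2, 3),
  lane       = c(1, 1, 2, 2, 1),
  speed      = c(10, 12, 20, 22, 15)
)

# Write a Hive-style partitioned Parquet dataset (one folder per lane)
ds_path <- file.path(tempdir(), "traj")
write_dataset(traj, ds_path, partitioning = "lane")

# open_dataset() scans the files lazily; to_duckdb() hands that scan to
# DuckDB so the dplyr verbs below run as SQL instead of in R's memory.
res <- open_dataset(ds_path) |>
  to_duckdb() |>
  group_by(lane) |>
  summarise(mean_speed = mean(speed, na.rm = TRUE)) |>
  arrange(lane) |>
  collect()
```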
This dataset is amazing! Thank you so much for sharing. I've added this example to the datasets issue: https://github.com/flrsh-dev/flrsh-lessons/issues/4. For this specific course on DuckDB, I think the bikes dataset will be used. But this could absolutely be useful elsewhere!
In one of the closing chapters, we need to discuss WHEN you should use duckdb over, for example, dtplyr or dplyr.
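To frame that comparison, here is a small sketch running one hypothetical dplyr pipeline on all three backends, assuming dplyr, dtplyr, DBI, duckdb, and dbplyr are installed. The verbs stay the same; only the engine changes. dplyr and dtplyr (data.table) work on data already held in R's memory, while duckdb executes the query as SQL and could point at a file-backed database instead of the in-memory one used here.

```r
library(dplyr)
library(dtplyr)
library(DBI)

# Hypothetical toy data
toy <- data.frame(g = c("a", "a", "b"), x = c(1, 2, 3))

# One pipeline, reused unchanged on every backend
pipeline <- function(data) {
  data |>
    group_by(g) |>
    summarise(total = sum(x, na.rm = TRUE)) |>
    arrange(g)
}

# 1. dplyr: evaluated directly on the in-memory data frame
res_dplyr <- pipeline(toy)

# 2. dtplyr: same verbs, translated to data.table (still in memory)
res_dtplyr <- as_tibble(pipeline(lazy_dt(toy)))

# 3. duckdb + dbplyr: same verbs, translated to SQL and run by DuckDB
con <- dbConnect(duckdb::duckdb())
dbWriteTable(con, "toy", toy)
res_duckdb <- collect(pipeline(tbl(con, "toy")))
dbDisconnect(con)
```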
When to use duckdb: