flrsh-dev / flrsh-lessons

{flrsh} lessons
duckdb deep dive #3

Open JosiahParry opened 4 months ago

JosiahParry commented 4 months ago

In one of the closing chapters we need to discuss WHEN you should use duckdb over, for example, dtplyr or dplyr.

When to use duckdb:

durraniu commented 4 months ago

Thanks for creating this cool project. Will this also include duckplyr?

JosiahParry commented 4 months ago

Thanks @durraniu! It isn't finished yet; I'm working on it in my free time.

The course will not focus on duckplyr, but duckplyr will get an honorable mention in two places (from zero to prod with dplyr and here)!

From my exploration and understanding, duckplyr is limited to in-memory data frames. My focus with flrsh.dev (flourish dot dev) is to show how to use R for truly scalable code. It is my (perhaps unfounded) belief that duckdb + dbplyr is a more robust and scalable toolkit than duckplyr because it can use a database.duckdb file path, which permits out-of-core processing. I've also found that duckplyr doesn't support stringr and lubridate the way dbplyr does!
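
To make the duckdb + dbplyr setup described above concrete, here is a minimal sketch rather than the course's actual code. The `rides` table, its columns, and the temporary file location are made up for illustration, and the example only shows the shape of the workflow: a file-backed DuckDB database queried lazily through dplyr verbs, with lubridate-style and stringr-style calls translated to SQL.

```r
library(DBI)
library(dplyr)

# Hypothetical location; a file-backed database (instead of the default
# in-memory one) is what the comment above refers to as a database.duckdb path.
db_path <- file.path(tempdir(), "database.duckdb")
con <- dbConnect(duckdb::duckdb(), dbdir = db_path)

# Toy data standing in for a much larger table (invented for this sketch).
rides <- data.frame(
  started_at = as.POSIXct(
    c("2023-05-01 08:00:00", "2023-05-01 09:30:00", "2024-01-15 17:45:00"),
    tz = "UTC"
  ),
  station = c("main st", "main st", "river rd")
)
dbWriteTable(con, "rides", rides, overwrite = TRUE)

# tbl() gives a lazy reference; the verbs below are translated to SQL and
# executed inside DuckDB. Only collect() brings the result into R.
summary_tbl <- tbl(con, "rides") |>
  mutate(
    ride_year = year(started_at),      # lubridate-style call, run as SQL
    station = str_to_upper(station)    # stringr-style call, run as SQL
  ) |>
  count(ride_year, station) |>
  collect()

dbDisconnect(con, shutdown = TRUE)
```

Because the work happens inside the database until `collect()`, only the small summary table has to fit in R's memory.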

durraniu commented 4 months ago

Good to know about the limitations of duckplyr.

I have not used duckdb yet, but I have used arrow for partitioning and importing vehicle trajectory data (11 million rows). I'm thinking of expanding on it by using duckdb and dbplyr for analysis. Would that be of interest to you in this project?

JosiahParry commented 4 months ago

This dataset is amazing! Thank you so much for sharing. I've added this example to the datasets issue: https://github.com/flrsh-dev/flrsh-lessons/issues/4. For this specific course on DuckDB, I think the bikes will be used. But this could absolutely be useful elsewhere!