Closed suvayu closed 2 months ago
Nice! I have also used DuckDB in combination with dplyr in R, so I might add something about using databases in R to the R language guide.
Hi @maltelueken that would be amazing! This also addresses the last point in the DuckDB part about combining with other tools. We were also lacking R experience, so couldn't comment on R libraries.
@Morrizzzzz Would you be interested and have time to review this?
The chapter could be more about data engineering, i.e. how to use these tools, or best practices for ETL pipelines.
@recap do you have some resources to link to on data engineering and/or ETL pipelines? Sounds like a nice addition (for a new PR). We should try to restrict it to techniques/concepts we actually (can) use in projects. I think you have done some of that, no?
Also, @recap your suggested additions sound good, but did you also review what was already in the PR and whether it makes sense? Then we can merge this PR as it is now and do your additions in a next PR (or quickly add them to this PR if you want, I think @suvayu is on holiday anyway).
Thank you so much @suvayu & @f-hafner for taking this initiative and @recap for the great review and additions.
... One final thing before merging is to add it to the sidebar menu, though :) I'll do that right now...
Remove trivial database section from Python guide
Add a chapter on datasets
[x] I followed the CONTRIBUTING guidelines.
Below, describe what this Pull Request adds:
This PR removes the database section from the Python guide (as discussed in #316) and introduces a new chapter on handling datasets. The chapter discusses local databases, other data processing libraries, and their respective trade-offs.
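
For readers skimming this PR, here is a rough illustration of the kind of local-database workflow the new chapter is about. It is only a sketch using Python's built-in `sqlite3` module, not code taken from the chapter; the function name `mean_by_group`, the table `measurements`, and its columns are hypothetical.

```python
import sqlite3


def mean_by_group(rows):
    """Load (group, value) rows into an in-memory SQLite database
    and let the database compute the per-group mean."""
    con = sqlite3.connect(":memory:")  # no server, nothing written to disk
    try:
        # Hypothetical table and column names, for illustration only.
        con.execute("CREATE TABLE measurements (grp TEXT, value REAL)")
        con.executemany("INSERT INTO measurements VALUES (?, ?)", rows)
        cur = con.execute(
            "SELECT grp, AVG(value) FROM measurements GROUP BY grp ORDER BY grp"
        )
        return cur.fetchall()
    finally:
        con.close()


result = mean_by_group([("a", 1.0), ("a", 3.0), ("b", 10.0)])
print(result)
```

Swapping `":memory:"` for a file path would persist the data between runs, which is one of the trade-offs a local database offers over loading everything into memory each time.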