CDCgov / Rt-without-renewal

https://cdcgov.github.io/Rt-without-renewal/
Apache License 2.0

Computational output: moving from save-to-disk to `DuckDB` #225

Open SamuelBrand1 opened 1 month ago

SamuelBrand1 commented 1 month ago

At the moment, the analysis workflow relies on saving to disk for both serialising results and checkpointing (via `DrWatson.produce_or_load`). As in discussion #221, this is fine for low/moderate computational workloads, but it isn't obviously scalable and it relies on the local file structure.

A (IMO) better alternative is to open a connection to a DuckDB instance using the Julia front-end and stream results to it; this also makes it easier to run the post-processing as results arrive.

seabbs commented 1 month ago

Does DuckDB not have an interface to DataFrames.jl or similar to avoid needing to use the SQL syntax?

SamuelBrand1 commented 1 month ago

> Does DuckDB not have an interface to DataFrames.jl or similar to avoid needing to use the SQL syntax?

It looks like the DataFrames interface goes through the `Appender` API, which lets you add data row by row; I'm not sure about loading results back from the DB.
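
For reference, a minimal sketch of what the round trip might look like with DuckDB.jl, assuming the `Appender` route for writing and the Tables.jl-compatible query result for reading back. The table name `results`, its columns and the values are made up for illustration and are not from the analysis pipeline. Note that the read-back step still uses a SQL string.

```julia
using DuckDB, DataFrames

# In-memory database for illustration; passing a file path instead would persist it.
con = DBInterface.connect(DuckDB.DB, ":memory:")

# Hypothetical results table (names and types are assumptions).
DBInterface.execute(con, "CREATE TABLE results (scenario VARCHAR, draw BIGINT, rt DOUBLE)")

# Made-up rows standing in for pipeline output.
df = DataFrame(scenario = ["a", "a", "b"], draw = [1, 2, 1], rt = [1.1, 0.9, 1.4])

# Write row by row through the Appender API.
appender = DuckDB.Appender(con, "results")
for row in eachrow(df)
    for value in row
        DuckDB.append(appender, value)
    end
    DuckDB.end_row(appender)
end
DuckDB.close(appender)  # flushes any buffered rows

# Load back: the query result can be materialised directly as a DataFrame.
loaded = DataFrame(DBInterface.execute(con, "SELECT * FROM results ORDER BY scenario, draw"))

DBInterface.close!(con)
```

If this works as expected, `loaded` should contain the same three rows as `df`, so post-processing code written against DataFrames would not need to change beyond the query itself.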