Open ld-cd opened 3 months ago
I think getting rid of pytables would be a ton of work, so this should be a low priority unless there is a real show stopper.
Options for OLAP databases:
DuckDB publishes a perf comparison (so keep the bias in mind): https://duckdblabs.github.io/db-benchmark/
The 50 GB group-by benchmark is likely the most representative of the hot path in our current workload.
For a new format, `binney` will serialize into parquet files (https://mazinlab.github.io/binney/binney.html#BinDirectory) and has optional polars support (https://mazinlab.github.io/binney/binney.html#BinDirectoryDF).
`pytables` is a pretty consistent packaging issue and does not package all its dependencies (namely the `hdf5` library). It should likely be replaced with `h5py` for maintaining file compatibility, and if in-kernel queries are really needed we should switch to a more modern data format going forward, like `parquet`, and use either `pandas` or `polars` for queries. The other alternative is vendoring `pytables` and committing to maintaining Python compatibility and functional packaging going forward, but this is not something I have time to do.