gaia-unlimited / gaiaunlimited

Code for GaiaUnlimited Gaia selection function tools
https://gaiaunlimited.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Add flexibility to save downloaded queries in different file formats #29

Open shouryakhanna opened 1 year ago

shouryakhanna commented 1 year ago

Add functionality to save downloaded queries in different file formats, to avoid heavy CSV files. How to handle the additional header information will still have to be sorted out.

emilyhunt commented 1 year ago

I'd really appreciate it if support for the parquet format could be added, since it's a much quicker format for large dataframes when saving and loading tables.

Maybe the information in the header could be stored in an accompanying metadata file or in the filename?
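A minimal sketch of the accompanying-metadata-file idea, using only the standard library. The function names, the ".meta.json" suffix, and the header keys in the usage note are hypothetical and not part of gaiaunlimited; the assumption is that the header information can be represented as a plain JSON-serialisable dict.

```python
import json
from pathlib import Path


def metadata_path(table_path):
    """Return the sidecar path, e.g. query.parquet -> query.parquet.meta.json.

    The '.meta.json' suffix is an arbitrary (hypothetical) convention.
    """
    table_path = Path(table_path)
    return table_path.with_name(table_path.name + ".meta.json")


def write_metadata(table_path, header):
    """Write the header dict to a JSON file next to the table file."""
    path = metadata_path(table_path)
    path.write_text(json.dumps(header, indent=2, sort_keys=True))
    return path


def read_metadata(table_path):
    """Read back the header dict stored next to the table file."""
    return json.loads(metadata_path(table_path).read_text())
```

Usage would look like write_metadata("query.parquet", {"query": "...", "downloaded": "..."}) right after saving the table, and read_metadata("query.parquet") when loading it. The table file itself then stays header-free, whatever format it is in.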

HDF files would allow information to be stored in headers and would give smaller file sizes, but wouldn't be nearly as fast as parquet (see second link above).

mfouesneau commented 1 month ago

https://github.com/gaia-unlimited/gaiaunlimited/blob/a6bbe61ab07ddea616aa49d5b632a981fc117b78/src/gaiaunlimited/selectionfunctions/subsample.py#L191

You can add a keyword argument, e.g. format='csv', and then use it in all the filenames transparently. The only place that needs more work is the pandas calls: you will have to do a getattr(df, f'to_{format}') to get the writing function (and similarly for reading).
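A rough sketch of what that could look like, assuming pandas. The helper names save_table and load_table and the SUPPORTED_FORMATS allow-list are hypothetical and not the existing subsample.py code; the allow-list is there because getattr would otherwise also pick up methods such as to_dict that do not write files.

```python
import pandas as pd

# Hypothetical allow-list: formats with matching DataFrame.to_<fmt>
# and pandas.read_<fmt> functions that take a file path.
SUPPORTED_FORMATS = {"csv", "parquet", "json", "feather", "pickle"}


def _check_format(format):
    if format not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported format: {format!r}")


def save_table(df, stem, format="csv", **kwargs):
    """Write df to '<stem>.<format>' using the matching pandas writer."""
    _check_format(format)
    writer = getattr(df, f"to_{format}")  # e.g. df.to_csv, df.to_parquet
    filename = f"{stem}.{format}"
    writer(filename, **kwargs)
    return filename


def load_table(stem, format="csv", **kwargs):
    """Read '<stem>.<format>' using the matching pandas reader."""
    _check_format(format)
    reader = getattr(pd, f"read_{format}")  # e.g. pd.read_csv, pd.read_parquet
    return reader(f"{stem}.{format}", **kwargs)
```

Extra keyword arguments are passed straight through, so save_table(df, "query", format="csv", index=False) behaves like df.to_csv("query.csv", index=False). Note that the parquet and feather writers need an extra engine such as pyarrow installed, and that the format argument shadows the Python builtin of the same name inside these functions.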