NNPDF / reportengine

A framework for declarative data analysis
https://data.nnpdf.science/validphys-docs/guide.html
GNU General Public License v2.0

Saving tables as parquet #40

Open siranipour opened 3 years ago

siranipour commented 3 years ago

Requires pandas 1.2.0 to save MultiIndex dataframes.

It also needs the column names to be changed to str, but that should be fine. Other than that, it works really nicely.

It also requires the pyarrow dependency, which pandas looks for before looking for fastparquet.

P.S. It'd be nice to do editable installs; it feels like I'm coding in C++ using this flit thing.

Zaharid commented 3 years ago

Cool. Would be good to have command line options controlling this.

> also needs the columns to be changed to str but that should be fine. Other than that, it works really nicely

We may want to make sure that various vp actions only trade in strings, e.g. for the replica numbers.

> p.s it'd be nice to do editable installs, feels like im coding with c++ using this flit thing

flit install --symlink

should work.

Zaharid commented 3 years ago

I think I'd rather have this fail on non-string column names than have them converted silently. Ultimately we would like this to be able to round-trip, ideally for any dataframe, but we will have to live with the constraints here.
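
To make the "fail rather than convert silently" idea concrete, here is a minimal sketch of such a check. The function name `require_string_columns` is hypothetical and is not part of reportengine or pandas; it only assumes that column labels are either plain values or tuples (as produced when iterating over MultiIndex columns).

```python
def require_string_columns(columns):
    """Raise TypeError unless every column label is a str or a tuple of str.

    Hypothetical helper for illustration only: `columns` is any iterable of
    labels, for example the `columns` attribute of a dataframe.
    """
    for label in columns:
        # MultiIndex-style labels arrive as tuples; treat a flat label as a 1-tuple.
        parts = label if isinstance(label, tuple) else (label,)
        bad = [part for part in parts if not isinstance(part, str)]
        if bad:
            raise TypeError(
                f"Column label {label!r} contains non-string entries {bad!r}; "
                "convert the labels explicitly before saving."
            )
```

Called as `require_string_columns(df.columns)` just before writing the table, this would leave the conversion to str as an explicit step for whoever produces the dataframe, instead of it happening at save time.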

Zaharid commented 3 years ago

Looks good from a quick look.

siranipour commented 3 years ago

Think we can merge this, boss?

siranipour commented 3 years ago

I'm interested to see if the VP test will pass if we do

siranipour commented 3 years ago

bump

wilsonmr commented 3 years ago

Just to understand correctly: if in my environment I fix it to still save CSVs, then (once the optional dependencies are a thing) this will just be the old behaviour?

EDIT: I like the look of this, but for another project which uses reportengine I want to have a get-out-of-jail-free card in case this causes some issues.

siranipour commented 3 years ago

Yes, in principle if you do app runcard.yaml --table-formats csv, it should just do what it used to. And once the optional install is in place, it will be installing the same dependencies too (except for pandas >= 1.2.0).
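
For readers unfamiliar with this kind of option, the following is a rough argparse sketch of how a `--table-formats` flag could be parsed. It is not reportengine's actual implementation: the parser layout, the set of supported formats, and the default value are all assumptions made for illustration.

```python
import argparse

# Assumed set of formats; the thread only mentions csv and parquet.
SUPPORTED_TABLE_FORMATS = ("csv", "parquet")


def build_parser():
    """Build a hypothetical parser mirroring `app runcard.yaml --table-formats csv`."""
    parser = argparse.ArgumentParser(prog="app")
    parser.add_argument("runcard", help="path to the runcard YAML file")
    parser.add_argument(
        "--table-formats",
        nargs="+",
        choices=SUPPORTED_TABLE_FORMATS,
        default=["csv"],  # assumed default, chosen only for this sketch
        help="one or more formats in which to save tables",
    )
    return parser


# Parse the invocation quoted in the comment above.
args = build_parser().parse_args(["runcard.yaml", "--table-formats", "csv"])
print(args.runcard, args.table_formats)
```

Restricting the flag with `choices` means an unsupported format is rejected at the command line rather than when the first table is written.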