asdf-format / asdf

ASDF (Advanced Scientific Data Format) is a next generation interchange format for scientific data
http://asdf.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

explicit support for tabular data? #919

Open allefeld opened 3 years ago

allefeld commented 3 years ago

In another issue I made a side comment regarding pandas.DataFrame, but now I think it deserves a separate issue.

Btw., since you support ndarrays out of the box, why not also pandas.DataFrames?

Originally posted by @allefeld in https://github.com/asdf-format/asdf/issues/918#issuecomment-769276112

The reply was:

As for pandas DataFrames, we are trying to keep the format language neutral, and putting something like that in the main standard works against that. Not that it can't be added as a well-supported extension, though.

Originally posted by @perrygreenfield in https://github.com/asdf-format/asdf/issues/918#issuecomment-769306761

I would say that a language-neutral scientific data format does need to support this. Not specifically for pandas.DataFrame objects, but for tabular data, which is a structure that occurs in many areas of science.

This would correspond to Python's pandas.DataFrame, R's data.frame, MATLAB's table, etc. With this feature, an ASDF file could hold the data that is otherwise kept in CSV/TSV files, Excel sheets, and, I'm guessing, Apache Parquet and many other formats.

eslavich commented 3 years ago

We do have a tag and schema in the core standard that supports tabular data:

https://github.com/asdf-format/asdf-standard/blob/master/schemas/stsci.edu/asdf/core/table-1.0.0.yaml

Currently the only implementation is for astropy.table.Table.

What's missing is an extension that knows how to translate between that schema and pandas.DataFrame. I suppose we'll also need a feature in this library that allows users to select which table implementation to create when reading a file -- I don't think we've had this kind of collision before, and I'm not sure what will happen when both astropy and pandas are installed.
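
To make the translation step concrete, here is a minimal sketch of the mapping such an extension might perform. It is not an asdf extension and does not call the asdf library: `dataframe_to_table_node` and `table_node_to_dataframe` are hypothetical helpers, and the `columns` / `name` / `data` keys are only modeled loosely on the core table schema, not validated against it. The sketch assumes unique column names and drops the DataFrame index.

```python
import numpy as np
import pandas as pd


def dataframe_to_table_node(df):
    """Convert a DataFrame into a plain dict holding a list of named columns.

    Hypothetical helper: the key names are an assumption modeled loosely on
    the core table schema. The DataFrame index is not preserved.
    """
    columns = [
        # Each column becomes a name plus an ndarray, which is a type the
        # format already supports.
        {"name": str(name), "data": df[name].to_numpy()}
        for name in df.columns
    ]
    return {"columns": columns}


def table_node_to_dataframe(node):
    """Rebuild a DataFrame from a node produced by dataframe_to_table_node."""
    return pd.DataFrame({col["name"]: col["data"] for col in node["columns"]})


# Round trip on a small example table.
df = pd.DataFrame({"time": [0.0, 0.5, 1.0], "count": [3, 5, 8]})
node = dataframe_to_table_node(df)
restored = table_node_to_dataframe(node)
```

Because the intermediate node contains only names and arrays, the same node could in principle be handed to either an astropy-based or a pandas-based reader, which is where the choice of table implementation on read would come in.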