IQSS / dataverse

Open source research data repository software
http://dataverse.org

[Feature Request] HDF5 / NetCDF support #7947

Open CaptainSifff opened 3 years ago

CaptainSifff commented 3 years ago

Since Dataverse already supports the domain-specific format *.fits, would it be possible to add support for HDF5 (https://www.hdfgroup.org/solutions/hdf5/)? The file format is very flexible, so a first step would be to just support the tabular-like usage as it is done in NetCDF4 (https://en.wikipedia.org/wiki/NetCDF).

Thinking a bit forward, I would like to be able to browse the content of the container like I can browse the files that make up a dataset.

CaptainSifff commented 3 years ago

As a side note, there are also other standardized formats built on top of HDF5 that include metadata: https://www.nexusformat.org/

pdurbin commented 2 years ago

@CaptainSifff thanks for creating this issue.

"a first step would be to just support the tabular-like usage as it is done in NetCDF4"

@atrisovic just pointed me toward an example of a .nc4 file at https://github.com/energy-policy-institute-uchicago/xarray-notebooks/blob/master/xarray-basics.ipynb

It has a nice diagram of the file format:

[Image: diagram of the NetCDF file format]

pdurbin commented 1 year ago

@CaptainSifff I was just talking to @atrisovic about your ideas and we'd like to interview you! 😄

When you have a minute can you please pop in https://chat.dataverse.org so we can schedule a time? Thanks!

pdurbin commented 1 year ago

@CaptainSifff thanks for meeting with me and @atrisovic a while back! We recently published a NetCDF/HDF5 design doc and we'd love your feedback!

https://docs.google.com/document/d/1Ax_sMdgx5ROkIBA7-IC4_hySvgXkk6O8qTZLIvWWnqE/edit?usp=sharing

We'd also love feedback from others reading this. Thanks!

mreekie commented 1 year ago

grooming

CaptainSifff commented 1 year ago

@pdurbin I looked over the Google Doc and you are making great progress there, and I think that the geosciences will appreciate this effort. I currently think that it's a bit netCDF-centric, but that's OK for a start. The next stop would likely be something like the Nexus format mentioned above, which utilizes HDF5.

pdurbin commented 1 year ago

@CaptainSifff thanks for the feedback.

With regard to HDF5, I'm not sure how closely you've been following @JR-1991 's amazing work on incorporating H5Web as an external tool with Dataverse:

I just tried it with a random Nexus file I found at https://github.com/nexusformat/exampledata/blob/eae516807ef7e27d1c45aab3af3a64a679154677/IPNS/LRMECS/hdf5/lrcs3701.nx5

Here's how it looks:

[Screenshot: the Nexus file opened in H5Web]

At the moment anyway, you can play around with it here: https://dev1.dataverse.org/file.xhtml?fileId=1069&version=1.0

We've been talking about H5Web here: https://dataverse.zulipchat.com/#narrow/stream/376593-geospatial/topic/plot.20arrays.20from.20HDF5

I hope H5Web helps a bit with what you were saying at the start: "I would like to be able to browse the content of the container like I can browse the files that make up a dataset."

Dataverse treats that Nexus file as an HDF5 file, which means the NcML preview is shown as well:

[Screenshot: NcML preview of the same Nexus file]
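
For readers wondering why a Nexus (.nx5) or NetCDF-4 (.nc4) file can be handled like any other HDF5 file: both are HDF5 containers underneath, so they begin with the same 8-byte HDF5 signature, while classic NetCDF files begin with the bytes "CDF" plus a version byte. Below is a minimal Python sketch of that idea. It is not how Dataverse itself detects file types; the function name `sniff_container_format` is hypothetical, and the sketch only looks at offset 0 (the HDF5 format also allows the signature at offsets 512, 1024, 2048, and so on when a user block is present).

```python
# Illustrative only: classify a file by its leading "magic" bytes.
# This is an assumption-laden sketch, not Dataverse's detection logic.

HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"      # 8-byte HDF5 format signature
NETCDF_CLASSIC_PREFIX = b"CDF"              # followed by a version byte
NETCDF_CLASSIC_VERSIONS = (b"\x01", b"\x02", b"\x05")


def sniff_container_format(path):
    """Return 'hdf5', 'netcdf-classic', or 'unknown' for the file at `path`."""
    with open(path, "rb") as f:
        head = f.read(8)

    # NetCDF-4 and HDF5-based Nexus files are HDF5 containers,
    # so they match the HDF5 signature here.
    if head == HDF5_SIGNATURE:
        return "hdf5"

    # Classic NetCDF (CDF-1, CDF-2, CDF-5) uses "CDF" plus a version byte.
    if head[:3] == NETCDF_CLASSIC_PREFIX and head[3:4] in NETCDF_CLASSIC_VERSIONS:
        return "netcdf-classic"

    return "unknown"
```

Calling `sniff_container_format("lrcs3701.nx5")` on a local copy of the example file linked above should report `hdf5` if the file has no user block, which is consistent with it being offered the same previews as other HDF5 files.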

While I'm writing, @CaptainSifff how do you feel about closing this issue now that there is at least a little HDF5 and NetCDF support in Dataverse? You can preview our docs for our upcoming 5.14 release at https://preview.guides.gdcc.io/en/develop/user/dataset-management.html#netcdf-and-hdf5 and here's a screenshot:

[Screenshot: the NetCDF and HDF5 section of the guides]

My thinking is that you (and others) can create smaller issues about adding this or that feature (maybe Nexus support). This works best for us because we can better estimate small issues to be included in a sprint.