NCAR / VAPOR

VAPOR is the Visualization and Analysis Platform for Ocean, Atmosphere, and Solar Researchers
https://www.vapor.ucar.edu/
BSD 3-Clause "New" or "Revised" License

Provide utility program to gather metadata of a big dataset #2592

Open shaomeng opened 3 years ago

shaomeng commented 3 years ago

Proposal: Introduce a utility program that scans a large data set (potentially many time steps and multiple petabytes) and saves a variety of metadata/statistics to an individual file.

Intended use case: Invoke this utility program once in batch mode and save the useful metadata. Then, when using VAPOR in interactive mode, the saved metadata could be imported to speed up various operations and/or to provide more informative GUI displays.
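To make the batch step concrete, here is a minimal Python sketch of a single pass that accumulates statistics chunk by chunk and writes them to a JSON file. Everything in it is an assumption for illustration: the function names (`scan_time_step`, `gather_metadata`, `save_metadata`), the choice of min/max/mean as the statistics, the JSON layout, and the toy in-memory dataset standing in for real files. It does not use any VAPOR API.

```python
import json
import math
import os
import tempfile


def scan_time_step(chunks):
    """Accumulate min, max, and mean over an iterable of chunks
    (each chunk is a list of floats) without holding the whole
    time step in memory at once."""
    lo, hi, total, count = math.inf, -math.inf, 0.0, 0
    for chunk in chunks:
        for value in chunk:
            lo = min(lo, value)
            hi = max(hi, value)
            total += value
            count += 1
    if count == 0:
        return {"min": None, "max": None, "mean": None, "count": 0}
    return {"min": lo, "max": hi, "mean": total / count, "count": count}


def gather_metadata(dataset):
    """dataset maps a variable name to a list of time steps;
    each time step is an iterable of chunks (hypothetical layout)."""
    return {
        name: [scan_time_step(step) for step in steps]
        for name, steps in dataset.items()
    }


def save_metadata(meta, path):
    """Write the gathered statistics to one JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(meta, f, indent=2)


# Toy stand-in for a real dataset: one variable, two time steps.
toy = {"temperature": [[[1.0, 2.0], [3.0]], [[-1.0, 5.0]]]}
meta = gather_metadata(toy)
path = os.path.join(tempfile.mkdtemp(), "stats.json")
save_metadata(meta, path)
```

An interactive session would then only need to read the small JSON file, for example to set a colorbar range from the stored min/max, instead of rescanning the data.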

Metadata/statistics to calculate

(Note: the metadata file format should be designed so that a new metadata field could be easily added without breaking backward compatibility.)

(Note 2: the generated metadata file should not be tied to wavelet-transformed formats; rather, it should point to data in any existing file format that VAPOR supports, including many unstructured files.)
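One common way to get the extensibility asked for in the first note is a keyed format where readers supply defaults for fields they expect but do not find, and pass through fields they do not recognize. The sketch below assumes JSON and hypothetical field names (`format_version`, `variables`, `min`, `max`, `histogram`); none of these come from VAPOR or from this proposal.

```python
import json

# Fields this (hypothetical) reader knows about, with the default
# used when an older file does not contain them.
DEFAULTS = {"min": None, "max": None, "histogram": None}


def parse_record(raw):
    """Split one per-variable record into known fields (with defaults
    filled in) and unknown fields (kept, not rejected)."""
    known = {key: raw.get(key, default) for key, default in DEFAULTS.items()}
    extra = {key: value for key, value in raw.items() if key not in DEFAULTS}
    return known, extra


def load_metadata(text):
    """Parse a metadata document; files without a version are treated
    as version 1 (an assumption of this sketch)."""
    doc = json.loads(text)
    version = doc.get("format_version", 1)
    records = {
        name: parse_record(raw)
        for name, raw in doc.get("variables", {}).items()
    }
    return version, records
```

With this shape, an older file that lacks `histogram` still loads, and a newer file carrying an extra field such as `median` is read by an older reader without error.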

clyne commented 3 years ago

One approach here might be to incorporate the above information into a VDC master file, and allow a VDC master file to reference non-VDC data. I.e., currently a VDC master file provides an indirect reference to the VDC data files that contain the actual data. The VDC master file could easily be extended to reference other types of files, such as NetCDF CF files.
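As a rough illustration of the indirection idea, the sketch below models a master index whose entries carry both a path and a format label, with a lookup table choosing the reader. This is not the actual VDC master file layout or any VAPOR API: `DataRef`, the format labels, and the stub readers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataRef:
    """One entry in a hypothetical master index."""
    path: str
    format: str  # e.g. "vdc" or "netcdf-cf"; labels are made up here


def open_vdc(path):
    # Stub standing in for a real VDC reader.
    return f"VDC reader for {path}"


def open_netcdf_cf(path):
    # Stub standing in for a real NetCDF CF reader.
    return f"NetCDF CF reader for {path}"


# Supporting another file type means adding one entry here.
READERS = {"vdc": open_vdc, "netcdf-cf": open_netcdf_cf}


def resolve(ref):
    """Pick the reader that matches the entry's format label."""
    try:
        opener = READERS[ref.format]
    except KeyError:
        raise ValueError(f"unsupported format: {ref.format!r}") from None
    return opener(ref.path)
```

The gathered statistics could sit next to each entry in the same index, so the metadata stays independent of how the referenced data is stored.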