Ouranosinc / pavics-vdb

Store virtual netCDF file aggregations and metadata fixes

Implement open_ncml function in xarray #6

Closed huard closed 1 year ago

huard commented 4 years ago

Just a basic proof-of-concept.

ds = xarray.open_ncml(ncml, path)

where ncml is a path to a file or a file-like object with a read method. It would support aggregation through xarray.open_mfdataset, variable renaming, and attribute modifications.
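As a rough illustration of the parsing half of that idea, here is a minimal sketch using only the standard library. The helper name read_ncml and the shape of its return value are assumptions made for this example, not part of xarray or xncml; it only extracts the aggregation, renames, and global attributes that a real open_ncml would then hand to xarray.

```python
import io
import xml.etree.ElementTree as ET

# NcML 2.2 namespace used by netcdf-java documents.
NS = {"nc": "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"}


def read_ncml(ncml):
    """Summarise an NcML document; `ncml` is a path or a file-like object.

    Hypothetical helper: the name and return format are illustrative only.
    """
    root = ET.parse(ncml).getroot()  # ET.parse accepts paths and file objects
    agg = root.find("nc:aggregation", NS)
    return {
        # (type, dimName) of the aggregation, or None when there is none
        "aggregation": None if agg is None else (agg.get("type"), agg.get("dimName")),
        # Variable renames: original name -> new name
        "renames": {
            v.get("orgName"): v.get("name")
            for v in root.findall("nc:variable", NS)
            if v.get("orgName")
        },
        # Global attribute overrides declared directly under the root element
        "attributes": {
            a.get("name"): a.get("value") for a in root.findall("nc:attribute", NS)
        },
    }


EXAMPLE = """<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <attribute name="title" value="Demo aggregation"/>
  <variable name="tas" orgName="T2M"/>
  <aggregation type="joinExisting" dimName="time">
    <netcdf location="a.nc"/>
    <netcdf location="b.nc"/>
  </aggregation>
</netcdf>"""

summary = read_ncml(io.StringIO(EXAMPLE))
```

The summary could then drive the xarray calls: open the listed files, rename variables, and update ds.attrs.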

tlogan2000 commented 4 years ago

Started working on this this afternoon. https://github.com/Ouranosinc/pavics-vdb/blob/xarray_openNCML/test_NcMLs/xarray_openncml.py

tlogan2000 commented 4 years ago

We should try to decide which use cases we would like to support: see https://www.unidata.ucar.edu/software/netcdf-java/current/ncml/Cookbook.html

The initial version could raise NotImplementedError for functionality we deem unimportant, I suppose.
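A sketch of how that guard might look, assuming a first version that only handles a small whitelist. The sets SUPPORTED_ELEMENTS and SUPPORTED_AGGREGATIONS and the function check_supported are hypothetical choices for illustration; the actual list of supported use cases is still to be decided.

```python
import xml.etree.ElementTree as ET

NCML_NS = "{http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2}"

# Hypothetical whitelist for a first version; anything else is rejected.
SUPPORTED_ELEMENTS = {"netcdf", "aggregation", "scan", "variable", "attribute"}
SUPPORTED_AGGREGATIONS = {"joinExisting", "joinNew", "union"}


def check_supported(ncml_text):
    """Raise NotImplementedError on the first NcML feature outside the whitelist."""
    root = ET.fromstring(ncml_text)
    for elem in root.iter():  # walks the root and all nested elements
        tag = elem.tag.replace(NCML_NS, "")
        if tag not in SUPPORTED_ELEMENTS:
            raise NotImplementedError(f"NcML element <{tag}> is not supported yet")
        if tag == "aggregation" and elem.get("type") not in SUPPORTED_AGGREGATIONS:
            raise NotImplementedError(
                f"aggregation type {elem.get('type')!r} is not supported yet"
            )
```

Failing early like this tells users which part of their NcML file is the problem instead of silently ignoring it.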

tlogan2000 commented 4 years ago

@huard the latest push is a little more flexible. Aggregations can contain the 'scan' parameter or simply multiple 'location' lines. Reading NcML files with relative paths seems to work when running the xarray_openncml.py example.
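For readers who have not seen the branch, the two ways of listing files can be sketched as follows. This is not the code from xarray_openncml.py: the function locate_files is a hypothetical stand-in that resolves locations relative to the NcML file, and as a simplification it ignores subdirectories and every scan attribute other than location and suffix.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

NS = {"nc": "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"}


def locate_files(ncml_path):
    """List the files named by an aggregation, relative to the NcML file."""
    ncml_path = Path(ncml_path)
    base = ncml_path.parent
    agg = ET.parse(ncml_path).getroot().find("nc:aggregation", NS)
    if agg is None:
        return []
    files = []
    # Explicit <netcdf location="..."/> entries, kept in document order.
    for nc in agg.findall("nc:netcdf", NS):
        if nc.get("location"):
            files.append(base / nc.get("location"))
    # <scan location="dir" suffix=".nc"/> entries: match by suffix, sort by name.
    for scan in agg.findall("nc:scan", NS):
        folder = base / scan.get("location")
        suffix = scan.get("suffix", "")
        files.extend(
            sorted(p for p in folder.iterdir() if p.is_file() and p.name.endswith(suffix))
        )
    return files


# Small demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    (tmp / "data").mkdir()
    for name in ("b.nc", "a.nc", "notes.txt"):
        (tmp / "data" / name).touch()
    (tmp / "agg.ncml").write_text(
        '<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">'
        '<aggregation type="joinExisting" dimName="time">'
        '<netcdf location="first.nc"/>'
        '<scan location="data" suffix=".nc"/>'
        "</aggregation></netcdf>"
    )
    found = [p.relative_to(tmp).as_posix() for p in locate_files(tmp / "agg.ncml")]
```

The resulting list is what would be passed on to xarray.open_mfdataset.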

huard commented 4 years ago

One way to be systematic about this would be to rewrite the tests from netcdf-java in /cdm/core/src/test/java/ucar/nc2/ncml/ using the test files from /cdm/src/test/data/ncml.

I suggest renaming the function to open_ncml to be consistent with xr.open_dataset.

Another suggestion would be to write a subclass of xncml.Dataset, roughly like this (locate_files and new_attributes are still to be written):

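import xarray as xr
import xncml


class NcMLDataset(xncml.Dataset):
    def __init__(self, ncml, tdsroot=None):
        super().__init__(ncml)
        # Keep the THREDDS root so that open() can resolve file locations.
        self.tdsroot = tdsroot

    def open(self, **kwargs):
        # Placeholder methods, not implemented yet.
        files = self.locate_files(self.tdsroot)
        atts = self.new_attributes()

        ds = xr.open_mfdataset(files, **kwargs)
        # Change attributes (global attributes only in this sketch)
        ds.attrs.update(atts)
        return ds
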
huard commented 4 years ago

Also might be worth looking into https://eulxml.readthedocs.io/en/latest/xmlmap.html

It could let us define the logic to convert individual elements, and then let the package take care of parsing the XML.

tlogan2000 commented 4 years ago

@huard thanks for the suggestions ... this branch is definitely more of an exploration than anything final. I like the idea of a subclass; however, xncml seems pretty sparse (I think the author was doing some exploring similar to our own), so I am unsure whether it is really the go-to solution. So far I was really just using it to parse the NcML into something Python-like, so xmlmap might be interesting as well.

huard commented 4 years ago

Yep. Your experiments show that it's definitely possible to support at least part of the NcML standard. The question for us is how much effort we want to put into this. I see this as mid-term R&D that we probably don't want to do alone. So can we bring this up to a point where people are actually excited about the possibilities this offers and contribute?

tlogan2000 commented 4 years ago

Yep, I'm OK leaving it for now (maybe a bit of clean-up and implementing some metadata changes) and then seeing if someone takes up the torch. For the talk at ESGF I thought I could simply do a theoretical mapping of xr functions to some NcML keys, even if we don't implement them in the demo.

i.e. joinExisting == xr.open_mfdataset, joinNew == xr.concat(dim=name), etc.
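Written out as a lookup, that theoretical mapping might look like the sketch below. The function xarray_plan is hypothetical and only returns the name of the xarray function and the keyword arguments one would pass; it does not import or call xarray. The union entry is an added assumption, not something settled in this thread.

```python
def xarray_plan(agg_type, dim_name=None):
    """Map an NcML aggregation type to (xarray function name, keyword arguments).

    Hypothetical helper for illustration; nothing here calls xarray.
    """
    if agg_type == "joinExisting":
        # Concatenate along a dimension that already exists in every file.
        return "xarray.open_mfdataset", {"combine": "nested", "concat_dim": dim_name}
    if agg_type == "joinNew":
        # Stack the datasets along a new dimension.
        return "xarray.concat", {"dim": dim_name}
    if agg_type == "union":
        # Assumption: a union of variables from several files resembles xarray.merge.
        return "xarray.merge", {}
    raise NotImplementedError(f"no xarray equivalent chosen for {agg_type!r}")


plan = xarray_plan("joinNew", "realization")
```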

huard commented 4 years ago

OK. Another option for the implementation would be lxml.objectify. http://www.davekuhlman.org/Objectify_files/obj_hdf_xml.py

huard commented 1 year ago

https://github.com/xarray-contrib/xncml/pull/6

huard commented 1 year ago

Done.