cedadev / nappy

NASA Ames Processing in PYthon (NAPPy) - a Python library for reading, writing and converting NASA Ames files.
BSD 3-Clause "New" or "Revised" License

API extension: na 2 xarray.DataArray or xarray.Dataset? #50

Open FObersteiner opened 2 years ago

FObersteiner commented 2 years ago

This is more of a suggestion for a new feature than an "issue".

Background: we still have a lot of data in NASA Ames format. Currently, there's an initiative at our institute to develop a collection of tools that are basically method extensions for xarray.DataArray and xarray.Dataset (github: imktk). So I was looking for a convenient way to load the na data into xarray. Since I noticed that nappy uses xarray internally for the conversion to netCDF, I thought that could be a possibility.

A way to do this with the existing version of nappy could be e.g.

from pathlib import Path
import xarray as xr

import nappy
import nappy.nc_interface.na_to_xarray as na2xr

f = Path('./nappy/example_files/1001a.na')  # from the samples collection
converter = na2xr.NADictToXarrayObjects(nappy.openNAFile(f))  # a converter instance, not a class

xr_tuple = converter.convert()
arrays = xr_tuple[0]  # list of data arrays

new_attrs = {} # we need to combine attributes manually
for a in arrays:
    for k, v in a.attrs.items():
        new_attrs[a.name + '_' + k] = v # not guaranteed to work with ANY input!

xrds = xr.merge(arrays, combine_attrs="drop")
xrds.attrs = new_attrs

print(xrds)

<xarray.Dataset>
Dimensions:              (pressure: 28)
Coordinates:
  * pressure             (pressure) float64 1.013e+03 540.5 ... 4e-05 2.5e-05
Data variables:
    total_concentration  (pressure) float64 2.55e+19 1.53e+19 ... 5.03e+11
    temperature          (pressure) float64 288.0 256.0 223.0 ... 300.0 360.0
Attributes:
    total_concentration_units:                 cm-3
    total_concentration_long_name:             total_concentration
    total_concentration_title:                 total_concentration
    total_concentration_nasa_ames_var_number:  0
    temperature_units:                         degrees K
    temperature_long_name:                     temperature
    temperature_title:                         temperature
    temperature_nasa_ames_var_number:          1

While that works for me, it's not explicitly part of the nappy API - would it be a useful extension?
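To make the question more concrete, here is a minimal sketch of what such an extension might look like if the steps above were wrapped in one function. The names `na_to_dataset` and `prefixed_attrs` are hypothetical and not part of nappy; the sketch only reuses the calls from the snippet above and assumes that the first item returned by `convert()` is the list of data arrays, as observed there.

```python
def prefixed_attrs(arrays):
    """Flatten per-variable attrs into one dict keyed '<name>_<attr>'."""
    merged = {}
    for arr in arrays:
        if arr.name is None:
            # An unnamed array cannot be prefixed; fail loudly instead of guessing.
            raise ValueError("cannot prefix attributes of an unnamed array")
        for key, value in arr.attrs.items():
            merged[f"{arr.name}_{key}"] = value
    return merged


def na_to_dataset(path):
    """Hypothetical helper: read a NASA Ames file into one xarray.Dataset."""
    # Imported lazily so prefixed_attrs() stays usable without these packages.
    import xarray as xr
    import nappy
    import nappy.nc_interface.na_to_xarray as na2xr

    # str() is a precaution in case path objects are not accepted everywhere.
    converter = na2xr.NADictToXarrayObjects(nappy.openNAFile(str(path)))
    arrays = converter.convert()[0]  # assumed: first tuple item is the list of data arrays
    ds = xr.merge(arrays, combine_attrs="drop")
    ds.attrs = prefixed_attrs(arrays)
    return ds
```

With that in place, the whole example would reduce to something like `xrds = na_to_dataset('./nappy/example_files/1001a.na')`. Prefixing attribute names is just the choice I made above, not necessarily the right default.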

agstephens commented 2 years ago

@FObersteiner, just checking that I understand what is happening in your example.

Is the main requirement to collect the metadata associated with the xr.DataArrays and assign it to the xr.Dataset?

FObersteiner commented 2 years ago

@agstephens honestly, I haven't had the time to work on this further ;-)

My question was whether it would be good to have the export from na to xarray exposed more directly in the API, to avoid having to go through nc_interface.na_to_xarray. But if that point never came up in the past, I guess it's not that important and we might as well leave it as it is. I can live with that.

Regarding metadata, my guess is that how it is handled is pretty user-specific (see var_and_units_pattern...), so I wouldn't touch that.
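As an illustration of why I would leave this to the user, here is a small sketch in plain Python (no nappy or xarray involved) where the rule for naming merged attributes is passed in by the caller. The function `collect_attrs` and its `key_for` parameter are made up for this example, and the sample values are taken from the printed output in my first comment.

```python
def collect_attrs(named_attrs, key_for=lambda var, key: f"{var}_{key}"):
    """Merge {variable: attrs} into one flat dict using a caller-chosen key rule."""
    merged = {}
    for var, attrs in named_attrs.items():
        for key, value in attrs.items():
            new_key = key_for(var, key)
            if new_key in merged:
                # Two variable/attribute pairs mapped to the same key.
                raise KeyError(f"attribute key collision: {new_key!r}")
            merged[new_key] = value
    return merged


per_variable = {
    "temperature": {"units": "degrees K", "nasa_ames_var_number": 1},
    "total_concentration": {"units": "cm-3", "nasa_ames_var_number": 0},
}

# Default rule: underscore prefix, as in the original example.
flat = collect_attrs(per_variable)

# A different user might prefer another separator.
dotted = collect_attrs(per_variable, key_for=lambda var, key: f"{var}.{key}")
```

The collision check is there because an underscore prefix is ambiguous in principle: a variable `a_b` with attribute `c` and a variable `a` with attribute `b_c` would both end up as `a_b_c`.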

agstephens commented 2 years ago

@FObersteiner, I agree that it would be useful to bring this up to the API level. It's a sensible proposal.