Reading-eScience-Centre / pycovjson

Create CovJSON files from common scientific data formats
BSD 3-Clause "New" or "Revised" License

'DataArray' object has no attribute 'standard_name' #20

Closed lewismc closed 7 years ago

lewismc commented 7 years ago

I am running into the following issue when attempting to convert a particular data product (linked below, available in both netCDF3 and netCDF4) to CovJSON. http://podaac-opendap.jpl.nasa.gov/opendap/allData/aquarius/L3/mapped/V4/7day_running/SCI/2015/158/Q20151582015164.L3m_R7_SCI_V4.0_SSS_1deg.bz2.html If I use pycovjson to read the file, all is well:

lmcgibbn@LMC-056430 /usr/local/pycovjson(master) $ pycovjson-viewer -v ~/Desktop/Q20151582015164.L3m_R7_SCI_V4.0_SSS_1deg.bz2.nc4
<xarray.Dataset>
Dimensions:   (dim1: 3, dim2: 256, lat: 180, lon: 360)
Coordinates:
  * lat       (lat) float32 89.5 88.5 87.5 86.5 85.5 84.5 83.5 82.5 81.5 ...
  * lon       (lon) float32 -179.5 -178.5 -177.5 -176.5 -175.5 -174.5 -173.5 ...
  * dim1      (dim1) int64 0 1 2
  * dim2      (dim2) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 ...
Data variables:
    l3m_data  (lat, lon) float64 nan nan nan nan nan nan nan nan nan nan nan ...
    palette   (dim1, dim2) int8 71 0 71 73 0 73 76 0 75 78 0 77 80 0 79 82 1 ...
Attributes:
    HDF5_GLOBAL.Product_Name: Q20151582015164.L3m_R7_SCI_V4.0_SSS_1deg
    HDF5_GLOBAL.Sensor_Name: Aquarius
    HDF5_GLOBAL.Sensor: Aquarius
    HDF5_GLOBAL.Title: Aquarius Level-3 Standard Mapped Image
    HDF5_GLOBAL.Data_Center: NASA/GSFC OBPG
    HDF5_GLOBAL.Mission: SAC-D Aquarius
    HDF5_GLOBAL.Mission_Characteristics: Nominal orbit: inclination=98.0 (Sun-synchronous); node=6PM (ascending); eccentricity=<0.002; altitude=657 km; ground speed=6.825 km/sec
    HDF5_GLOBAL.Sensor_Characteristics: Number of beams=3; channels per receiver=4; frequency 1.413 GHz; bits per sample=16; instatntaneous field of view=6.5 degrees; science data block period=1.44 sec
    HDF5_GLOBAL.Product_Type: R7
    HDF5_GLOBAL.Processing_Version: V4.0
    HDF5_GLOBAL.Software_Name: smigen
    HDF5_GLOBAL.Software_Version: 5.04
    HDF5_GLOBAL.Processing_Time: 2015184121146000
    HDF5_GLOBAL.Input_Files: Q20151582015164.L3b_R7_SCI_V4.0.main
    HDF5_GLOBAL.Processing_Control: smigen par=Q20151582015164.L3m_R7_SCI_V4.0_SSS_1deg.param
    HDF5_GLOBAL.Input_Parameters: ifile = Q20151582015164.L3b_R7_SCI_V4.0.main|ofile = Q20151582015164.L3m_R7_SCI_V4.0_SSS_1deg|prod = SSS|palfile = /sdps/sdpsoper/Science/OCSSW/V2015.2/data/common/palette/sss.pal|processing version = V4.0|meas = 1|stype = 3|datamin = 0.000000|datamax = 70.000000|lonwest = -180.000000|loneast = 180.000000|latsouth = -90.000000|latnorth = 90.000000|resolution = 1deg|projection = RECT|gap_fill = 0|seam_lon = -180.000000|minobs = 0|deflate = 4|oformat = HDF5|precision = F|
    HDF5_GLOBAL.L2_Flag_Names: POINTING,NAV,LANDRED,ICERED,REFL_1STOKESMOONRED,REFL_1STOKESGAL,TFTADIFFRED,RFI_REGION,SAOVERFLOW,COLDWATERRED,WINDRED,TBCONS
    HDF5_GLOBAL.Period_Start_Year: 2015
    HDF5_GLOBAL.Period_Start_Day: 158
    HDF5_GLOBAL.Period_End_Year: 2015
    HDF5_GLOBAL.Period_End_Day: 158
    HDF5_GLOBAL.Start_Time: 2015158013204597
    HDF5_GLOBAL.End_Time: 2015158124519488
    HDF5_GLOBAL.Start_Year: 2015
    HDF5_GLOBAL.Start_Day: 158
    HDF5_GLOBAL.Start_Millisec: 5524596
    HDF5_GLOBAL.End_Year: 2015
    HDF5_GLOBAL.End_Day: 158
    HDF5_GLOBAL.End_Millisec: 45919487
    HDF5_GLOBAL.Start_Orbit: 21446
    HDF5_GLOBAL.End_Orbit: 21452
    HDF5_GLOBAL.Map_Projection: Equidistant Cylindrical
    HDF5_GLOBAL.Latitude_Units: degrees North
    HDF5_GLOBAL.Longitude_Units: degrees East
    HDF5_GLOBAL.Northernmost_Latitude: 90.0
    HDF5_GLOBAL.Southernmost_Latitude: -90.0
    HDF5_GLOBAL.Westernmost_Longitude: -180.0
    HDF5_GLOBAL.Easternmost_Longitude: 180.0
    HDF5_GLOBAL.Latitude_Step: 1.0
    HDF5_GLOBAL.Longitude_Step: 1.0
    HDF5_GLOBAL.SW_Point_Latitude: -89.5
    HDF5_GLOBAL.SW_Point_Longitude: -179.5
    HDF5_GLOBAL.Data_Bins: 5155
    HDF5_GLOBAL.Number_of_Lines: 180
    HDF5_GLOBAL.Number_of_Columns: 360
    HDF5_GLOBAL.Parameter: Sea Surface Salinity
    HDF5_GLOBAL.Measure: Mean
    HDF5_GLOBAL.Units: psu
    HDF5_GLOBAL.Scaling: linear
    HDF5_GLOBAL.Scaling_Equation: (Slope*l3m_data) + Intercept = Parameter value
    HDF5_GLOBAL.Slope: 1.0
    HDF5_GLOBAL.Intercept: 0.0
    HDF5_GLOBAL.Data_Minimum: 25.3962
    HDF5_GLOBAL.Data_Maximum: 38.6452
    HDF5_GLOBAL.Suggested_Image_Scaling_Minimum: 0.0
    HDF5_GLOBAL.Suggested_Image_Scaling_Maximum: 70.0
    HDF5_GLOBAL.Suggested_Image_Scaling_Type: ATAN
    HDF5_GLOBAL.Suggested_Image_Scaling_Applied: No
    HDF5_GLOBAL._lastModified: 2015184121146000

When I attempt to convert the product to CovJSON, I get the following:

lmcgibbn@LMC-056430 /usr/local/pycovjson(master) $ pycovjson-convert -i ~/Desktop/Q20151582015164.L3m_R7_SCI_V4.0_SSS_1deg.bz2.nc4 -o ~/Desktop/Q20151582015164.L3m_R7_SCI_V4.0_SSS_1deg.bz2.nc4.covjson
Traceback (most recent call last):
  File "/Users/lmcgibbn/miniconda3/bin/pycovjson-convert", line 9, in <module>
    load_entry_point('pycovjson==0.3.8', 'console_scripts', 'pycovjson-convert')()
  File "/Users/lmcgibbn/miniconda3/lib/python3.5/site-packages/pycovjson-0.3.8-py3.5.egg/pycovjson/cli/convert.py", line 66, in main
  File "/Users/lmcgibbn/miniconda3/lib/python3.5/site-packages/pycovjson-0.3.8-py3.5.egg/pycovjson/write.py", line 33, in __init__
  File "/Users/lmcgibbn/miniconda3/lib/python3.5/site-packages/pycovjson-0.3.8-py3.5.egg/pycovjson/read_netcdf.py", line 347, in get_axes
  File "/Users/lmcgibbn/miniconda3/lib/python3.5/site-packages/xarray-0.8.2-py3.5.egg/xarray/core/common.py", line 194, in __getattr__
    (type(self).__name__, name))
AttributeError: 'DataArray' object has no attribute 'standard_name'

I've not tried to debug this yet, but it appears to be a bug. I've seen similar error messages before, so I am going to start logging each occurrence here from now on; that way we can write tests to verify whether these bugs are present.

lewismc commented 7 years ago

The issue is related to https://github.com/Reading-eScience-Centre/pycovjson/blob/master/pycovjson/read_netcdf.py#L347. I am not sure exactly what the standard_name is, but I'm debugging now and will find out.

lewismc commented 7 years ago

There are at least two bugs in the following function:

    def get_axes(self):

        axes_dict = OrderedDict()
        x_list = ['lon', 'longitude', 'LONGITUDE', 'Longitude', 'x', 'X']
        y_list = ['lat', 'latitude', 'LATITUDE', 'Latitude', 'y', 'Y']
        t_list = ['time', 'TIME', 't', 'T']
        z_list = ['depth', 'DEPTH']
        for coord in self.dataset.coords:
            try:
                if self.dataset[coord].axis == 'T':
                    axes_dict['t'] = coord
                if self.dataset[coord].axis == 'Z':
                    axes_dict['z'] = coord
            except:
                pass
            try:
                if self.dataset[coord].units == 'degrees_north':
                    axes_dict['y'] = coord
                if self.dataset[coord].units == 'degrees_east':
                    axes_dict['x'] = coord
            except:
                pass
            try:
                if self.dataset[coord].positive in ['up', 'down']:
                    axes_dict['z'] = coord
            except:
                pass

            if coord in t_list or self.dataset[coord].name in t_list:
                axes_dict['t'] = coord
            if coord in z_list or self.dataset[coord].name in z_list:
                axes_dict['z'] = coord

        return axes_dict

If for example I execute the following

>>> self.dataset[coord].axis
Traceback (most recent call last):
  File "/Users/lmcgibbn/.p2/pool/plugins/org.python.pydev_5.2.0.201608171824/pysrc/_pydevd_bundle/pydevd_exec2.py", line 3, in Exec
    exec(exp, global_vars, local_vars)
  File "<console>", line 1, in <module>
  File "/Users/lmcgibbn/miniconda3/lib/python3.5/site-packages/xarray-0.8.2-py3.5.egg/xarray/core/common.py", line 194, in __getattr__
    (type(self).__name__, name))
AttributeError: 'DataArray' object has no attribute 'axis'

If I execute the following

>>> self.dataset[coord].positive
Traceback (most recent call last):
  File "/Users/lmcgibbn/.p2/pool/plugins/org.python.pydev_5.2.0.201608171824/pysrc/_pydevd_bundle/pydevd_exec2.py", line 3, in Exec
    exec(exp, global_vars, local_vars)
  File "<console>", line 1, in <module>
  File "/Users/lmcgibbn/miniconda3/lib/python3.5/site-packages/xarray-0.8.2-py3.5.egg/xarray/core/common.py", line 194, in __getattr__
    (type(self).__name__, name))
AttributeError: 'DataArray' object has no attribute 'positive'
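
One way to avoid these AttributeErrors is to read each coordinate's metadata through its `.attrs` dict with `.get()`, which returns None when an attribute is missing, rather than via attribute access, which raises. A sketch of that idea (not the project's actual fix), written against a plain name-to-attributes mapping such as `{name: ds[name].attrs for name in ds.coords}` built from an xarray Dataset:

```python
from collections import OrderedDict

def get_axes(coord_attrs):
    """Map coordinate names to CovJSON axis keys ('x', 'y', 'z', 't').

    coord_attrs: mapping of coordinate name -> attribute dict, e.g.
    {name: ds[name].attrs for name in ds.coords} for an xarray Dataset.
    dict.get() never raises on a missing key, unlike ds[coord].axis,
    which raises AttributeError when the 'axis' attribute is absent.
    """
    t_list = ['time', 'TIME', 't', 'T']
    z_list = ['depth', 'DEPTH']
    axes_dict = OrderedDict()
    for name, attrs in coord_attrs.items():
        # CF 'axis' attribute, if present
        if attrs.get('axis') == 'T':
            axes_dict['t'] = name
        if attrs.get('axis') == 'Z':
            axes_dict['z'] = name
        # CF canonical units for latitude/longitude
        if attrs.get('units') == 'degrees_north':
            axes_dict['y'] = name
        if attrs.get('units') == 'degrees_east':
            axes_dict['x'] = name
        # CF 'positive' attribute marks a vertical coordinate
        if attrs.get('positive') in ('up', 'down'):
            axes_dict['z'] = name
        # fall back to well-known coordinate names
        if name in t_list:
            axes_dict['t'] = name
        if name in z_list:
            axes_dict['z'] = name
    return axes_dict
```

For the Aquarius file above this would map only `lat` and `lon` (via their units), leaving `dim1` and `dim2` unassigned instead of raising.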

jonblower commented 7 years ago

I think all of these are symptoms of the fact that the code does not yet fully handle all the possibilities that NetCDF files can offer. It makes some simplifying assumptions (e.g. it assumes all coordinate systems are lat-lon-time-depth), which is of course not always the case. It also seems to assume that the standard_name is always present, which is also not always true. (The standard_name is a term from the Climate and Forecast vocabulary; it is recommended to use a standard name if an appropriate one exists, but sometimes none does, so we need to handle the case where the standard name is not present.)
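
For the missing standard_name specifically, a defensive lookup can fall back through the CF attribute hierarchy. A hypothetical helper (the function name and fallback order are my illustration, not pycovjson's actual code):

```python
def get_display_name(var_name, attrs):
    """Return the best available name for a variable.

    CF says standard_name is recommended but optional, so prefer it,
    fall back to long_name, and finally to the variable's own name.
    attrs is the variable's attribute dict (e.g. ds[var_name].attrs).
    """
    return attrs.get('standard_name') or attrs.get('long_name') or var_name
```

With the Aquarius file, `l3m_data` has neither attribute, so the helper would simply return `'l3m_data'` instead of raising AttributeError.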

It would be quite a bit of work to fully support all of the CF conventions, so we may have to take a pragmatic approach and fix issues as they arise. (It might be a good idea to look at other Python libraries like Iris or cf-python, which may help us to support CF more fully.)

RileyWilliams commented 7 years ago

I have pushed a new version of read_netcdf which I think will address some of these issues. @jonblower I agree. I have tried to support as many cases as possible; for example, any coordinate with a 'positive' attribute is assumed to be the Z axis, which I believe follows the CF conventions for files that contain a z axis.

lewismc commented 7 years ago

@jonblower @RileyWilliams yes, I agree. I wonder: if we were to use the netCDF Python API directly instead of xarray, would this solve the issue? AFAICT it is xarray that does not follow CF conventions. Right now read_netcdf does not use any netCDF functions.

jonblower commented 7 years ago

I think netcdf-python supports some of CF (e.g. missing values) but only a small part of it. Not sure what xarray does. Usually you either have to code these in yourself or use a higher-level library. @RileyWilliams I could put you in touch with the cf-python developer if you like, although did we find out that this is not cross-platform? Also @RileyWilliams we could talk through your get_axes() function as there may be some ways we could improve it and squash a couple of bugs.

RileyWilliams commented 7 years ago

@lewismc @jonblower xarray uses the netCDF4 library as its backend, so it should be able to read every file that the netCDF4 library can. xarray loads netCDF variables into DataArray objects; the issues occur when mapping variables to their respective axes.

lewismc commented 7 years ago

ACK, OK, let's have a look at read_netcdf then. Riley, you said you had pushed some sort of update... is that correct?

RileyWilliams commented 7 years ago

Yes, I just pushed a quick fix for the issue you described; however, I think it needs some more work to be robust.

RileyWilliams commented 7 years ago

Additionally, I have experienced an isolated issue where xarray did not read a netCDF file correctly even though the netCDF library reads it fine, so there may be more going on behind the scenes. I have been going over the xarray code to work out how it decodes netCDF.