HDFGroup / h5pyd

h5py distributed - Python client library for HDF Rest API

xarray can't identify time units in HSDS dataset #45

Open rsignell-usgs opened 6 years ago

rsignell-usgs commented 6 years ago

In this notebook https://gist.github.com/rsignell-usgs/07143a5ab54afb8ad6eb1af255d025c9 we use xarray to open a local netcdf4 file and then the same dataset that was 'hsload'ed to hsds.

xarray automatically recognizes the CF-compliant time units and converts the time coordinate to datetime, so the plot is correctly labeled in cell [6].

But time is not recognized for the HSDS dataset plot in cell [5].

Any idea what the problem is?


ghost commented 6 years ago

Would it be possible to print out what xarray thinks of that variable from the two sources? Have two cells with ds2['TMP_2maboveground'] and ds['TMP_2maboveground'].

rsignell-usgs commented 6 years ago

@ajelenak-thg , yes, it looks like HSDS is dropping the variable attributes: https://gist.github.com/rsignell-usgs/dbe88df42e1181827363a8348016f28b

BTW, you should be able to run this notebook (at least the HSDS and DAP access cells) -- you just need a username and password for this XSEDE endpoint from @jreadey in your ~/.hscfg, right?

If you sign up for XSEDE I can add you to my project, in case that becomes useful later on.

ghost commented 6 years ago

Seems like some attributes of the time coordinate got lost "in translation" to HSDS. According to the DAS response from the THREDDS server:

    time {
        String units "seconds since 1970-01-01 00:00:00.0 0:00";
        String long_name "verification time generated by wgrib2 function verftime()";
        Float64 reference_time 1.4832288E9;
        Int32 reference_time_type 0;
        String reference_date "2017.01.01 00:00:00 UTC";
        String reference_time_description "kind of product unclear, reference date is variable, min found reference date is given";
        String time_step_setting "auto";
        Float64 time_step 3600.0;
        Int32 _ChunkSizes 512;
    }

_ChunkSizes does not really exist as an attribute, I think, because netCDF tools typically display HDF5 dataset creation properties as system attributes (prefixed with _).

The HSDS response about the attributes of the time coordinate shows only these (HDF5 dimension scale-related attributes not included): _Netcdf4Dimid, reference_time, reference_time_type, time_step. No units attribute so no conversion to datetime.
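To illustrate why the missing units attribute matters, here is a minimal sketch of CF-style time decoding using only the standard library. It is not xarray's actual implementation: `decode_cf_time` is a hypothetical helper, it supports only a few units, and it assumes UTC while ignoring any trailing offset in the reference string.

```python
from datetime import datetime, timedelta, timezone

# Seconds per supported CF time unit (a deliberately small subset).
_UNIT_SECONDS = {"seconds": 1, "minutes": 60, "hours": 3600, "days": 86400}


def decode_cf_time(values, attrs):
    """Return datetimes if attrs has a usable 'units' string, else the raw values."""
    units = attrs.get("units")
    if units is None or " since " not in units:
        return list(values)  # nothing to decode against; leave numbers as-is
    unit, ref = units.split(" since ", 1)
    # Keep only the date and the HH:MM:SS part; drop fractional seconds and
    # any trailing offset, and assume UTC (a simplification for this sketch).
    date_part, time_part = (ref.split() + ["00:00:00"])[:2]
    epoch = datetime.strptime(
        f"{date_part} {time_part.split('.')[0]}", "%Y-%m-%d %H:%M:%S"
    ).replace(tzinfo=timezone.utc)
    scale = _UNIT_SECONDS[unit.strip().lower()]
    return [epoch + timedelta(seconds=v * scale) for v in values]


# Attributes as reported by THREDDS (units present) and by HSDS (units absent).
with_units = {"units": "seconds since 1970-01-01 00:00:00.0 0:00"}
without_units = {"reference_time": 1.4832288e9}

print(decode_cf_time([1.4832288e9], with_units))
print(decode_cf_time([1.4832288e9], without_units))
```

With the units string the value decodes to a datetime; without it, the raw number is all a client has to work with, which matches what the HSDS plot shows.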

rsignell-usgs commented 6 years ago

@ajelenak-thg and @jreadey, yes, HSDS is losing nearly all variable attributes!

The variable in the original NC file has attributes:

       float TMP_2maboveground(time, latitude, longitude) ;
                TMP_2maboveground:_FillValue = 9.999e+20f ;
                TMP_2maboveground:least_significant_digit = 2 ;
                TMP_2maboveground:short_name = "TMP_2maboveground" ;
                TMP_2maboveground:long_name = "Temperature" ;
                TMP_2maboveground:level = "2 m above ground" ;
                TMP_2maboveground:units = "K" ;

while in HSDS, the only remaining attribute is:

Attributes:
    least_significant_digit:  [2]

Does this mean perhaps that HSDS is only handling attributes with integer values or something?
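One way to check that guess is to list the missing attributes together with the type of their original value. This is a rough sketch on plain dicts with the values copied from the listings above; `missing_attrs` is a hypothetical helper, not part of xarray or h5pyd.

```python
def missing_attrs(source_attrs, loaded_attrs):
    """Map each attribute name absent from loaded_attrs to its source value's type name."""
    return {
        name: type(value).__name__
        for name, value in source_attrs.items()
        if name not in loaded_attrs
    }


# Values copied from the ncdump output and the HSDS listing in this thread.
nc_attrs = {
    "_FillValue": 9.999e20,
    "least_significant_digit": 2,
    "short_name": "TMP_2maboveground",
    "long_name": "Temperature",
    "level": "2 m above ground",
    "units": "K",
}
hsds_attrs = {"least_significant_digit": [2]}

for name, type_name in missing_attrs(nc_attrs, hsds_attrs).items():
    print(f"{name}: {type_name}")
```

Apart from _FillValue, everything that goes missing is string-valued. _FillValue may be absent from the xarray display for a different reason, since xarray normally moves it into the variable's encoding when decoding, so the string attributes are the informative part.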

jreadey commented 6 years ago

@rsignell-usgs - do you see any errors during the import (with hsload)?

I've seen this issue: https://github.com/h5py/h5py/issues/719 come up when loading NetCDF files.

rsignell-usgs commented 6 years ago

Oh yes, I got tons of errors on hsload.

Looks like the real problem is here: https://github.com/h5py/h5py/issues/719#issuecomment-238070297 :

"The issue here is that recent versions of netCDF-C save the NC_CHAR dtype as fixed length UTF8 strings, which h5py cannot read."

So maybe hsload could translate NC_CHAR dtypes into something that h5pyd can read?
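As a rough sketch of what such a translation could look like once the raw bytes are in hand: a fixed-length, null-padded UTF-8 buffer can be turned into an ordinary Python string as below. `decode_fixed_utf8` is a hypothetical helper, not an existing hsload or h5pyd function, and how to get the raw bytes out of the file without h5py failing is left open here.

```python
def decode_fixed_utf8(raw: bytes) -> str:
    """Decode a fixed-length, null-padded UTF-8 buffer into a Python str."""
    # Cut at the first NUL so both null-terminated and null-padded buffers work.
    return raw.split(b"\x00", 1)[0].decode("utf-8")


# Example buffers shaped like fixed-length attribute values (padding is made up).
print(decode_fixed_utf8(b"K\x00\x00\x00"))
print(decode_fixed_utf8("2 m above ground".encode("utf-8") + b"\x00" * 4))
```

The resulting str could then be written out as a variable-length string attribute, which is a type that did come through for other tools in this thread's DAS output.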

ghost commented 6 years ago

I don't know if it is possible to get the bytes for such attributes somehow and avoid h5py until that issue is resolved.

ghost commented 6 years ago

I have just created a PR with a fix for this problem: h5py/h5py#988. It works for the netCDF file used here. Let's see what happens.