CCI-Tools / cate

ESA CCI Toolbox (Cate)
MIT License

KeyError: 'time' when trying to open .hdf file #514

Closed sleepynicole closed 6 years ago

sleepynicole commented 6 years ago

Expected behavior

I want to open a .hdf file in cate-cli.

Actual behavior

Cate can't open my .hdf file; it fails with KeyError: 'time'.

Steps to reproduce the problem

  1. cate ws new
  2. cate res set swehdf read_netcdf file=/Users/nicolesmac/Documents/paper/ERA_Interim/data/GlobSnow_SWE_L3B_monthly_1_v2.0.hdf

The error is as follows:

Traceback (most recent call last):
  File "/Users/nicolesmac/cate/lib/python3.6/site-packages/xarray/core/dataset.py", line 662, in _construct_dataarray
    variable = self._variables[name]
KeyError: 'time'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/nicolesmac/cate/lib/python3.6/site-packages/cate/util/opimpl.py", line 248, in _get_temporal_props
    bnds = ds['time'].attrs['bounds']
  File "/Users/nicolesmac/cate/lib/python3.6/site-packages/xarray/core/dataset.py", line 721, in __getitem__
    return self._construct_dataarray(key)
  File "/Users/nicolesmac/cate/lib/python3.6/site-packages/xarray/core/dataset.py", line 665, in _construct_dataarray
    self._variables, name, self._level_coords, self.dims)
  File "/Users/nicolesmac/cate/lib/python3.6/site-packages/xarray/core/dataset.py", line 74, in _get_virtual_variable
    ref_var = variables[ref_name]
KeyError: 'time'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/nicolesmac/cate/lib/python3.6/site-packages/xarray/core/dataset.py", line 662, in _construct_dataarray
    variable = self._variables[name]
KeyError: 'time'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/nicolesmac/cate/lib/python3.6/site-packages/cate/util/web/jsonrpchandler.py", line 192, in send_service_method_result
    result = future.result()
  File "/Users/nicolesmac/cate/lib/python3.6/concurrent/futures/_base.py", line 398, in result
    return self.__get_result()
  File "/Users/nicolesmac/cate/lib/python3.6/concurrent/futures/_base.py", line 357, in __get_result
    raise self._exception
  File "/Users/nicolesmac/cate/lib/python3.6/concurrent/futures/thread.py", line 55, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/Users/nicolesmac/cate/lib/python3.6/site-packages/cate/util/web/jsonrpchandler.py", line 271, in call_service_method
    result = method(**method_params, monitor=monitor)
  File "/Users/nicolesmac/cate/lib/python3.6/site-packages/cate/webapi/websocket.py", line 283, in set_workspace_resource
    monitor=monitor)
  File "/Users/nicolesmac/cate/lib/python3.6/site-packages/cate/core/wsmanag.py", line 321, in set_workspace_resource
    workspace.execute_workflow(res_name=res_name, monitor=monitor)
  File "/Users/nicolesmac/cate/lib/python3.6/site-packages/cate/core/workspace.py", line 583, in execute_workflow
    self.workflow.invoke_steps(steps, context=self._new_context(), monitor=monitor)
  File "/Users/nicolesmac/cate/lib/python3.6/site-packages/cate/core/workflow.py", line 624, in invoke_steps
    steps[0].invoke(context=context, monitor=monitor)
  File "/Users/nicolesmac/cate/lib/python3.6/site-packages/cate/core/workflow.py", line 315, in invoke
    self._invoke_impl(_new_context(context, step=self), monitor=monitor)
  File "/Users/nicolesmac/cate/lib/python3.6/site-packages/cate/core/workflow.py", line 977, in _invoke_impl
    return_value = self._op(monitor=monitor, **input_values)
  File "/Users/nicolesmac/cate/lib/python3.6/site-packages/cate/core/op.py", line 211, in __call__
    return_value = self._wrapped_op(**input_values)
  File "/Users/nicolesmac/cate/lib/python3.6/site-packages/cate/ops/io.py", line 336, in read_netcdf
    return adjust_temporal_attrs(normalize_op(ds))
  File "/Users/nicolesmac/cate/lib/python3.6/site-packages/cate/core/op.py", line 211, in __call__
    return_value = self._wrapped_op(**input_values)
  File "/Users/nicolesmac/cate/lib/python3.6/site-packages/cate/ops/normalize.py", line 97, in adjust_temporal_attrs
    return adjust_temporal_attrs_impl(ds)
  File "/Users/nicolesmac/cate/lib/python3.6/site-packages/cate/util/opimpl.py", line 222, in adjust_temporal_attrs_impl
    tempattrs = _get_temporal_props(ds)
  File "/Users/nicolesmac/cate/lib/python3.6/site-packages/cate/util/opimpl.py", line 252, in _get_temporal_props
    time_min = ds['time'].values[0]
  File "/Users/nicolesmac/cate/lib/python3.6/site-packages/xarray/core/dataset.py", line 721, in __getitem__
    return self._construct_dataarray(key)
  File "/Users/nicolesmac/cate/lib/python3.6/site-packages/xarray/core/dataset.py", line 665, in _construct_dataarray
    self._variables, name, self._level_coords, self.dims)
  File "/Users/nicolesmac/cate/lib/python3.6/site-packages/xarray/core/dataset.py", line 74, in _get_virtual_variable
    ref_var = variables[ref_name]
KeyError: 'time'

cate res: error: set_workspace_resource() call raised exception: "'time'"

Specifications

cate 1.0.0

forman commented 6 years ago

@sleepynicole Thanks for the report! Could you please provide a link to that HDF file?

sleepynicole commented 6 years ago

Norman Fomferra:

  Hello, this is my HDF file. It's snow water equivalent data: http://www.globsnow.info/swe/archive_v2.0/1982/L3B_monthly_SWE_HDF/GlobSnow_SWE_L3B_monthly_198212_v2.0.hdf.gz

You can give it a try, thanks~

Nicole

mzuehlke commented 6 years ago
netcdf GlobSnow_SWE_L3B_monthly_198212_v2.0 {
dimensions:
        fakeDim0 = 721 ;
        fakeDim1 = 721 ;
        fakeDim2 = 721 ;
        fakeDim3 = 721 ;

variables:
        float swe_average(fakeDim0, fakeDim1) ;
        float swe_maximum(fakeDim2, fakeDim3) ;

// global attributes:
                :Data content, field 1 = "Monthly mean Snow Water Equivalent (mm)" ;
                :Data content, field 2 = "Monthly maximum Snow Water Equivalent (mm)" ;
                :Sensor  = "SMMR" ;
                :Data Date  = "yyyymmdd" ;
                :Processing Date = "yyyymmdd" ;
                :Coordinate system  = "Equal-Area Scalable Earth Grid (EASE-Grid) - Northern Hemisphere" ;
                :Latitude range = "35N - 85N" ;
                :Longitude range = "180W - 180E" ;
                :Spatial Resolution  = "25 X 25 sq.km" ;
                :Processing software name = "FMI assimilation algorithm (Pullianen 2006)" ;
                :Processing software version = "v 2.0" ;
                :Processing Organisation = "Finnish Meteorological Institute" ;
                :Landmask  = " GLC-2000 derived land classification mask" ;
                :Landmask version = "v 2.0" ;
                :Mountain mask  = "ETOPO-5 derived mountain mask" ;
                :Mountain mask version  = "v 2.0" ;
                :forest mask name  = "GLC-2000 derived forest mask" ;
                :forest mask version  = "v 2.0" ;
}

The problem arises from the code's assumption that a time dimension is present. But even once we fix this, the dataset is still far from being CF-compliant.
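To illustrate the kind of guard that would avoid the crash: the traceback shows `_get_temporal_props` indexing `ds['time']` unconditionally. A minimal defensive sketch (the function name and return shape are simplified from the traceback, not the actual cate code) could look like this:

```python
import numpy as np
import xarray as xr

def get_temporal_props(ds: xr.Dataset) -> dict:
    """Return temporal attributes, or an empty dict if the dataset
    has no 'time' variable or coordinate at all."""
    if 'time' not in ds.variables and 'time' not in ds.coords:
        # No time information: skip temporal normalization
        # instead of raising KeyError: 'time'.
        return {}
    time = ds['time']
    return {
        'time_coverage_start': str(time.values[0]),
        'time_coverage_end': str(time.values[-1]),
    }

# A dataset like the GlobSnow file: data variables but no time coordinate.
ds = xr.Dataset({'swe_average': (('y', 'x'), np.zeros((3, 3)))})
assert get_temporal_props(ds) == {}
```

With such a check, datasets without a time dimension would simply get no temporal attributes rather than aborting the whole `read_netcdf` step.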

forman commented 6 years ago

Yepp, not CF, but it is a mistake to assume all NetCDFs are CF-compliant. This is therefore a bug.

JanisGailis commented 6 years ago

I disagree. CF compliance is a necessary prerequisite for Cate operations to work with data. If we cannot make that assumption, we're down a very deep rabbit hole. We're not making a generic NetCDF viewer but a climate toolbox, hence it is not unreasonable to expect data to be 'Climate Forecast' conventions compliant.

Maybe this can be solved with a plugin. Or at least a better error message.

forman commented 6 years ago

Having a time dimension is not a CF requirement. If there is one, it'd be nice if providers encoded it in a CF-compliant way. However, if Cate expects a time dimension in every dataset, that is wrong. Cate should be able to deal with datasets that don't have any time dimension.

forman commented 6 years ago

Nicole's HDF file has no temporal and spatial coordinate information, so several restrictions might apply when working with the different operations. That should be fine. But we should always be able to read the file as long as it isn't corrupt.

Another option would be to introduce a parameter force_geo_spatial_cf, as the current normalize option does not indicate that the call will fail if there is no time in the file:

image

JanisGailis commented 6 years ago

The purpose of the CF conventions is to require conforming datasets to contain sufficient metadata that they are self-describing in the sense that each variable in the file has an associated description of what it represents, including physical units if appropriate, and that each value can be located in space (relative to earth-based coordinates) and time.

CF conventions

HDF is a very flexible format that can contain anything in any form. It's fine to enable Cate to read arbitrary files in order to view them. But once we want to perform any data operations on them, they should conform to the agreed-upon Common Data Model, for which we have chosen CF-compliant NetCDF files that can be represented as xarray arrays.

Of course, there can be exceptions to this - such as GeoTIFF masks, etc. But the general rule should stay. In my opinion.

forman commented 6 years ago

It's fine to enable Cate to read arbitrary files in order to view them.

As I said.

But once we want to perform any data operations on them, they should conform to the agreed upon Common Data Model, for which we have chosen CF-compliant NetCDF files that can be represented as xarray arrays.

I mostly agree, but not for all operations. It is pointless to force CF compliance on HDF or GeoTIFF files (or the other formats users & ESA would like to see in Cate). But Cate should be able to "deal" with these and interpret the available information to the maximum extent.

JanisGailis commented 6 years ago

Agree regarding GeoTIFF.

But HDF is different. NetCDF is essentially a subset of HDF that improves machine readability and interoperability. So each netCDF file is a valid HDF file, and an HDF file that conforms to the netCDF data model can be opened by any netCDF reader. If we expect netCDF files to be CF-compliant, then why not HDF files? But as the HDF format allows very arbitrary contents, I don't believe we should be expected to 'make it work' for any HDF file out there.

The biggest problem I see is that a very arbitrary file can then be read into our dataset type. But I strongly believe that all datasets should conform to our Common Data Model, so that we can make assumptions on what a dataset is and what it should represent and what metadata it should have when working with it.

So, currently there are HDF files that can readily be read into our dataset type and HDF files that cannot. If we are to make things work on some level for the second group, I believe such files should be read into a different type. Then, if there are operations that really can work with completely arbitrary nested hierarchies, these can be clearly shown in the interface.

JanisGailis commented 6 years ago

The current file is of course not the worst HDF file out there; at least it can be read with xarray and plotted with matplotlib:

globsnow

It would be nice to be able to at least read it with Cate and create that plot, if nothing else. But this is something we should think about very carefully, as any solution we choose could have serious implications across the board.

sleepynicole commented 6 years ago

@JanisGailis Could you tell me how to read this HDF file with xarray and plot it with matplotlib?

JanisGailis commented 6 years ago

@sleepynicole Sure.

The following Python snippet produces the above plot:

#!/usr/bin/env python3

import xarray as xr
import matplotlib.pyplot as plt

ds = xr.open_dataset('GlobSnow_SWE_L3B_monthly_198212_v2.0.hdf')
ds.swe_average.plot()

plt.show()

You can read more about the xarray library here

sleepynicole commented 6 years ago

@JanisGailis Thank you so much! That's helpful. Besides, I wonder whether there is a way to deal with the Global EASE-Grid data? Do you know a method?

JanisGailis commented 6 years ago

@sleepynicole I haven't worked with EASE-Grid data myself. However, it seems that GDAL supports working with EASE-Grid data. It has Python bindings, so that's probably where I'd start poking around: NSIDC GDAL-EASE-grid

That being said, it seems that this grid is used a lot in certain climate sciences, so I believe we should eventually look into supporting it in Cate as well!
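For what it's worth, the original 25 km EASE-Grid North is a Lambert azimuthal equal-area projection on a sphere, so the (row, col) indices of a 721x721 file like this one can in principle be mapped to lat/lon without GDAL. The sketch below uses the published original-EASE-Grid constants (sphere radius and nominal cell size), but it is illustrative only; in particular the longitude sign convention is an assumption that should be checked against NSIDC's easeconv reference code:

```python
import math

# Published parameters of the original (spherical) 25 km EASE-Grid North:
R = 6371.228      # Earth radius used by EASE-Grid, in km
C = 25.067525     # nominal cell size, in km
R0 = C0 = 360.0   # grid indices of the North Pole in the 721x721 grid

def ease_north_to_latlon(row: float, col: float) -> tuple:
    """Approximate inverse mapping for the 721x721 EASE-Grid North.

    Illustrative sketch only -- verify against NSIDC's easeconv
    before using it for real work.
    """
    x = (col - C0) * C
    y = (row - R0) * C
    rho = math.hypot(x, y)
    # Inverse of the spherical Lambert azimuthal equal-area projection;
    # clamp the asin argument for the grid corners beyond the projection limit.
    lat = 90.0 - 2.0 * math.degrees(math.asin(min(1.0, rho / (2.0 * R))))
    # Longitude convention assumed here (0 deg along the +row axis) -- check it!
    lon = math.degrees(math.atan2(x, y)) if rho > 0 else 0.0
    return lat, lon

# The grid centre is the North Pole:
print(ease_north_to_latlon(360, 360))  # -> (90.0, 0.0)
```

A projection-library route (pyproj with the corresponding EASE-Grid CRS) would be the robust choice in practice; the point here is just that the grid geometry is simple enough to reason about directly.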

sleepynicole commented 6 years ago

Got it. Thanks a lot!! Hope you get it working soon~

kbernat commented 6 years ago

@sleepynicole You can easily open your dataset in the current version by disabling data normalization in the read_netcdf step (normalize=False, see the snippet below). The same applies to Cate Desktop. http://cate.readthedocs.io/en/latest/api_reference.html?highlight=normalize#cate.ops.normalize

cate ws new
cate res set swehdf read_netcdf file="/home/dev/cate_hdf_test/data/GlobSnow_SWE_L3B_monthly_198212_v2.0.hdf" normalize=False 
cate ws run plot ds=@swehdf var=swe_average file=swehdf.png title="swdhdf demo"

image

@forman @JanisGailis For datasets of the xr.Dataset type, we could add an attribute recording whether the dataset has been verified (CF-compliant) and normalized. Then each operation could reject unsupported data sub-types individually.

sleepynicole commented 6 years ago

@kbernat Thanks a lot! It worked following your method! It's a pity that Cate cannot deal with the EASE-Grid data, but it's useful that I can at least have a look at my data through Cate. Thanks!!

forman commented 6 years ago

@kbernat

For datasets of the xr.Dataset type, we could add an attribute recording whether the dataset has been verified (CF-compliant) and normalized. Then each operation could reject unsupported data sub-types individually.

I agree. We could then easily let an operation express its requirements on a dataset, and in the implementation we would have automatic checks that raise nice, readable error messages.
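As a rough sketch of what such a check could look like (the marker attribute name `cate_normalized` and the `requires_normalized` decorator are hypothetical, not existing cate API):

```python
import functools

NORMALIZED_ATTR = 'cate_normalized'  # hypothetical marker attribute

def requires_normalized(op):
    """Decorator: reject datasets that were not read with normalization."""
    @functools.wraps(op)
    def wrapper(ds, *args, **kwargs):
        if not getattr(ds, 'attrs', {}).get(NORMALIZED_ATTR, False):
            raise ValueError(
                f"Operation '{op.__name__}' requires a normalized, "
                "CF-compliant dataset; re-read it with normalize=True.")
        return op(ds, *args, **kwargs)
    return wrapper

@requires_normalized
def subset_temporal(ds, start, end):
    ...  # a real implementation would slice ds along 'time'

class FakeDataset:  # stand-in for xr.Dataset in this sketch
    def __init__(self, attrs):
        self.attrs = attrs

try:
    subset_temporal(FakeDataset({}), '1982-01', '1982-12')
except ValueError as e:
    print(e)  # a readable message instead of a bare KeyError: 'time'
```

The reader would set the marker attribute when normalization succeeds, and any operation needing temporal or spatial coordinates would fail early with a clear message rather than deep inside xarray.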