Closed sleepynicole closed 6 years ago
@sleepynicole Thanks for the report! Could you please provide a link to that HDF file?
Norman Fomferra:
Hello, this is my HDF file. It contains snow water equivalent data: http://www.globsnow.info/swe/archive_v2.0/1982/L3B_monthly_SWE_HDF/GlobSnow_SWE_L3B_monthly_198212_v2.0.hdf.gz
You can give it a try, thanks~
Nicole
netcdf GlobSnow_SWE_L3B_monthly_198212_v2.0 {
dimensions:
    fakeDim0 = 721 ;
    fakeDim1 = 721 ;
    fakeDim2 = 721 ;
    fakeDim3 = 721 ;
variables:
    float swe_average(fakeDim0, fakeDim1) ;
    float swe_maximum(fakeDim2, fakeDim3) ;

// global attributes:
        :Data content, field 1 = "Monthly mean Snow Water Equivalent (mm)" ;
        :Data content, field 2 = "Monthly maximum Snow Water Equivalent (mm)" ;
        :Sensor = "SMMR" ;
        :Data Date = "yyyymmdd" ;
        :Processing Date = "yyyymmdd" ;
        :Coordinate system = "Equal-Area Scalable Earth Grid (EASE-Grid) - Northern Hemisphere" ;
        :Latitude range = "35N - 85N" ;
        :Longitude range = "180W - 180E" ;
        :Spatial Resolution = "25 X 25 sq.km" ;
        :Processing software name = "FMI assimilation algorithm (Pullianen 2006)" ;
        :Processing software version = "v 2.0" ;
        :Processing Organisation = "Finnish Meteorological Institute" ;
        :Landmask = " GLC-2000 derived land classification mask" ;
        :Landmask version = "v 2.0" ;
        :Mountain mask = "ETOPO-5 derived mountain mask" ;
        :Mountain mask version = "v 2.0" ;
        :forest mask name = "GLC-2000 derived forest mask" ;
        :forest mask version = "v 2.0" ;
}
The problem arises from the assumption in the code that a time dimension is present.
But even once we fix this, the dataset is still far from being CF-compliant.
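A minimal sketch of what dropping that assumption could look like. The function names below are hypothetical and are not Cate's actual `_get_temporal_props` or `adjust_temporal_attrs`; plain dicts stand in for datasets, and the only assumption is that the dataset supports `'time' in ds` and `ds['time']`, as both a dict and an `xarray.Dataset` do.

```python
from typing import Any, Mapping, Optional


def get_temporal_props(ds: Mapping[str, Any]) -> Optional[dict]:
    """Return temporal coverage attributes, or None if there is no 'time' variable.

    Hypothetical helper for illustration only.
    """
    if 'time' not in ds:
        # No time coordinate: nothing to derive, but also no reason to fail.
        return None
    times = list(ds['time'])
    if not times:
        return None
    return {
        'time_coverage_start': str(times[0]),
        'time_coverage_end': str(times[-1]),
    }


def adjust_temporal_attrs(ds: Mapping[str, Any], attrs: dict) -> dict:
    """Return a copy of `attrs`, extended with temporal properties when available."""
    updated = dict(attrs)
    props = get_temporal_props(ds)
    if props is not None:
        updated.update(props)
    return updated


# Stand-in for the GlobSnow file: two data variables and no 'time' variable.
swe_like = {'swe_average': [[0.0]], 'swe_maximum': [[0.0]]}
print(adjust_temporal_attrs(swe_like, {'Sensor': 'SMMR'}))
```

With a guard like this, a dataset without a time axis passes through unchanged instead of raising `KeyError: 'time'`.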
Yepp, not CF, but it is a mistake to assume all NetCDFs are CF-compliant. This is therefore a bug.
I disagree. CF compliance is a necessary prerequisite for Cate operations to work with data. If we cannot make that assumption, we're down a very deep rabbit hole. We're not making a generic NetCDF viewer, but a climate toolbox, hence it is not unreasonable to assume that data complies with the 'Climate and Forecast' conventions.
Maybe this can be solved with a plugin. Or at least a better error message.
Having a time dimension is not a CF requirement. If there is one, it'd be nice if providers encoded it in a CF-compliant way. However, if Cate is expecting a time dimension in every dataset, this is wrong. Cate should be able to deal with datasets that don't have any time dimension.
Nicole's HDF file has no temporal or spatial coordinate information, so several restrictions might apply when working with the different operations. That should be fine. But we should always be able to read it as long as the file isn't corrupt.
Another option would be to introduce a parameter force_geo_spatial_cf, as the current normalize option does not indicate that the call will fail if there is no time dimension in the file:
The purpose of the CF conventions is to require conforming datasets to contain sufficient metadata that they are self-describing in the sense that each variable in the file has an associated description of what it represents, including physical units if appropriate, and that each value can be located in space (relative to earth-based coordinates) and time.
HDF is a very flexible format that can contain whatever in whatever form. It's fine to enable Cate to read whatever files to view them. But once we want to perform any data operations on them, they should conform to the agreed upon Common Data Model, for which we have chosen CF-compliant NetCDF files that can be represented as xarray arrays.
Of course, there can be exceptions to this - such as GeoTIFF masks, etc. But the general rule should stay. In my opinion.
It's fine to enable Cate to read whatever files to view them.
As I said.
But once we want to perform any data operations on them, they should conform to the agreed upon Common Data Model, for which we have chosen CF-compliant NetCDF files that can be represented as xarray arrays.
I mostly agree, but not for every operation. It is pointless to force CF compliance for HDF or GeoTIFF files (or the other formats users & ESA would like to see in Cate). But Cate should be able to "deal" with these and interpret the available information to the maximum extent.
Agree regarding GeoTIFF.
But HDF is different. NetCDF is just a subset of HDF to improve machine readability and interoperability. So, each netCDF file is a valid HDF file. An HDF file that conforms to the netCDF data model can be opened by any netCDF reader. If we expect netCDF files to be CF-compliant, then why not HDF files? But since HDF as a format allows for very arbitrary things, I don't believe we should be expected to 'make it work' for any HDF file out there.
The biggest problem I see is that a very arbitrary file can then be read into our dataset type. But I strongly believe that all datasets should conform to our Common Data Model, so that we can make assumptions on what a dataset is and what it should represent and what metadata it should have when working with it.
So, currently there are HDF files that can readily be read into our dataset and there are HDF files that cannot. If we are to make it work on some level for the second group, I believe such files should be read into a different type. Then, if there are operations that really can work with completely arbitrary nested hierarchies, these can be clearly shown on the interface.
The current file is of course not the worst HDF file out there; at least it can be read with xarray and plotted with matplotlib. It would be nice to be able to at least read it with Cate and create that plot, if nothing else. But this is something we should think about very carefully, as the solution we choose could have very serious implications across the board.
@JanisGailis Could you tell me how to read this HDF file with xarray and plot it with matplotlib?
@sleepynicole Sure.
The following Python snippet produces the above plot:
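#!/usr/bin/env python3
import xarray as xr
import matplotlib.pyplot as plt

# Open the decompressed HDF file directly; no normalization or CF checks are applied.
ds = xr.open_dataset('GlobSnow_SWE_L3B_monthly_198212_v2.0.hdf')
# Plot the monthly mean snow water equivalent (mm) on the raw 721 x 721 grid.
ds.swe_average.plot()
plt.show()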
You can read more about the xarray library here
@JanisGailis Thank you so much! That's helpful. Besides, I wonder whether there is a way to deal with the Global EASE-Grid data? Do you know of one?
@sleepynicole I haven't worked with EASE-Grid data myself. However, it seems that GDAL supports working with EASE-Grid data. It has Python bindings, so that's probably where I'd start to poke around. NSIDC GDAL-EASE-grid
That being said, it seems that this grid is used a lot in certain climate sciences, so I believe we should eventually look into supporting it in Cate as well!
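Until such support exists, the pixel-to-coordinate mapping can be sketched by hand. This is an illustration under stated assumptions, not a substitute for GDAL: it assumes the file uses the original 25 km Northern Hemisphere EASE-Grid (Lambert azimuthal equal-area on a sphere, 721 x 721 cells, North Pole at the centre cell), and the constants and function names below are assumptions that should be checked against the NSIDC documentation before use.

```python
import math
from typing import Optional, Tuple

# Assumed constants for the original 25 km Northern Hemisphere EASE-Grid
# (spherical Earth model); verify against the NSIDC documentation.
EARTH_RADIUS_KM = 6371.228
CELL_SIZE_KM = 25.067525
ORIGIN_COL = 360.0  # assumed column of the North Pole in a 721 x 721 grid
ORIGIN_ROW = 360.0  # assumed row of the North Pole


def latlon_to_pixel(lat_deg: float, lon_deg: float) -> Tuple[float, float]:
    """Map latitude/longitude in degrees to fractional (col, row) indices."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    # Distance from the pole in grid cells (azimuthal equal-area projection).
    rho = 2.0 * EARTH_RADIUS_KM / CELL_SIZE_KM * math.sin(math.pi / 4.0 - lat / 2.0)
    return ORIGIN_COL + rho * math.sin(lon), ORIGIN_ROW + rho * math.cos(lon)


def pixel_to_latlon(col: float, row: float) -> Optional[Tuple[float, float]]:
    """Map (col, row) to (lat, lon) in degrees, or None if the cell is off the globe."""
    x = col - ORIGIN_COL
    y = row - ORIGIN_ROW
    ratio = CELL_SIZE_KM * math.hypot(x, y) / (2.0 * EARTH_RADIUS_KM)
    if ratio > 1.0:
        return None  # the grid corners fall outside the projected sphere
    lat = math.pi / 2.0 - 2.0 * math.asin(ratio)
    lon = math.atan2(x, y)
    return math.degrees(lat), math.degrees(lon)


# Round trip for a point in southern Finland (60N, 25E).
col, row = latlon_to_pixel(60.0, 25.0)
print(col, row, pixel_to_latlon(col, row))
```

Whether `swe_average[row, col]` or its transpose matches this indexing depends on how the file stores the array, so it is worth comparing a plotted coastline against known coordinates first.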
Got it. Thanks a lot!! I hope you succeed soon ~
@sleepynicole You can easily open your dataset in the current version by disabling data normalization in the read_netcdf step (normalize=False, see the code snippet below). The same applies to Cate Desktop. http://cate.readthedocs.io/en/latest/api_reference.html?highlight=normalize#cate.ops.normalize
cate ws new
cate res set swehdf read_netcdf file="/home/dev/cate_hdf_test/data/GlobSnow_SWE_L3B_monthly_198212_v2.0.hdf" normalize=False
cate ws run plot ds=@swehdf var=swe_average file=swehdf.png title="swdhdf demo"
@forman @JanisGailis For a DS such as the xr.Dataset type, we can add an attribute that records whether the DS has been verified (CF-compliant) and normalized. Then each operation may reject unsupported data sub-types individually.
@kbernat Thanks a lot! I got it working with your method! It's a pity that Cate cannot deal with the EASE-Grid data. But it's useful that I can have a look at my data through Cate, thanks!!
@kbernat
For a DS such as the xr.Dataset type, we can add an attribute that records whether the DS has been verified (CF-compliant) and normalized. Then each operation may reject unsupported data sub-types individually.
I agree. We could then easily let an operation express its requirements on a dataset, and in the implementation we would have automatic checks that raise with a nice, readable error message.
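One way this idea could be sketched is a decorator that lets each operation declare the dataset flags it needs. Everything here is hypothetical: the `requires` decorator, the `cf_verified` and `normalized` flag names, and the `FakeDataset` stand-in (which only mimics the `attrs` dict of an `xarray.Dataset`) are illustrative and not part of Cate.

```python
import functools
from typing import Any, Callable, Mapping


class DatasetRequirementError(ValueError):
    """Raised when a dataset lacks a property an operation depends on."""


def requires(*flags: str) -> Callable:
    """Declare which dataset flags an operation needs (hypothetical decorator)."""
    def decorator(op: Callable) -> Callable:
        @functools.wraps(op)
        def wrapper(ds: Any, *args: Any, **kwargs: Any) -> Any:
            attrs: Mapping[str, Any] = getattr(ds, 'attrs', {})
            missing = [flag for flag in flags if not attrs.get(flag, False)]
            if missing:
                raise DatasetRequirementError(
                    f"Operation '{op.__name__}' cannot run: dataset is missing "
                    f"required flag(s): {', '.join(missing)}"
                )
            return op(ds, *args, **kwargs)
        # Expose the requirements so a UI could list them before running.
        wrapper.required_flags = flags
        return wrapper
    return decorator


class FakeDataset:
    """Minimal stand-in exposing `attrs` the way xarray.Dataset does."""
    def __init__(self, **attrs: Any) -> None:
        self.attrs = attrs


@requires('cf_verified', 'normalized')
def temporal_aggregation(ds: Any) -> str:
    return 'aggregated'


@requires()  # plotting works on any readable dataset
def plot(ds: Any) -> str:
    return 'plotted'


raw = FakeDataset()  # e.g. a file opened with normalize=False
print(plot(raw))
try:
    temporal_aggregation(raw)
except DatasetRequirementError as err:
    print(err)
```

The reader would set the flags once, and operations such as plotting would stay usable on files like this one while others fail early with a readable message instead of a `KeyError`.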
Expected behavior
I want to open a .hdf file in cate-cli.
Actual behavior
It can't open my .hdf file, KeyError: 'time'.
Steps to reproduce the problem
The error is as follows:
Traceback (most recent call last):
  File "/Users/nicolesmac/cate/lib/python3.6/site-packages/xarray/core/dataset.py", line 662, in _construct_dataarray
    variable = self._variables[name]
KeyError: 'time'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/Users/nicolesmac/cate/lib/python3.6/site-packages/cate/util/opimpl.py", line 248, in _get_temporal_props
    bnds = ds['time'].attrs['bounds']
  File "/Users/nicolesmac/cate/lib/python3.6/site-packages/xarray/core/dataset.py", line 721, in __getitem__
    return self._construct_dataarray(key)
  File "/Users/nicolesmac/cate/lib/python3.6/site-packages/xarray/core/dataset.py", line 665, in _construct_dataarray
    self._variables, name, self._level_coords, self.dims)
  File "/Users/nicolesmac/cate/lib/python3.6/site-packages/xarray/core/dataset.py", line 74, in _get_virtual_variable
    ref_var = variables[ref_name]
KeyError: 'time'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/Users/nicolesmac/cate/lib/python3.6/site-packages/xarray/core/dataset.py", line 662, in _construct_dataarray
    variable = self._variables[name]
KeyError: 'time'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/Users/nicolesmac/cate/lib/python3.6/site-packages/cate/util/web/jsonrpchandler.py", line 192, in send_service_method_result
    result = future.result()
  File "/Users/nicolesmac/cate/lib/python3.6/concurrent/futures/_base.py", line 398, in result
    return self.__get_result()
  File "/Users/nicolesmac/cate/lib/python3.6/concurrent/futures/_base.py", line 357, in __get_result
    raise self._exception
  File "/Users/nicolesmac/cate/lib/python3.6/concurrent/futures/thread.py", line 55, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/Users/nicolesmac/cate/lib/python3.6/site-packages/cate/util/web/jsonrpchandler.py", line 271, in call_service_method
    result = method(**method_params, monitor=monitor)
  File "/Users/nicolesmac/cate/lib/python3.6/site-packages/cate/webapi/websocket.py", line 283, in set_workspace_resource
    monitor=monitor)
  File "/Users/nicolesmac/cate/lib/python3.6/site-packages/cate/core/wsmanag.py", line 321, in set_workspace_resource
    workspace.execute_workflow(res_name=res_name, monitor=monitor)
  File "/Users/nicolesmac/cate/lib/python3.6/site-packages/cate/core/workspace.py", line 583, in execute_workflow
    self.workflow.invoke_steps(steps, context=self._new_context(), monitor=monitor)
  File "/Users/nicolesmac/cate/lib/python3.6/site-packages/cate/core/workflow.py", line 624, in invoke_steps
    steps[0].invoke(context=context, monitor=monitor)
  File "/Users/nicolesmac/cate/lib/python3.6/site-packages/cate/core/workflow.py", line 315, in invoke
    self._invoke_impl(_new_context(context, step=self), monitor=monitor)
  File "/Users/nicolesmac/cate/lib/python3.6/site-packages/cate/core/workflow.py", line 977, in _invoke_impl
    return_value = self._op(monitor=monitor, **input_values)
  File "/Users/nicolesmac/cate/lib/python3.6/site-packages/cate/core/op.py", line 211, in __call__
    return_value = self._wrapped_op(**input_values)
  File "/Users/nicolesmac/cate/lib/python3.6/site-packages/cate/ops/io.py", line 336, in read_netcdf
    return adjust_temporal_attrs(normalize_op(ds))
  File "/Users/nicolesmac/cate/lib/python3.6/site-packages/cate/core/op.py", line 211, in __call__
    return_value = self._wrapped_op(**input_values)
  File "/Users/nicolesmac/cate/lib/python3.6/site-packages/cate/ops/normalize.py", line 97, in adjust_temporal_attrs
    return adjust_temporal_attrs_impl(ds)
  File "/Users/nicolesmac/cate/lib/python3.6/site-packages/cate/util/opimpl.py", line 222, in adjust_temporal_attrs_impl
    tempattrs = _get_temporal_props(ds)
  File "/Users/nicolesmac/cate/lib/python3.6/site-packages/cate/util/opimpl.py", line 252, in _get_temporal_props
    time_min = ds['time'].values[0]
  File "/Users/nicolesmac/cate/lib/python3.6/site-packages/xarray/core/dataset.py", line 721, in __getitem__
    return self._construct_dataarray(key)
  File "/Users/nicolesmac/cate/lib/python3.6/site-packages/xarray/core/dataset.py", line 665, in _construct_dataarray
    self._variables, name, self._level_coords, self.dims)
  File "/Users/nicolesmac/cate/lib/python3.6/site-packages/xarray/core/dataset.py", line 74, in _get_virtual_variable
    ref_var = variables[ref_name]
KeyError: 'time'
cate res: error: set_workspace_resource() call raised exception: "'time'"
Specifications
cate.1.0.0