HDFGroup / hdf5-json

Specification and tools for representing HDF5 in JSON
https://support.hdfgroup.org/documentation/hdf5-json/latest/

Can we allow H5S_UNLIMITED or 0 in dims_array? #18

Closed (hyoklee closed this issue 9 years ago)

hyoklee commented 9 years ago

dims_array can't specify an unlimited dimension, while maxdims_array can:

 dims_array    ::= positive_integer_array
 maxdims_array ::= "[" maxdims_list "]"
 maxdims_list  ::= maxdim ("," maxdim)*
 maxdim        ::= positive_integer | "H5S_UNLIMITED"

Why not allow H5S_UNLIMITED in dims_array? How can I handle a case like the one below in h5json?

 netcdf foo {    // example netCDF specification in CDL

 dimensions:
 lat = 10, lon = 5, time = unlimited;

 variables:
   int     lat(lat), lon(lon), time(time);
   float   z(time,lat,lon), t(time,lat,lon);
   double  p(time,lat,lon);
   int     rh(time,lat,lon);

   lat:units = "degrees_north";
   lon:units = "degrees_east";
   time:units = "seconds";
   z:units = "meters";
   z:valid_range = 0., 5000.;
   p:_FillValue = -9999.;
   rh:_FillValue = -1;

 data:
   lat   = 0, 10, 20, 30, 40, 50, 60, 70, 80, 90;
   lon   = -140, -118, -96, -84, -52;
 }

time is defined but doesn't contain any data (i.e., its dimension size is 0).
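To make the question concrete, here is a minimal sketch of a checker for the grammar as quoted above. It is only an illustration: the function names `is_positive_int` and `check_shape` are hypothetical and are not part of the h5json tools. The rank check is not in the quoted grammar; it is added because HDF5 requires dims and maxdims to have the same rank.

```python
UNLIMITED = "H5S_UNLIMITED"


def is_positive_int(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and value > 0


def check_shape(dims, maxdims=None):
    """Return a list of problems under the grammar quoted in this issue.

    Hypothetical helper for illustration; not an h5json API.
    """
    problems = []
    for i, d in enumerate(dims):
        # dims_array ::= positive_integer_array
        if not is_positive_int(d):
            problems.append(f"dims[{i}] = {d!r} is not a positive integer")
    if maxdims is not None:
        # Not part of the quoted grammar: HDF5 requires equal rank.
        if len(maxdims) != len(dims):
            problems.append("maxdims and dims differ in rank")
        for i, m in enumerate(maxdims):
            # maxdim ::= positive_integer | "H5S_UNLIMITED"
            if m != UNLIMITED and not is_positive_int(m):
                problems.append(
                    f"maxdims[{i}] = {m!r} is neither a positive integer "
                    f"nor {UNLIMITED}"
                )
    return problems


# z(time, lat, lon) from the CDL above, with no time records written yet.
for problem in check_shape([0, 10, 5], [UNLIMITED, 10, 5]):
    print(problem)
```

Under the quoted grammar, this reports one problem for z: the 0 in the first position of dims. The maxdims list passes.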

ghost commented 9 years ago

It is possible to create datasets with dimension size of 0 so dims_array should allow zero as a value.

gheber commented 9 years ago

Joe, I have to think re: 0. Obviously, it's not meaningful in HDF5. It's just another way of saying that a dataset can't have a value. How many different ways of saying, "This dataset can't have a value.", do we want to support? H5S_UNLIMITED is not a valid dimension under ANY circumstances, so this is a definite NO. (It is a valid extent for a maximum dimension.)

gheber commented 9 years ago

What's the use case?

hyoklee commented 9 years ago

The use case is aggregation. A user can specify a design like:

 time[0]
 lat[180]
 lon[360]
 temperature[0][180][360]

when the user is not sure at design time how many records they will store under time and wants to indicate that the time dimension is unlimited. They can aggregate data along the time dimension later.

This will be a very common use case.
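The aggregation step can be sketched as growing the current extent along the time axis. This is a rough illustration under stated assumptions: the helper `extend` is hypothetical, the dictionary layout with "class", "dims", and "maxdims" keys is assumed to mirror an HDF5/JSON shape description, and the design is assumed to record an unlimited maximum for time alongside the zero current size.

```python
UNLIMITED = "H5S_UNLIMITED"


def extend(shape, axis, count):
    """Return a copy of `shape` grown by `count` records along `axis`.

    Hypothetical helper for illustration; not an h5json API.
    """
    dims = list(shape["dims"])
    # Without maxdims, the maximum extent equals the current extent.
    maxdims = shape.get("maxdims", dims)
    limit = maxdims[axis]
    new_size = dims[axis] + count
    if limit != UNLIMITED and new_size > limit:
        raise ValueError(f"axis {axis} cannot grow past {limit}")
    dims[axis] = new_size
    return {**shape, "dims": dims}


# The design from the comment above: no time records at design time.
temperature = {
    "class": "H5S_SIMPLE",  # assumed key and value for a simple dataspace
    "dims": [0, 180, 360],
    "maxdims": [UNLIMITED, 180, 360],
}

# Aggregate two batches of 24 time records each.
after_two_batches = extend(extend(temperature, 0, 24), 0, 24)
print(after_two_batches["dims"])  # [48, 180, 360]
```

Only the time axis can grow here. Calling `extend(temperature, 1, 1)` raises `ValueError` because lat is fixed at 180.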

gheber commented 9 years ago

I see. Why not

 time[]
 lat[180]
 lon[360]
 temperature[][180][360]

or

 time[?]
 lat[180]
 lon[360]
 temperature[?][180][360]

Maybe we are mixing things here a bit. In my mind, HDF5/JSON is a language to describe "what there is" and NOT to describe "what there might be" or "what users don't know." I think that introducing strange conventions for expressing what people don't know could have a disastrous knock-on effect on tools and their complexity. I'm hesitant to shoehorn things into HDF5/JSON whose ramifications we don't understand.

ghost commented 9 years ago

> HDF5/JSON is a language to describe "what there is"

Since it is possible to create a dataset with a dimension size of zero, why not allow that value in dims_array? Such a dataset would be described with dims = [0] and maxdims = ["H5S_UNLIMITED"].

In his original Product Designer, Jeff Lee created all datasets like that in template files and added data to them later.
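As a rough sketch, the description above could be serialized with Python's standard json module. The helper `shape_json` is hypothetical, and the "class": "H5S_SIMPLE" entry is an assumption about the surrounding shape object. Only the dims and maxdims values come from this thread.

```python
import json


def shape_json(dims, maxdims=None):
    """Build a shape description as a plain dict.

    Hypothetical helper for illustration; not an h5json API.
    """
    shape = {"class": "H5S_SIMPLE", "dims": list(dims)}  # "class" is assumed
    if maxdims is not None:
        shape["maxdims"] = list(maxdims)
    return shape


# An empty, extendible 1-D dataset such as the time variable.
time_shape = shape_json([0], ["H5S_UNLIMITED"])
text = json.dumps(time_shape)
print(text)

# Round-trip: the zero extent and the unlimited marker survive unchanged.
assert json.loads(text) == time_shape
```

In the serialized text the unlimited marker appears as a double-quoted JSON string, matching the "H5S_UNLIMITED" token in the grammar.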

gheber commented 9 years ago

Fine w/ me.

jreadey commented 9 years ago

Zero values are supported in jsontoh5.py. See the sample JSON file data/json/resizable.json. I added two additional test datasets, "unlimited_1d_zero" and "unlimited_2d_zero", with 0 dimension values.