Closed maximecarre1 closed 2 years ago
@maximecarre1 I noticed that axisNames
is not present in S111, could you include that in the template for this PR?
@princessmittens
When I try to parse featureCode in S111 it gives me an error: ValueError: Illegal slicing argument for scalar dataspace
Yes right the featureCode definition should be:
DATASPACE SIMPLE { ( 1 ) / ( 1 ) }
DATA {
(0): "SurfaceCurrent"
}
As it is in the DCF8_009_104CA0024900N12400W_production.h5 file template.
Speaking of templates:
Why do we have two DCF8_009_104CA0024900N12400W* files in the IWLS_pygeoapi/templates directory?
Speaking of templates:
Why do we have two DCF8_009_104CA0024900N12400W* files in the IWLS_pygeoapi/templates directory?
DCF8_009_104CA0024900N12400W is the full template, it's only there as a reference. DCF8_009_104CA0024900N12400W_production is the production template without the WaterLevel.0X groups, because they get created dynamically to account for the different availability of observations/forecasts/predictions for each stations.
@maximecarre1
Optional fix: I'm not sure if we want to put this in this issue or we can create another issue for it if you like.
Currently 'verticalDatumReference` is of type string. It should be an enum (int)
Referencing the S104 spec (pg. 77):
31 Vertical datum
reference verticalDatumReference 0..1 Enumeration
Mandatory for S-104 only if
verticalCoordinateBase = 2.
1: S-100 vertical datum
2: EPSG
I think you are looking for the 1 (currently it's S-100 vertical datum
as a str).
verticalDatumReference needs to be an enum. horizontalDatumReference could be removed from the S-104 templates, its no longer in the standard as of 1.0.0 and is covered by horizontalCRS. It's in S-111 1.0.0 but not in the draft for the next version; but it's also a mandatory file level attribute in S-100 4.0.0 so we have to decide if we want to conform to S-104 1.0.0 or S-100 4.0.0 or include both attributes (like we are doing now.)
Whoops, typo! Ok perhaps we can come back to this. @glmcr what do you think?
I think that we can drop the "horizontalDatumReference" from our S111 v1.1.1 products because it is not in the v1.1.1 draft spec (and even if it is in the S100 4.0.0 spec) as mentioned by Max.
But for sure we need to keep it for the S111 v1.0.1 products if we have to produce S111 DCF8 v1.0.1 (note that I think that there is no DCF8 type in S111 v1.0.1 spec)
So the next question would probably be: Do we move to S111 v1.1.1 for the S111 DCF8 products right now and forget about the S111 DCF8 v1.0.1 spec version? I would say yes.
I'll let you both decide if we need to create a specific issue about the "verticalDatumReference" that need to be an enum for the S104.
I'll let you both decide if we need to create a specific issue about the "verticalDatumReference" that need to be an enum for the S104.
I can include it with the next commit for this PR, it's a quick fix.
Please forget about what I wrote earlier about the "horizontalDatumReference", after checking the S111 spec v1.1.1 I really think that it should still be used for all S111 products.
"horizontalCRS" is the numeric value code (like the usual 4236) related to the EPSG string identifier but again according to the S104 v1.0.0 spec it seems that we need to have the "horizontalDatumReference" and the "horizontalCRS" in all the S104 DCF types.
It is really confusing because the EPSG:4236 datum is both an horizontal and a vertical datum.
For our S104, we will probably always use the EPSG:4236 for the horizontal datum but for the vertical datum it will be either the lowerLowWaterLargeTide (code 27) for locations where the tidal energy is dominant or the lowestLowWater (code 4) for the inland waters.
So I suggest to keep things as they are now for the horizontalDatumReference issue and we will discuss about it when I will be back from vacations.
horizontalCRS is the numeric value code (like the usual 4236) related to the EPSG string identifier but again according to the S104 v1.0.0 spec it seems that we need to have it in all the S104 DCF types.
It is really confusing because the EPSG:4236 datum is both an horizontal and a vertical datum.
For our S104, we will probably always use the EPSG:4236 for the horizontal datum but for the vertical datum it will be either the lowerLowWaterLargeTide (code 27) for locations where the tidal energy is dominant or the lowestLowWater (code 4) for the inland waters.
So I suggest to keep things as they are now for the horizontalDatumReference issue and we will discuss about it when I will be back from vacations.
Ok, we will keep horizontalDatumReference the way it is for the moment. The vertical component of EPSG:4236 is only really used for raw GPS observations (ellipsoidal heights on the WGS84 ellipsoid) I don't think it's relevant for any S100 products
The vertical component of EPSG:4236 is only really used for raw GPS observations (ellipsoidal heights on the WGS84 ellipsoid) I don't think it's relevant for any S100 products.
Yes I agree Max, but it seems that it still could be used as a vertical datum for S100 products according to the specs.
For sure, it would be possible to not use the horizontalDatumReference attribute at all for the S100 products assuming that the "EPSG" is the default one that goes with the "horizontalCRS" numeric code.
But we would need to know if it will be allowed to do so.
I personally find rather strange that the S100 spec is splitting the horizontal datum id in two parts and not simply using the "EPSG:4326" string combo as it seems to be done almost elsewhere in the GIS world.
It think it was originally done so horizontalDatumValue could be encoded as an integer, but that potentially causes problems if horizontalDatumReference is not EPSG. That's the reason S-104 and the upcoming version of S-111 are moving to use horizontalCRS but horizontalDatumReference is still a mandatory attribute in S-100 4.0.0 so there is a conflict to there.
Ah! Not sure but, if we read carefully the text in the "horizontalCRS" (integer type) entry of the "Table 12.3 - General Metadata, related to the entire HDF5 file" of the S104 v1.0.0 spec p. 75 it seems that the "EPSG" string is considered implicit there so yes it seems that we will probably be able to drop the "horizontalDatumReference" for future versions of the products provided that nothing else than the "EPSG" horiz. datum ref. type is allowed.
Ooops! I closed this MPR by accident, re-opened now.
I will make an issue where we can further address this and other potential changes in relation to the spec.
-Fixed numInstance in S-111 templates -Fixed featureCode in S-111 templates -Converted verticalDatumReference to enum in S-104 templates
No changes required for S111/S104 axisNames and Group_F; it is default behavior for h5py 3.x to read UTF-8 strings as byte strings if they are located inside an numpy array. (like in compound data types) See: https://github.com/h5py/h5py/issues/1751 and https://docs.h5py.org/en/stable/strings.html
As discussed, I looked at both files the enav version with no byte strings vs the pygeoapi version with byte strings and there seems to be no apparent difference when using h5dump. I don't think the byte strings are visible through h5dump, I can only view them through python. For reference the version of h5py I'm using for pygeoapi is 3.6.0 and on the gpsc for the enav versions it's 2.6.0. Here is the output for the enav version:
DATASET "/Group_F/WaterLevel" {
DATATYPE H5T_COMPOUND {
H5T_STRING {
STRSIZE H5T_VARIABLE;
STRPAD H5T_STR_NULLTERM;
CSET H5T_CSET_UTF8;
CTYPE H5T_C_S1;
} "code";
H5T_STRING {
STRSIZE H5T_VARIABLE;
STRPAD H5T_STR_NULLTERM;
CSET H5T_CSET_UTF8;
CTYPE H5T_C_S1;
} "name";
H5T_STRING {
STRSIZE H5T_VARIABLE;
STRPAD H5T_STR_NULLTERM;
CSET H5T_CSET_UTF8;
CTYPE H5T_C_S1;
} "uom.name";
H5T_STRING {
STRSIZE H5T_VARIABLE;
STRPAD H5T_STR_NULLTERM;
CSET H5T_CSET_UTF8;
CTYPE H5T_C_S1;
} "fillValue";
H5T_STRING {
STRSIZE H5T_VARIABLE;
STRPAD H5T_STR_NULLTERM;
CSET H5T_CSET_UTF8;
CTYPE H5T_C_S1;
} "datatype";
H5T_STRING {
STRSIZE H5T_VARIABLE;
STRPAD H5T_STR_NULLTERM;
CSET H5T_CSET_UTF8;
CTYPE H5T_C_S1;
} "lower";
H5T_STRING {
STRSIZE H5T_VARIABLE;
STRPAD H5T_STR_NULLTERM;
CSET H5T_CSET_UTF8;
CTYPE H5T_C_S1;
} "upper";
H5T_STRING {
STRSIZE H5T_VARIABLE;
STRPAD H5T_STR_NULLTERM;
CSET H5T_CSET_UTF8;
CTYPE H5T_C_S1;
} "closure";
}
DATASPACE SIMPLE { ( 2 ) / ( 2 ) }
DATA {
(0): {
"waterLevelHeight",
"Water level height",
"meters",
"-9999",
"H5T_NATIVE_FLOAT",
"-99.0",
"99.0",
"closedInterval"
},
(1): {
"waterLevelTrend",
"Water level trend",
"",
"0",
"H5T_ENUM",
"",
"",
""
}
}
ATTRIBUTE "chunking" {
DATATYPE H5T_STRING {
STRSIZE H5T_VARIABLE;
STRPAD H5T_STR_NULLTERM;
CSET H5T_CSET_UTF8;
CTYPE H5T_C_S1;
}
DATASPACE SCALAR
DATA {
(0): "0,0"
}
}
}
}
And for reference the output for the pygeoapi Group_F/WaterLevel:
DATASET "/Group_F/WaterLevel" {
DATATYPE H5T_COMPOUND {
H5T_STRING {
STRSIZE H5T_VARIABLE;
STRPAD H5T_STR_NULLTERM;
CSET H5T_CSET_UTF8;
CTYPE H5T_C_S1;
} "code";
H5T_STRING {
STRSIZE H5T_VARIABLE;
STRPAD H5T_STR_NULLTERM;
CSET H5T_CSET_UTF8;
CTYPE H5T_C_S1;
} "name";
H5T_STRING {
STRSIZE H5T_VARIABLE;
STRPAD H5T_STR_NULLTERM;
CSET H5T_CSET_UTF8;
CTYPE H5T_C_S1;
} "uom.name";
H5T_STRING {
STRSIZE H5T_VARIABLE;
STRPAD H5T_STR_NULLTERM;
CSET H5T_CSET_UTF8;
CTYPE H5T_C_S1;
} "fillValue";
H5T_STRING {
STRSIZE H5T_VARIABLE;
STRPAD H5T_STR_NULLTERM;
CSET H5T_CSET_UTF8;
CTYPE H5T_C_S1;
} "dataType";
H5T_STRING {
STRSIZE H5T_VARIABLE;
STRPAD H5T_STR_NULLTERM;
CSET H5T_CSET_UTF8;
CTYPE H5T_C_S1;
} "lower";
H5T_STRING {
STRSIZE H5T_VARIABLE;
STRPAD H5T_STR_NULLTERM;
CSET H5T_CSET_UTF8;
CTYPE H5T_C_S1;
} "upper";
H5T_STRING {
STRSIZE H5T_VARIABLE;
STRPAD H5T_STR_NULLTERM;
CSET H5T_CSET_UTF8;
CTYPE H5T_C_S1;
} "closure";
}
DATASPACE SIMPLE { ( 3 ) / ( 3 ) }
DATA {
(0): {
"waterLevelHeight",
"Water level height",
"meters",
"-9999",
"H5T_FLOAT",
"-99.99",
"99.99",
"closedInterval"
},
(1): {
"waterLevelTrend",
"Water level trend",
"",
"0",
"H5T_ENUM",
"",
"",
""
},
(2): {
"waterLevelTime",
"Water level time",
"DateTime",
"",
"H5T_STRING",
"19000101T000000Z",
"21500101T000000Z",
"closedInterval"
}
}
}
}
If you run it in python in h5py are you able to see the byte strings?
That's weird, I think compound types strings should default to byte string when using 3.x,: https://docs.h5py.org/en/stable/strings.html
Can you share how you encode dtypes for compounds? For example this is what I'm using for surface currents:
dt = h5py.vlen_dtype(str)
np_surface_current_type = np.dtype([('code',dt), ('name',dt), ('uom.name',dt), ('fillValue',dt), ('dataType',dt), ('lower',dt), ('upper',dt), ('closure',dt)])
I still haven't found the source of the problem that would cause the difference between the enav version and the pygeoAPI for byte strings in compound types. Maybe it is better to merge the existing changes to the template into main now and I'll create a new issue to look into it?
I'll have to take a look at this tomorrow, I'm in the office with the dfo laptop but can't install pip on wsl without root access. I have to wait to get my personal laptop.
I have reviewed the changes for:
-Fixed numInstance in S-111 templates
-Fixed featureCode in S-111 templates
-Converted verticalDatumReference to enum in S-104 template
Everything looks good.
I think you're right, I read a bit more about the string implementation as referenced above (byte strings being the default implementation for variable length strings). Perhaps we can make an issue and address it when Gilles returns. If that's okay with you then I can merge this.
That's ok with me, Il create an issue for the byte strings and you can merge and close issue #12 when you are ready.
S104 Templates -converted byte strings in Group_F compound data type
S111 Templates -converted byte strings in Group_F compound data type -corrected numInstance to numInstances -corrected Group_F/featureCode to SurfaceCurrent
cannot duplicate problem with attributes northBoundLongitude and southBoundLongitude, they are not present in the code or the templates.