Closed ocefpaf closed 9 years ago
@ocefpaf, indeed, the longitude/latitude range metadata in the OPeNDAP dataset are correct, yet the metadata in the ISO record are incorrect, with east longitude and south latitude set to 0.0:
<gmd:westBoundLongitude>
<gco:Decimal>-118.64013671875</gco:Decimal>
</gmd:westBoundLongitude>
<gmd:eastBoundLongitude>
<gco:Decimal>0.0</gco:Decimal>
</gmd:eastBoundLongitude>
<gmd:southBoundLatitude>
<gco:Decimal>0.0</gco:Decimal>
</gmd:southBoundLatitude>
<gmd:northBoundLatitude>
<gco:Decimal>33.86117172241211</gco:Decimal>
</gmd:northBoundLatitude>
I checked the bounds on one pacioos time series records ISO record and the bounds seem fine.
PACIOOS is at TDS Version 4.3.20 - 20131125.1409. which is identical to CDIP, but perhaps PACIOOS has a different ncISO version?
@pacioos, do you have any insight on this?
or @lukecampbell / @kknee, I see you somehow got these bounding boxes from east coast CDIP correct: https://github.com/ioos/catalog/issues/129
As documented at the bottom of our ncISO records, PacIOOS is using ncISO Version 2.3. I've noticed ncISO get the bounds wrong before, but for us these are rare cases when the dataset is of type RADIAL (i.e. HFR).
Note that ncISO looks in more than one place for the longitude/latitude range metadata. In addition to looking at CF/NetCDF attributes (e.g. geospatial_lat_min, etc.) it is also looking at the THREDDS catalog metadata (geospatialCoverage), so I would make sure you have it defined in both places to see if that fixes the problem.
Cheers, John
On Tue, Aug 11, 2015 at 2:38 AM, Rich Signell notifications@github.com wrote:
@ocefpaf https://github.com/ocefpaf, indeed, the longitude/latitude range metadata in the OPeNDAP dataset are correct, yet the metadata in the ISO record http://thredds.cdip.ucsd.edu/thredds/iso/cdip/archive/028p1/028p1_d16.nc?catalog=http%3A%2F%2Fthredds.cdip.ucsd.edu%2Fthredds%2Fcatalog%2Fcdip%2Farchive%2F028p1%2Fcatalog.html&dataset=CDIP_Archive%2F028p1%2F028p1_d16.nc are incorrect, with east longitude and south latitude set to 0.0:
<gmd:westBoundLongitude> <gco:Decimal>-118.64013671875</gco:Decimal> </gmd:westBoundLongitude> <gmd:eastBoundLongitude> <gco:Decimal>0.0</gco:Decimal> </gmd:eastBoundLongitude> <gmd:southBoundLatitude> <gco:Decimal>0.0</gco:Decimal> </gmd:southBoundLatitude> <gmd:northBoundLatitude> <gco:Decimal>33.86117172241211</gco:Decimal> </gmd:northBoundLatitude>
I checked the bounds on one pacioos time series records ISO record http://oos.soest.hawaii.edu/thredds/iso/hioos/nss/ns01agg?catalog=http%3A%2F%2Foos.soest.hawaii.edu%2Fthredds%2Fidd%2Fnss_hioos.html&dataset=NS01agg and the bounds seem fine.
PACIOOS is at TDS Version 4.3.20 - 20131125.1409. which is identical to CDIP, but perhaps PACIOOS has a different ncISO version?
@pacioos https://github.com/pacioos, do you have any insight on this?
or @lukecampbell/@kknee https://github.com/kknee, I see you somehow got these bounding boxes from east coast CDIP correct: ioos/catalog#129 https://github.com/ioos/catalog/issues/129
— Reply to this email directly or view it on GitHub https://github.com/ioos/secoora/issues/197#issuecomment-129858443.
@pacioos, CDIP is also using ncISO Version 2.3, so I guess the only difference left is that you duplicated the bounds in the THREDDS metadata and also with attributes inside the dataset, yet CDIP they only specified the bounds with attributes inside the dataset.
But that should be okay, so it sounds like we have a bug in ncISO for when you only specify the attributes.
But do you understand the xsl enough to see why geospatial_lat_min
and geospatial_lon_max
would get set to 0 (or perhaps thats the default value?), when geospatial_lat_max
and geospatial_lat_max
are apparently okay?
https://github.com/Unidata/threddsIso/blob/master/src/main/resources/xsl/nciso/UnidataDD2MI.xsl
@geoneubie, perhaps you see the problem?
@amilan17 or @shane-axiom, how about you guys?
Can you spot the issue in the XSL?
The NcML has the lat and lon described twice.
First as described above
<attribute name="geospatial_lat_min" type="float" value="33.847797"/>
<attribute name="geospatial_lat_max" type="float" value="33.861347"/>
<attribute name="geospatial_lon_min" type="float" value="-118.64079"/>
<attribute name="geospatial_lon_max" type="float" value="-118.62448"/>
Secondly in a CFMetadata group
<group name="CFMetadata">
<attribute name="geospatial_lon_min" value="-118.64013671875" type="float"/>
<attribute name="geospatial_lat_min" value="0.0" type="float"/>
<attribute name="geospatial_lon_max" value="0.0" type="float"/>
<attribute name="geospatial_lat_max" value="33.86117172241211" type="float"/>
<attribute name="geospatial_lon_resolution" value="0.0"/>
<attribute name="geospatial_lat_resolution" value="0.0"/>
</group>
As you can see the CF Metadata attributes are wrong and this is what the XSL is looking at translating first. I'm not sure why these attributes are not correct in the NcML....
Anna,
Very interesting. The NcML service is of course also produced by ncISO, so if the problem is not in the XSL, then in must be in the ncISO java code itself.
Because CDIP is specifying these attributes correctly in the original data as evidenced by the global attributes here: http://thredds.cdip.ucsd.edu/thredds/dodsC/cdip/archive/028p1/028p1_d16.nc.html Note there is only one correct specification of the limits.
Note also there is no specification of geospatial bounds in the THREDDS metadata or else we would see the limits here at the dataset level view:
But that should be OKAY!
The NcISO precedence rules https://geo-ide.noaa.gov/wiki/index.php?title=NetCDF_Attribute_Convention_for_Dataset_Discovery#Determining_an_Order_of_Precedence say that UDDC should take precedence over TDS attributes, and since they are not even supplied, everything should be fine.
So I conclude there must be a coding error somewhere in the ncISO java code that this use case has uncovered.
I looked a bit here: https://github.com/Unidata/threddsIso/search?utf8=%E2%9C%93&q=maxLon but couldn't find the problem.
@shane-axiom, can you find it?
searching for 'CFMetadata' might get you closer to where this code is in the java. https://github.com/Unidata/threddsIso/search?utf8=%E2%9C%93&q=CFMetadata&type=Code
Anna
~~Metadata Adds Meaning.~~
Anna.Milan@noaa.gov, 303-497-5099
NOAA National Centers for Environmental Information
formerly NOAA’s National Geophysical Data Center
http://www.ngdc.noaa.gov/metadata/emma
~~~~~~~~
On Wed, Aug 12, 2015 at 11:44 AM, Rich Signell notifications@github.com wrote:
Anna,
Very interesting. The NcML service is of course also produced by ncISO, so if the problem is not in the XSL, then in must be in the ncISO java code itself.
Because CDIP is specifying these attributes correctly in the original data as evidenced by the global attributes here:
http://thredds.cdip.ucsd.edu/thredds/dodsC/cdip/archive/028p1/028p1_d16.nc.html Note there is only one correct specification of the limits.
Note also there is no specification of geospatial bounds in the THREDDS metadata or else we would see the limits here at the dataset level view:
But that should be OKAY!
The NcISO precedence rules https://geo-ide.noaa.gov/wiki/index.php?title=NetCDF_Attribute_Convention_for_Dataset_Discovery#Determining_an_Order_of_Precedence say that UDDC should take precedence over TDS attributes, and since they are not even supplied, everything should be fine.
So I conclude there must be a coding error somewhere in the ncISO java code that this use case has uncovered.
I looked a bit here: https://github.com/Unidata/threddsIso/search?utf8=%E2%9C%93&q=maxLon but couldn't find the problem.
@shane-axiom https://github.com/shane-axiom, can you find it?
— Reply to this email directly or view it on GitHub https://github.com/ioos/secoora/issues/197#issuecomment-130387259.
CFMetadata geospatial extent is calculated here:
It doesn't appear to care about the geospatial_min_lat
etc attributes and instead scans through the latitude and longitude variables.
The dataset in question (028p1_d16.nc) does actually have 0 values in both the lat and lon data variables (gpsLatitude and gpsLongitude), and since these don't match the defined _FillValue of -999.99f for those variables the 0 values are respected.
TL;DR the dataset is broken and threddsIso ignores geospatial extent attributes.
@shane-axiom Awesome! Thanks for pinpointing this!!
Interesting, but ncISO should not be reading gpsLongitude
and gpsLatitude
!
Looking at the OPeNDAP access form:
http://thredds.cdip.ucsd.edu/thredds/dodsC/cdip/archive/028p1/028p1_d16.nc.html
I see the coordinates
attribute on every variable points to metaDeployLatitude
and metaDeployLongitude
, which do have valid values:
http://thredds.cdip.ucsd.edu/thredds/dodsC/cdip/archive/028p1/028p1_d16.nc.ascii?metaDeployLatitude,metaDeployLongitude
Also, the UDDC attributes should have taken precendence anyway, right @geoneubie? https://geo-ide.noaa.gov/wiki/index.php?title=NetCDF_Attribute_Convention_for_Dataset_Discovery#Determining_an_Order_of_Precedence
FeautreType is not grid for this dataset so ncISO knows it can't get lat long values in the traditional way of calculating them from the CF CoordinateAxes (because the values returned by the NetCDF libraries are not the true lat lon min max values in that case). So ncISO loops through the available variables until it finds one with a standard_name attribute of latitude or longitude. In this instance there are two variables gpsLatitude and metadataDeployLatitude with standard_name latitude and apparently gpsLatitude is found first. I see two quick fixes: 1) Change the standard_name of gpsLatitude variable or 2) Override the default xslt to use the ncml attributes. I can help with 2) if you need information on how to do that. Not sure if there is a clean longer term fix, but I'm open to ideas.
Dave
On Wed, Aug 12, 2015 at 12:21 PM, Rich Signell notifications@github.com wrote:
Interesting, but ncISO should not be reading gpsLongitude and gpsLatitude!
Looking at the OPeNDAP access form:
http://thredds.cdip.ucsd.edu/thredds/dodsC/cdip/archive/028p1/028p1_d16.nc.html I see the coordinates attribute on every variable points to metaDeployLatitude and metaDeployLongitude, which do have valid values:
Also, the UDDC attributes should have taken precendence anyway, right @geoneubie https://github.com/geoneubie?
— Reply to this email directly or view it on GitHub https://github.com/ioos/secoora/issues/197#issuecomment-130400374.
Should ncISO check for the existence of geospatial_min_lat
et al. and use those values if present, regardless of whether the dataset is or isn't a grid?
ncISO does check, but the precedence rules favor calculated values over human maintained values (to answer Rich's earlier question). These can be overriden at an individual site by modifying the xslt at the local TDS install.
FWIW I'm in favor of having the values calculated using data in the variable in lieu of the global attributes. In my experience, unless automation is in play, most folks don't update metadata once the base set has been applied.
@lesserwhirls, @dopplershift or @cwardgar, I'm wondering about this @geoneubie statement above:
"FeatureType
is not grid
for this dataset so ncISO knows it can't get lat long values in the traditional way of calculating them from CF CoordinateAxes (because the values returned by the NetCDF libraries are not the true lat lon min max values in that case)."
The FeatureType
for the dataset in question is timeSeries
. Is there a netcdf-java method that gives the correct lon/lat bounding box for a FeatureType='timeSeries'
?
Hi Rich,
The CF feature type timeSeries
maps to the CDM feature type STATION
. A STATION
is expected to be fixed at a single lat/lon location, and so there are no meaningful methods for finding a bounding box. The relevant interface is StationTimeSeriesFeature.
You'll notice that there is a method getBoundingBox()
, but it is inherited from a super-class and doesn't have the semantics you're looking for. In fact, you can see where the value is set here. It's just an "empty" box centered around the station's location.
FWIW, the TRAJECTORY
feature type has a meaningful bounding box.
@cwardgar , so what should the logic be to determine the lon/lat extent of this timeSeries
feature type?
ncISO is just picking the first variables it finds with standard_name
of longitude
and latitude
, which is gpsLatitude
and gpsLongitude
, instead of metaDeployLatitude
and metaDeployLongitude
. The latter are single point values defined in the coordinates
attributes of the data variables, while the gpsLatitude
and gpsLongitude
are the actual gps readings that are a time series, and the values move around a bit with the buoy watch circle.
Arguably the actual bounding box should be the gpsLatitude
and gpsLongitude
should be the actual bounding box, but as you say, one expects only a single lat/lon point for a timeSeries
featureType.
@geoneubie, the CDIP folks (Darren Wright) removed the standard_name
attribute from gpsLongitude
and gpsLatitude
variables on these problem datasets (see
the OPeNDAP URL http://thredds.cdip.ucsd.edu/thredds/dodsC/cdip/archive/028p1/028p1_d16.nc.html
for example), but the ISO metadata produced by ncISO is still showing 0.0 for min lat and lon:
Is the logic being used by ncISO different than you thought or do we have some other kind of problem?
@rsignell-usgs, in the CDM, stations (i.e. timeSeries
) are associated with a single lat/lon location, even if it's nominal (so metaDeployLatitude
and metaDeployLongitude
should be what the CDM chooses). We don't have a concept of a buoy that drifts away from an initial location. However, a potential way to solve that would be to subclass StationTimeSeriesFeatureImpl
and then override getBoundingBox()
to return the box around gpsLatitude
and gpsLongitude
.
Of course, that all assumes that ncIso is using NetCDF-Java's PointFeature machinery in the first place, which doesn't seem to be the case.
I cannot fix this issue for the skill-score as it is a data provider issue. A workaround is already in place and my notebooks are ignore these stations. So I am closing this issue.
PS: Please feel free to continue using this thread to discuss the problem though. I am just cleaning the secoora skill score issue tracker.
@lesserwhirls, it seems the fundamental problem here is that nciso is not using the built-in netcdf-java functionality for determining the bounding box. Do you agree?
If you do agree, then perhaps @noaaroland can take a look, with @kevin-obrien approval, of course.
Yes. However, moving the code to use the PointFeature stuff is probably not a super simple task, and it's changed so much in v5.0, that it would probably be worth waiting to make that change to ncIso once it is using v5.0. I do have a branch of ncIso building with the latest snapshot of v5.0 (probably needs updated with latest changes to master), and would happily welcome any feedback on the 5.0 api for point features.
The notebooks uses the
bbox
below.and filters the data search with this
fes
define filter:However, several DAP URLs returned are outside the search box. See https://github.com/ioos/secoora/blob/master/notebooks/timeSeries/sst/2014-07-07/log.txt#L245-L284.
or more specifically http://thredds.cdip.ucsd.edu/thredds/dodsC/cdip/archive/028p1/028p1_d16.nc.html
These stations metadata seems to be correct though.
For now I am filtering these stations out with:
where,
That was the fastest route I found to circumvent this issue. Avoiding
xray
andiris
objects by querying for thecdm_data_type
directly with the rawnetCDF4
interface, but it is still slow very slow :unamused:It would be nice to find the source of the problem and fix it. @rsignell-usgs any ideas?