OPENDAP / bes

The BES framework, which forms the basis for the Hyrax server
GNU Lesser General Public License v3.0
13 stars 19 forks source link

CoverageJSON output invalid #215

Closed letmaik closed 3 years ago

letmaik commented 5 years ago

This refers to http://test.opendap.org/opendap/AIRS/QC_AIRS-1.2011.02.13.090.L2.RetStd.v5.2.2.0.G11045052855.hdf.covjson.

As far as I see, all the "parameters" listed in the result, e.g. AIRSXTrack or StdPressureLay are not parameters but should be domain axes. The actual parameters, looking at http://test.opendap.org/opendap/AIRS/QC_AIRS-1.2011.02.13.090.L2.RetStd.v5.2.2.0.G11045052855.hdf.html, like O3_Resid_Ratio or TCldTopStd are all missing.

Just looking at one of the ranges objects of the output makes this mismatch obvious:

      "AIRSXTrack": {
        "type": "NdArray",
        "dataType": "integer", 
        "axisNames": ["y", "x"],
        "shape": [3],
        "values": [0, 1, 2]
      },

Here the axes are given as y and x (which are defined correctly earlier in the "domain" part), however the shape and values do not match that.

@lewismc @jonblower

lewismc commented 5 years ago

@letmaik yes the bug(s) are/is glaringly obvious here. It would appear however that the CovJSON document syntax for CoverageCollection is fine as per the specification description.

As far as I understand it we need to determine the following

  1. When parameters should be included within the coverage collection object (note that inclusion here is optional, denoted by MAY within the specification), and
  2. What conditions exist for granules variables (encoded as CovJSON ranges objects to be essentially ignored altogether.
  3. As you mention above @letmaik each of the ranges objects' values seem to be an incremental counter for the size of the shape this is entirely incorrect.

There is some nice debugging logging we can use to explicitly print the shape constraints for the no3 issue above.

lewismc commented 5 years ago

It's probably relevant for me to note @letmaik thanks for reviewing this work. I think if we tested the CovJSON responses against more complex granules such as this AIRS one previously the implementation would be hardened. This is very valuable feedback... AIRS is not a dataset I have been working with so it looks like it presents more complex challenges.

lewismc commented 5 years ago

Actually, I think we should definitely use this granule as a unit test. It should be added to the test data and we should add a note to the README documentation regarding how we can add tests as it is not intuitive at this point in time.

letmaik commented 5 years ago

The CoverageCollection structure is fine, yes. Also, the domain of the only coverage in that collection seems ok, but then everything else doesn't make sense. Can you provide some URLs of CovJSON test resources that are fine in your opinion? Just want to have a quick look to rule out (or find) systematic errors without digging into the code.

By the way, I only looked at this test dataset because that's the one you referred to for advertising the CovJSON support of Hyrax in the "Status?" issue of the specification repo of CovJSON.

lewismc commented 5 years ago

At PO.DAAC we are not yet running 1.15 in production so I can’t provide you with a public URL.

What about the following... which is also incorrect! It does not include several variables from the granule...

test.opendap.org:8080/opendap/ioos/200803061600_HFRadar_USEGC_6km_rtv_SIO.ncml.covjson

lewismc commented 5 years ago

I’ve also found the following error

test.opendap.org:8080/opendap/larc/MISR_AM1_CGAS_MAR_04_2006_SITE_INTEXB_F06_0021.hdf.covjson

@jgallagher59701

lewismc commented 5 years ago

Scope this Grid coverage out

test.opendap.org:8080/opendap/Pathfinder/Northwest_Atlantic/1km/raw/2001/4/f01091084047.hdf.covjson

letmaik commented 5 years ago

Scope this Grid coverage out

test.opendap.org:8080/opendap/Pathfinder/Northwest_Atlantic/1km/raw/2001/4/f01091084047.hdf.covjson

Better, but also invalid:

Most of these things can be easily checked by copy-pasting the JSON (if it's not too big) into the CovJSON playground and seeing if it displays something useful.

lewismc commented 5 years ago

The primary issue here is that we don’t have a formal test suite for CovJSON. We have discussed this recently and I think it’s somethhing we should work on producing. All of the granules I’ve pasted above have been entirely random selections so it looks like, unfortunately, we have some work to do to ensure that the cases highlighted above at least are addressed.

lewismc commented 5 years ago

@letmaik @jonblower would either of you be interested in assisting me write a test suite for CovJSON implementations this coming GSoC? We could easily facilitate it through the @ESIPFed at https://github.com/ESIPFed/gsoc/issues

jonblower commented 5 years ago

Yes, I could help at some level. It would be great to have the test suite implemented as a kind of "linter", to help people check validity. We started work on this a while ago but didn't get very far, so doing this as GSoC sounds great.

lewismc commented 5 years ago

OK cool @jonblower I'll log a GSoC project idea over on the ESIP GSoC project issues and tag you there. Thanks

hemphilc commented 4 years ago

In regards to the comment "As far as I see, all the "parameters" listed in the result, e.g. AIRSXTrack or StdPressureLay are not parameters but should be domain axes."

It is extremely difficult to know of all the different names of potential domain axes in these datasets largely because the naming conventions are so varied. Right now, the module searches for the following naming conventions:

I'll be the first to admit that this hardly covers everything and could be expanded. But as a person who does not necessarily work with these datasets on a regular basis, its difficult for me to catch everything without having some kind of reference or key. I will look into expanding the domain axes detection as soon as possible.

jonblower commented 4 years ago

Yes - if the data follow the CF conventions, then there are definitive ways for detecting domain axes. But if not, I guess there's no choice but to make an educated guess