FAIRiCUBE / FAIRiCUBE-Hub-issue-tracker

FAIRiCUBE HUB issue tracker
Creative Commons Zero v1.0 Universal

Validity of XML from rasdaman #26

Open KathiSchleidt opened 9 months ago

KathiSchleidt commented 9 months ago

@pebau we've got regular issues with the validity of the XML being provided by rasdaman instances.

As an example, see the CoverageDescription for near_surface_air_temperature, which has an impressive error log :(

When can we expect valid XML from rasdaman?

KathiSchleidt commented 9 months ago

@pebau any updates?

KathiSchleidt commented 7 months ago

I just checked the 2 versions of the LGN dataset currently being provided, neither DescribeCoverage response is valid XML

Issues on both:

Issues on Coverage ID LGN: You provide ISO-Date strings in fields expecting a double value in several places:

Issues on Coverage ID LGN_virtual_coverage_index: wcs:ServiceParameters are in the wrong order (sequence order is mandatory in XML), must be provided before

pebau commented 7 months ago

@KathiSchleidt thanks for checking, Kathi!

@Mohinem correct those, and check the others as well for these issues.

pebau commented 7 months ago

@KathiSchleidt re this below:

Issues on Coverage ID LGN: You provide ISO-Date strings in fields expecting a double value in several places:

This is an issue OGC never resolved. Of course nobody wants "seconds since epoch"; people want the well-known ISO 8601 syntax. Since at least 2021 I have been pushing the CRS.SWG to allow string-valued coordinates, but they refused for many years. Only with 19111:2019 did they show a broadened mind, but that was too late for CIS 1.0, and anyway GML never made a corresponding adjustment.

Hence, everybody doing timeseries in CIS 1.0 consciously violates the schema.
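To make the schema violation concrete, here is a minimal Python sketch (not rasdaman code). It assumes that Python's `float()` is a close enough stand-in for the lexical rules of `xs:double`; `as_epoch_seconds` is a hypothetical helper showing the "seconds since epoch" alternative that nobody wants.

```python
from datetime import datetime, timezone

def is_double_literal(text: str) -> bool:
    """Rough stand-in for an xs:double lexical check (assumption: float() is close enough)."""
    try:
        float(text)
        return True
    except ValueError:
        return False

def as_epoch_seconds(iso: str) -> float:
    """Hypothetical helper: convert an ISO 8601 UTC timestamp to seconds since 1970-01-01."""
    # Older Python versions do not accept a trailing "Z" in fromisoformat().
    parsed = datetime.fromisoformat(iso.replace("Z", "+00:00"))
    return parsed.astimezone(timezone.utc).timestamp()

timestamp = "2008-01-01T02:01:20.000Z"
print(is_double_literal(timestamp))                          # an ISO string is not a double
print(is_double_literal(str(as_epoch_seconds(timestamp))))   # the epoch value is
```

The first check fails for exactly the reason a validator complains: the field is typed as a number, and an ISO 8601 string is not one.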

In CIS 1.1 I have defined a direct position as a sequence of anySimpleType, so ISO 8601 strings are allowed there. How can you enforce CIS 1.1 output? In rasdaman we have added an extra outputType request parameter for this.

For a WCS DescribeCoverage/GetCoverage request, this means using version 2.1.0 with the extra outputType parameter:

...&service=WCS&version=2.1.0&request=DescribeCoverage&coverageId=test_mr&outputType=GeneralGridCoverage
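As a sketch, the same request can be assembled programmatically. The endpoint below is the FAIRiCUBE one quoted elsewhere in this thread and is only an assumption here; `describe_coverage_url` is a hypothetical helper name, and authentication is left out.

```python
from urllib.parse import urlencode

# Assumption: the FAIRiCUBE endpoint; any petascope endpoint would be used the same way.
ENDPOINT = "https://fairicube.rasdaman.com/rasdaman/ows"

def describe_coverage_url(coverage_id: str, endpoint: str = ENDPOINT) -> str:
    """Build a WCS 2.1.0 DescribeCoverage URL that asks for CIS 1.1 output."""
    params = {
        "service": "WCS",
        "version": "2.1.0",
        "request": "DescribeCoverage",
        "coverageId": coverage_id,
        "outputType": "GeneralGridCoverage",
    }
    return f"{endpoint}?{urlencode(params)}"

print(describe_coverage_url("test_mr"))
```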

For WCPS, it means to use something like:

for c in (test_irr_cube_2) return encode( c[ansi("2008-01-01T02:01:20.000Z"), E(75042.72735943:85042.72735943), N(5094865.55794:5099865.55794)], "gml", "{\"outputType\":\"GeneralGridCoverage\"}")
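The escaped JSON in the last argument is easy to get wrong by hand. A minimal Python sketch that generates it is shown here; `wcps_encode_gml` is a hypothetical helper, and the coverage name and subset are simply copied from the query above.

```python
import json

def wcps_encode_gml(coverage_expr: str, output_type: str = "GeneralGridCoverage") -> str:
    """Wrap a WCPS coverage expression in encode(..., "gml", "<escaped JSON>")."""
    extra = json.dumps({"outputType": output_type}, separators=(",", ":"))
    escaped = extra.replace('"', '\\"')  # quotes inside the WCPS string literal need a backslash
    return f'encode({coverage_expr}, "gml", "{escaped}")'

subset = ('c[ansi("2008-01-01T02:01:20.000Z"), '
          'E(75042.72735943:85042.72735943), N(5094865.55794:5099865.55794)]')
query = f"for c in (test_irr_cube_2) return {wcps_encode_gml(subset)}"
print(query)
```

When the query is sent as a URL parameter it still needs URL encoding on top of this.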

KathiSchleidt commented 7 months ago

Am I correct in my understanding that temporal dimensions cannot be correctly provided under WCS 2.0?

I'd love to try the WCS 2.1 version, but I have not been able to provide my credentials with the URI (they seem not to be accepted), and the GUI provided by rasdaman uses WCS 2.0.

In addition, while one of the errors I mentioned pertains to the issue of providing an ISO timestamp in a double field, there were several other issues that have not been addressed.

Guess we're stuck with invalid XML

bangph commented 7 months ago

@KathiSchleidt I've updated petascope on https://fairicube.rasdaman.com/rasdaman/ows

For CIS 1.0:

The WSClient now uses CIS 1.1 by default, and you can also select CIS 1.0 in WCS DescribeCoverage / GetCoverage (in GML format).

KathiSchleidt commented 7 months ago

@bangph many thanks! A few more validation issues:

Full valid example for grassland_change_2018_index

KathiSchleidt commented 7 months ago

@bangph I just checked the GetCoverage response for WCS 2.1, here the XML errors:

KathiSchleidt commented 7 months ago

@bangph I just checked the GetCapabilities response for WCS (GUI still providing WCS 2.0), here the XML errors:

bangph commented 7 months ago

@KathiSchleidt

Integration of INSPIRE Extended capabilities, but missing elements

Thanks for checking. Please specify which elements you found missing; otherwise I don't know what to do with it.

the GUI has been changed to WCS 2.1, but the Capabilities still are provided in WCS 2.0

The GUI never changed completely to WCS 2.1; it sets CIS 1.1 as the default in the WCS DescribeCoverage / GetCoverage tabs, with a dropdown to select WCS 2.0.1.

@pebau Do you want this set in WCS GetCapabilities as well?

KathiSchleidt commented 7 months ago

@bangph I now also checked the GetCapabilities under WCS 2.1, seems this isn't even defined, at least I cannot find it in the WCS Schemas provided by OGC at https://schemas.opengis.net/wcs/2.1/gml/

Please clarify where the correct schema for this has been hidden

KathiSchleidt commented 7 months ago

@bangph on the INSPIRE extended capabilities, for the moment I'd just leave this out as not required. Fear we have TOO MANY other validation issues for me to provide these details at present

bangph commented 7 months ago

@KathiSchleidt I don't have any experience with WCS 2.1 XML validation. I do for WCS 2.0.1, because I needed to make it pass the OGC CITE test. You would need to contact people from OGC about the schema to validate against, which is your area of expertise.

KathiSchleidt commented 7 months ago

@bangph do you have any XML validation tools available? I've yet to access any rasdaman response that is fully valid :(

As for providing support for the correct provision of data under WCS2.1, while I'm happy to help, this goes way beyond what I can provide via FAIRiCUBE, some alternative needs to be found

pebau commented 7 months ago

@bangph I now also checked the GetCapabilities under WCS 2.1, seems this isn't even defined, at least I cannot find it in the WCS Schemas provided by OGC at https://schemas.opengis.net/wcs/2.1/gml/

Please clarify where the correct schema for this has been hidden

OWS Common ?

bangph commented 7 months ago

@KathiSchleidt I've some tools, e.g. http://www.eisenhutinformatik.ch/gml/gmlcheck/ (I used it mostly) or even with XMLSpy for validating against WCS 2.0.1 schema (no idea with WCS 2.1 as I never tried).

KathiSchleidt commented 7 months ago

@bangph Ideally one uses multiple XML validators, as from experience the validators also have validation issues ;) I use XML Spy, so theoretically you should have found all the issues I've been highlighting. Sad note: I have yet to get a valid response from a rasdaman endpoint (this goes way beyond FAIRiCUBE) :(

bangph commented 7 months ago

@KathiSchleidt For WCS 2.0.1 you will not get a valid response because of the datetime in ISO format. There is no 2D RectifiedGridCoverage on Fairicube, so you cannot get a DescribeCoverage result without validation errors.

KathiSchleidt commented 7 months ago

But, the WCS 2.1.0 options are also all not valid :(

bangph commented 7 months ago

@KathiSchleidt

the WCS 2.1.0 options are also all not valid

I've not looked in WCS 2.1.0 schema validation yet, but can you confirm that you tested with this https://schemas.opengis.net/wcs/2.1/gml/?

I found a 2D RectifiedGridCoverage coverage on Fairicube

curl -u username:passwd 'https://fairicube.rasdaman.com/rasdaman/ows?&SERVICE=WCS&VERSION=2.0.1&REQUEST=DescribeCoverage&COVERAGEID=sentinel2_2018_flevopolder_10m_7x4bands' -o DescribeCoverage.xml

Then please validate DescribeCoverage.xml against the WCS 2.0.1 schema to get a valid response.

KathiSchleidt commented 7 months ago

@bangph cool! First valid XML I've gotten!!! At the same time, seems to be the only dataset ingested by a UC partner, take a look at the metadata, I get the feeling Rob ingested this on his own.

Question - I've never understood why one didn't use 2D RectifiedGridCoverage on all the datasets that only provide data for a single year. Can you explain the rationale behind this decision of providing a temporal dimension with extent 1 year?

bangph commented 7 months ago

@KathiSchleidt Because the data is in a time series, for example, you would have multiple 2D coverages:

but they can be combined nicely as a 3D ReferenceableGridCoverage coverage with time coefficients (irregular time axis is my favorite).

If you have only 1 file and there is nothing else, even if the file is produced for a year for an interval e.g. 2006-2020, then it should be 2D RectifiedGridCoverage.

KathiSchleidt commented 7 months ago

@bangph but right now we have both! Example:

bangph commented 7 months ago

@KathiSchleidt The 4 coverages you listed are all 3D ReferenceableGridCoverage. Technically, the source coverages for the 4th (virtual) coverage, such as 2012, 2015 and 2018, would be hidden so you wouldn't see them.

It would not be possible to create the "virtual coverage" if the source coverages are 2D (they must have the same axes).

KathiSchleidt commented 7 months ago

@bangph so the _coverage_index Coverages rely on the individual year Coverages? At least based on the size, it looks like a copy. Or does this virtual Coverage show the data volume for the contained Coverages?

Also - I'm concerned as while the different _index coverages have the same axes, these axes do not have the same resolution over the years. The complete _coverage_index Coverage has the lowest resolution of the contained years

bangph commented 7 months ago

@KathiSchleidt the virtual coverage contains the source coverages (_year_index). Have a look at the WCS 2.0.1 DescribeCoverage of the coverage dominant_leaf_type_virtual_coverage_index below:

<PartitionSet>
    <Partition>
        <CoverageRef>dominant_leaf_type_2012_index</CoverageRef>
    </Partition>
    <Partition>
        <CoverageRef>dominant_leaf_type_2015_index</CoverageRef>
    </Partition>
    <Partition>
        <CoverageRef>dominant_leaf_type_2018_index</CoverageRef>
    </Partition>
</PartitionSet>

In the example coverage dominant_leaf_type_virtual_coverage_index there is a time axis (with an Index1D CRS, not a time CRS); the source coverages (2015_index, ...) have the same axis. This axis is irregular (an irregular axis has no resolution; only a regular axis has one).
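For anyone who wants to list the source coverages from such a response, a minimal Python sketch follows. It uses a condensed copy of the excerpt above and matches elements by local name, on the assumption that the real response qualifies them with a namespace; `source_coverages` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import List

# Condensed copy of the PartitionSet excerpt (namespaces omitted, as in the excerpt).
SNIPPET = """
<PartitionSet>
    <Partition><CoverageRef>dominant_leaf_type_2012_index</CoverageRef></Partition>
    <Partition><CoverageRef>dominant_leaf_type_2015_index</CoverageRef></Partition>
    <Partition><CoverageRef>dominant_leaf_type_2018_index</CoverageRef></Partition>
</PartitionSet>
"""

def source_coverages(xml_text: str) -> List[str]:
    """Return the coverage ids referenced by CoverageRef elements, ignoring namespaces."""
    root = ET.fromstring(xml_text)
    refs = []
    for element in root.iter():
        local_name = element.tag.rsplit("}", 1)[-1]  # strip "{namespace}" if present
        if local_name == "CoverageRef" and element.text:
            refs.append(element.text.strip())
    return refs

print(source_coverages(SNIPPET))
```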

pebau commented 7 months ago

on a side note, this approach has been presented in depth at the Girona meeting, and details can also be found in Deliverable D5.something.

KathiSchleidt commented 7 months ago

@bangph the <PartitionSet> is one of the bits of the XML that is invalid, as no schema is provided. I interpreted the content to mean that the listed Partitions are the source of the individual years. What I don't yet understand (and haven't found in the deliverables) is:

pebau commented 7 months ago

  Reason: The following elements are expected at this location (see below)
  '{##any except ##local http://www.opengis.net/cis/1.1/gml}'

small fix: URL is http://schemas.opengis.net/cis/1.1/gml/

pebau commented 7 months ago

@bangph the <PartitionSet> is one of the bits of the XML that is invalid, as no schema is provided. I interpreted the content to mean that the listed Partitions are the source of the individual years. What I don't yet understand (and haven't found in the deliverables) is:

* If the virtual_coverage_index is a copy of the individual coverages listed in the PartitionSet, or references these

reference; this is in the rasdaman documentation. But this is not essential for working with it - for users we always try to push as much technical detail as possible "behind the curtain".

bangph commented 7 months ago

@KathiSchleidt

If the virtual_coverage_index is a copy of the individual coverages listed in the PartitionSet, or references these

It is the umbrella that holds the references to the source coverages (_year_index).

How the conversion between different resolutions is performed

You wanted to ask about lat/lon resolutions? It would help if you mentioned which axis label you mean. If that is the case, then it is done by the gdal library (in rasql it is called the projection() operator), and petascope does the hard work behind the scenes to generate the rasql query.

KathiSchleidt commented 7 months ago

@pebau The error text I posted was provided by XMLSpy. As for the correct URI: in the official schema, the namespace URI is defined as xmlns:wcs21="http://www.opengis.net/wcs/2.1/gml", with no final /

@bangph I'm giving up on my question of whether the virtual coverage is a copy of, or a reference to, the parts from the PartitionSet. As for the conversion performed: the axis labels are Y and X, and the spatial resolution seems to have been resampled from a 20m source to 10m. Resampling without a clear understanding of the nature of the underlying data leads to errors (for details, see Manuel's presentation on Grids and Resampling). At the very least, the resampling method should be documented with the data.

pebau commented 7 months ago

@pebau The error text I posted was provided by XMLSpy. As for the correct URI: in the official schema, the namespace URI is defined as xmlns:wcs21="http://www.opengis.net/wcs/2.1/gml", with no final /

hm, "www" instead of "schemas" may be wrong indeed, we need to check - thanks for spotting this! And yes, my trailing "/" should not be there.

@bangph pls collect all such issues so that we can work it off once the dust has settled.

bangph commented 7 months ago

@KathiSchleidt I'm well aware of resampling; the default interpolation is nearest neighbor. Because the data is provided in different resolutions (20m and 10m for some years), the virtual coverage must upscale the 20m source coverages to 10m. If one wants the 20m data, use the source coverage instead of slicing it from the virtual coverage.
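To illustrate what nearest-neighbor upscaling from 20m to 10m does to the cell values, here is a minimal pure-Python sketch. It is not how gdal or petascope implement it; it assumes aligned grids and an integer factor of 2, and `upsample_nearest` is a hypothetical helper.

```python
from typing import List

def upsample_nearest(grid: List[List[int]], factor: int = 2) -> List[List[int]]:
    """Nearest-neighbour upsampling: repeat every cell `factor` times along both axes."""
    out = []
    for row in grid:
        wide = [value for value in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out

coarse_20m = [[1, 2],
              [3, 4]]
fine_10m = upsample_nearest(coarse_20m)
for row in fine_10m:
    print(row)
```

Each 20m cell becomes a 2x2 block of identical 10m cells, so class codes are preserved but no new detail is gained.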

KathiSchleidt commented 7 months ago

@pebau fear you're confusing namespace (http://www.opengis.net/...) with schemalocation (https://schemas.opengis.net/...)

@bangph I understand the issue of the different spatial resolutions over time, we've been discussing this for the last many months. However, for correct data provision, such resampling must be documented. And while nearest neighbor mostly works for the datasets currently available, it can also go badly wrong.
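The namespace versus schemaLocation distinction can be seen by reading the xsi:schemaLocation attribute, which is a whitespace-separated list of (namespace URI, schema location) pairs. A minimal Python sketch follows; the sample document is hypothetical and heavily shortened, and the xsd file name in it is an assumption.

```python
import xml.etree.ElementTree as ET
from typing import Dict

XSI = "http://www.w3.org/2001/XMLSchema-instance"

def schema_locations(xml_text: str) -> Dict[str, str]:
    """Map each namespace URI to the schema location hint from xsi:schemaLocation."""
    root = ET.fromstring(xml_text)
    tokens = root.get(f"{{{XSI}}}schemaLocation", "").split()
    # Tokens alternate: namespace URI, then the location of a schema for it.
    return dict(zip(tokens[0::2], tokens[1::2]))

# Hypothetical sample; the xsd file name is an assumption for illustration.
SAMPLE = """<wcs:Capabilities
    xmlns:wcs="http://www.opengis.net/wcs/2.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.opengis.net/wcs/2.0 http://schemas.opengis.net/wcs/2.0/wcsAll.xsd"/>"""

for namespace, location in schema_locations(SAMPLE).items():
    print(namespace, "->", location)
```

The namespace is only an identifier and need not resolve to anything; the location is where a validator may fetch the schema.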

pebau commented 7 months ago

@pebau fear you're confusing namespace (http://www.opengis.net/...) with schemalocation (https://schemas.opengis.net/...)

oops, could be - late in the evening, and years after I tortured XML schema. Or schema me...

bangph commented 7 months ago

@KathiSchleidt I suggest you open another ticket for any discussion in this thread that is unrelated to GML. The thread has become a mix of discussions, many of which have nothing to do with XML validation.

bangph commented 7 months ago

@KathiSchleidt https://fairicube.rasdaman.com/rasdaman/ows updated. Now it has:

KathiSchleidt commented 7 months ago

@bangph Many thanks!!!

I just checked the GetCapabilities: in addition to the issue with the provision of ISO dateTime, there seems to be an issue with the /wcs:Capabilities/ows:OperationsMetadata/ows:ExtendedCapabilities/inspire_dls:ExtendedCapabilities section. Please check and ensure that valid information is being provided. I find it worrisome that WCS 2.1.0 encoding is not fully specified :( Any outlook when this will be finalized?

DescribeCoverage and GetCoverage look good; I just checked with the corine_land_cover_virtual_coverage_index dataset. I'll check these responses again once we get the RangeType information correctly provided, as well as the link to the STAC metadata record.

Conclusion: DescribeCoverage and GetCoverage are now valid, whether we'll get a valid GetCapabilities is up in the air.

Btw - I checked the online validator you proposed, but this provided very different feedback. In my experience, it's best to use multiple validators, as each one catches different aspects. Online validator output for GetCapabilities:

    S4s-elt-character: Non-whitespace Characters Are Not Allowed In Schema Elements Other Than 'xs:appinfo' And 'xs:documentation'. Saw '301 Moved Permanently'., Line '2', Column '35'.
    S4s-elt-character: Non-whitespace Characters Are Not Allowed In Schema Elements Other Than 'xs:appinfo' And 'xs:documentation'. Saw '301 Moved Permanently'., Line '4', Column '34'.
    S4s-elt-character: Non-whitespace Characters Are Not Allowed In Schema Elements Other Than 'xs:appinfo' And 'xs:documentation'. Saw 'CloudFront'., Line '5', Column '23'.
    The Element Type "hr" Must Be Terminated By The Matching End-tag "</hr>"., Line '6', Column '3'.
    The Element Type "hr" Must Be Terminated By The Matching End-tag "</hr>".

bangph commented 7 months ago

@KathiSchleidt thanks for your checks.

I find it worrisome that WCS 2.1.0 encoding is not fully specified :( Any outlook when this will be finalized?

Unfortunately, I cannot answer that; it will take a long time at OGC.

I've updated https://fairicube.rasdaman.com/rasdaman/ows and indeed your experience with different XML validation tools gave me a good hint here to fix WCS GetCapabilities result.

correct ones are:

xmlns:inspire_dls="https://inspire.ec.europa.eu/schemas/inspire_dls/1.0/inspire_dls.xsd"
xmlns:inspire_common="https://inspire.ec.europa.eu/schemas/common/1.0/common.xsd"

KathiSchleidt commented 7 months ago

@bangph fear you made an error with the inspire_dls namespace, should be "http://inspire.ec.europa.eu/schemas/inspire_dls/1.0", you're providing the schema location instead of the namespace. The effect is that the inspire_dls schema is no longer loaded because under schemaLocation, the reference is still to the correct namespace http://inspire.ec.europa.eu/schemas/inspire_dls/1.0, but this is not associated with the inspire_dls namespace. Nice trick to avoid the underlying issue, but not a solution.
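A quick way to catch this kind of mix-up is to list the namespace declarations and flag any URI that looks like a schema file. The sketch below is a heuristic only (a namespace URI ending in .xsd is legal, just suspicious); `suspicious_namespaces` is a hypothetical helper and the sample is a shortened, made-up root element.

```python
import io
import xml.etree.ElementTree as ET
from typing import List, Tuple

def suspicious_namespaces(xml_text: str) -> List[Tuple[str, str]]:
    """Return (prefix, uri) pairs whose namespace URI ends in .xsd, i.e. looks like a schema location."""
    source = io.BytesIO(xml_text.encode("utf-8"))
    # "start-ns" events yield a (prefix, uri) tuple for every namespace declaration.
    declared = [ns for _, ns in ET.iterparse(source, events=("start-ns",))]
    return [(prefix, uri) for prefix, uri in declared if uri.lower().endswith(".xsd")]

# Hypothetical, shortened sample mirroring the declarations discussed here.
SAMPLE = """<root
    xmlns:inspire_dls="https://inspire.ec.europa.eu/schemas/inspire_dls/1.0/inspire_dls.xsd"
    xmlns:inspire_common="http://inspire.ec.europa.eu/schemas/common/1.0"/>"""

print(suspicious_namespaces(SAMPLE))
```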

I've now done the necessary analysis for you, you're missing the inspire_dls:SpatialDataSetIdentifier entry. The full /inspire_dls:ExtendedCapabilities section should be as follows:

<inspire_dls:ExtendedCapabilities>
    <inspire_common:MetadataUrl>
        <inspire_common:URL>https://fairicube.rasdaman.com/rasdaman/ows</inspire_common:URL>
        <inspire_common:MediaType>application/vnd.iso.19139+xml</inspire_common:MediaType>
    </inspire_common:MetadataUrl>
    <inspire_common:SupportedLanguages>
        <inspire_common:DefaultLanguage>
            <inspire_common:Language>eng</inspire_common:Language>
        </inspire_common:DefaultLanguage>
    </inspire_common:SupportedLanguages>
    <inspire_common:ResponseLanguage>
        <inspire_common:Language>eng</inspire_common:Language>
    </inspire_common:ResponseLanguage>
    <inspire_dls:SpatialDataSetIdentifier>
        <inspire_common:Code>FAIRiCUBE</inspire_common:Code>
    </inspire_dls:SpatialDataSetIdentifier>
</inspire_dls:ExtendedCapabilities>

bangph commented 7 months ago

@KathiSchleidt thanks for correcting me with namespace and schemaLocation URLs.

As for your example, I don't think the entry can simply be what you posted below (that only bypasses XMLSpy validation).

<inspire_dls:SpatialDataSetIdentifier>
    <inspire_common:Code>FAIRiCUBE</inspire_common:Code>
</inspire_dls:SpatialDataSetIdentifier>

This is because the section is used to list INSPIRE coverages only (on Fairicube you have none).

A valid example should contain Code and Namespace of an INSPIRE coverage like below:

<inspire_dls:SpatialDataSetIdentifier metadataURL="https://www.nationaalgeoregister.nl/geonetwork/srv/api/records/15d0b9b0-1067-4ef6-a214-77526e8e8750/formatters/xml">
    <inspire_common:Code>INSPIRE_WNZ_5_NAP</inspire_common:Code>
    <inspire_common:Namespace>http://inspire.rasdaman.org/rasdaman/ows</inspire_common:Namespace>
</inspire_dls:SpatialDataSetIdentifier>

Am I wrong?

pebau commented 7 months ago

@KathiSchleidt thanks for your checks.

I find it worrisome that WCS 2.1.0 encoding is not fully specified :( Any outlook when this will be finalized?

Unfortunately, I cannot answer that, it will take long time from OGC.

infinitely. The Capabilities document is derived from OWS Common, and the derivation mechanisms of XML Schema do not allow extending types from number to string. OWS Common is abandoned in the sense that there is no SWG to work on it. And anyway OGC will not want to fix OWS Common (which is known to have several flaws), as they want to sell OAPI now and improving OWS Common runs against that interest. In short: no change will happen.

KathiSchleidt commented 7 months ago

@bangph While the coverages we provide go way beyond what's been defined in INSPIRE, as the DGGS will be building on INSPIRE, it probably doesn't hurt to leave in these INSPIRE specific parts. I wouldn't bother to list the datasets, especially as they're beyond INSPIRE anyway, thus my shortened version.

Btw - I've now found a compliant rasdaman based endpoint, NL elevation data: https://coverage.wetransform.eu/rws/hoogte_nl_1m/2023-08/ows#/services

As we're discussing INSPIRE, any reason I've been locked out of https://inspire.rasdaman.org/rasdaman/ows?

bangph commented 7 months ago

@KathiSchleidt ok, so every service will have a new setting called inspire_dls_spatial_dataset_identifier in petascope.properties, where one can define the name of the service; here it is FAIRiCUBE:

<inspire_dls:SpatialDataSetIdentifier>
    <inspire_common:Code>FAIRiCUBE</inspire_common:Code>
</inspire_dls:SpatialDataSetIdentifier>

As we're discussing INSPIRE, any reason I've been locked out of https://inspire.rasdaman.org/rasdaman/ows?

Please always post the request and output you tried; no one knows what you mean without the context. This service is publicly accessible.

bangph commented 7 months ago

@KathiSchleidt petascope is updated on https://fairicube.rasdaman.com/rasdaman/ows, now it has this element in WCS GetCapabilities response:

<inspire_dls:SpatialDataSetIdentifier>
    <inspire_common:Code>FAIRiCUBE</inspire_common:Code>
</inspire_dls:SpatialDataSetIdentifier>

I don't see any other things which I can do in petascope for XML validation, if you think that as well then this ticket can be closed.

KathiSchleidt commented 7 months ago

@bangph many thanks - the inspire_dls part of the XML is now valid.

I'll close providing a summary of the status of XML validation shortly

KathiSchleidt commented 7 months ago

Here an overview of the issues with XML on WCS. For details on failed validation, see full protocol

Based on the statements above, there seems to be no outlook on getting these issues resolved, the update to WCS 2.1 just brought new issues. Conclusion: WCS cannot be utilized if valid XML is required.

WCS 2.0

WCS 2.1

pebau commented 7 months ago

reopening, as this requires more elaboration.

KathiSchleidt commented 7 months ago

@pebau from a FAIRiCUBE perspective, this test addressed the validity of the XML provided by the WCS instance powered by rasdaman. This does not require an analysis of the source of the error, just the existence. The four different coverages on which tests were performed were selected to represent the different coverage structures being provided.

As for the errors ensuing from the provision of ISO formatted date strings in a field defined as double, while it is appreciated that work is now underway enabling spatio-temporal cubes (to date only spatial was possible), we will encounter the same underlying issue in the WCS standard when we advance to categorical dimensions, as required for the Occurrence Cubes.

In addition, the lack of a GetCapabilities response for WCS 2.1, together with the mix of 2.1 and 2.0 concepts in the 2.1 schema, is concerning.

Thus my conclusion that valid XML encoding of spatiotemporal coverages is not possible under the OGC WCS standards the way they are currently defined.