Is my service in error, or the validator?

PeterParslow commented 4 years ago

See also issue #180. I can see my Capabilities document at https://ckan.publishing.service.gov.uk/csw?REQUEST=GetCapabilities&SERVICE=CSW&VERSION=2.0.2, using three different browsers (Firefox, IE, Opera).

I can see the inspire_ds:ExtendedCapabilities element when I scroll down.

But the validator fails when it searches for ExtendedCapabilities, see http://inspire.ec.europa.eu/validator/v2/TestRuns/EID6d5ef407-57b0-442b-9688-69058b9fec40.html

When I raised this before (#180), you suggested it was something to do with compression on the server I'm testing. This seemed possible, as some browsers had trouble getting the Capabilities document. I discussed this with them, they presumably did something, and now I can access it freely in the browsers available to me. But the validator still fails at this early step.

danielnavarrogeo commented 4 years ago

Dear @PeterParslow

Thank you for opening this issue. We will analyze it and come back to you afterwards.

Regards

iuriemaxim commented 4 years ago

I also looked at the CSW, as I am interested in the subject.

Google Chrome does not render the response correctly, interpreting it as plain text. Microsoft Edge does not render any response. IE 11 reports that no page can be found at that address.

When retrieving the response in Firefox and saving it to local disk, the browser saves a file named "inspire_ds.xsd", whereas the correct behaviour would be to save a file with the extension ".xml". Of course, the content of the .xsd file is not what is expected.

It may also be a problem that xsi:schemaLocation in the header of the document does not contain the http://inspire.ec.europa.eu/schemas/inspire_ds/1.0/inspire_ds.xsd schema, which is declared only at the level of the inspire_ds:ExtendedCapabilities element.

A response that starts with the following content passes the test

PeterParslow commented 4 years ago

Thanks. I could argue that my XML is valid. But I'll see if there's a way to configure the GetCapabilities response document as you suggest. I've asked on the INSPIRE Community forum (https://inspire.ec.europa.eu/forum/discussion/view/264495/configuring-pycsw-ckan-plugin-to-pass-inspire-discovery-service-validation) - but I'll also look elsewhere.

iuriemaxim commented 4 years ago

The presence or absence of the http://inspire.ec.europa.eu/schemas/inspire_ds/1.0/inspire_ds.xsd schema in the header of the response would not solve the main issue. The validator could probably be changed to cope with this, if it does not do so already.

Most probably the main issue is that the response is not correctly understood by different browsers, as I mentioned before. This should be fixed first.

This is how Google Chrome is interpreting the result of the https://ckan.publishing.service.gov.uk/csw?REQUEST=GetCapabilities&SERVICE=CSW&VERSION=2.0.2 request.

But the demo version of PYCSW renders the response correctly in browsers, including Google Chrome. Just test http://demo.pycsw.org/cite/csw?service=CSW&version=2.0.2&request=GetCapabilities and the response will look like the image below.

So first of all, the response to the https://ckan.publishing.service.gov.uk/csw?REQUEST=GetCapabilities&SERVICE=CSW&VERSION=2.0.2 request should render correctly in browsers, as the demo does.

You may try installing Postman (https://www.postman.com/) and testing both requests.

I see one big discrepancy when comparing with other implementations that pass the ETF validator: the document header contains the pseudo-attribute "standalone" with the value set to "no".

<?xml version="1.0" encoding="UTF-8" standalone="no"?>

Most probably this should be also fixed, by removing this pseudo-attribute from the very beginning of the response.

I do not think that this pseudo-attribute should be there, as the XML should validate against an XSD file.

I observed in other tests that the validator assumes that files start with <?xml version="1.0" encoding="UTF-8"?>. Nothing more, nothing less.

Another difference is that the content encoding is gzip. In the demo version of PYCSW, the content encoding is not gzip.
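As an alternative to Postman, the two points above (the Content-Encoding header and what the body actually contains) could be checked with a few lines of Python. This is only a minimal sketch using the standard library: the function names `decode_body` and `fetch` are illustrative, and nothing here claims to reproduce how the ETF validator issues its own requests.

```python
import gzip
import urllib.request
from typing import Dict, Optional, Tuple

GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream starts with these two bytes


def decode_body(body: bytes, content_encoding: Optional[str]) -> bytes:
    """Return the payload, undoing gzip when the bytes really are gzip.

    The decision is based on the payload itself, so a missing or
    misleading Content-Encoding header does not break decoding.
    """
    if body.startswith(GZIP_MAGIC):
        return gzip.decompress(body)
    return body


def fetch(url: str) -> Tuple[Dict[str, str], bytes]:
    """Fetch a URL while asking for an uncompressed response.

    urllib does not decompress responses on its own, so the bytes read
    here are exactly what the server sent.
    """
    request = urllib.request.Request(url, headers={"Accept-Encoding": "identity"})
    with urllib.request.urlopen(request, timeout=30) as response:
        headers = {name.lower(): value for name, value in response.headers.items()}
        raw = response.read()
    return headers, decode_body(raw, headers.get("content-encoding"))
```

Calling `fetch()` on both the CKAN URL and the pycsw demo URL and comparing `headers.get("content-encoding")` and `headers.get("content-type")` should show whether the server compresses the response even when the client asked for `identity`.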

Hope it helps to track the issue.

PeterParslow commented 4 years ago

Thanks. I've raised the gzip issue with the support people of the server. I think that "standalone='true'" is actually correct, given that the document does contain entity references (&amp;).

iuriemaxim commented 4 years ago

According to http://www.xmlplease.com/xml/standalone/ my understanding is that the "standalone" pseudo-attribute should not be present at all (XML schema / DTD). In any case "standalone='true'" would not be correct, as the value should be either "yes" or "no".

PeterParslow commented 4 years ago

Well, that's a "rule of thumb" in the opinion of that author (XMLPlease.com).

The description here https://xmlwriter.net/xml_guide/xml_declaration.shtml suggests to use it if the document contains "any external entity references".

The spec is at https://www.w3.org/TR/xml/#sec-rmd, and (in typical fashion) states that "The value "no" indicates that there are or may be such external markup declarations" & "If there are no external markup declarations, the standalone document declaration has no meaning". So "yes" is the effective default - it should not do any damage. That said, the spec goes on to say "If there are external markup declarations but there is no standalone document declaration, the value "no" is assumed" - hence the idea that just leaving it out is best - because the reader has to work out what to do for itself.

However, the spec makes it clear that &amp; doesn't count - so the proper standalone value for this document should be "yes".
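To check what a given response actually declares, the XML declaration can be read directly. The sketch below is a simplified, regex-based reading of the declaration (not a full XML parser), and the helper names `read_standalone` and `is_legal_standalone` are made up for illustration. It relies only on the fact that the XML 1.0 grammar allows just "yes" or "no" as values, and allows the declaration to be omitted.

```python
import re
from typing import Optional

# The XML declaration, if present, must be the very first thing in the document.
_XML_DECL = re.compile(r"^<\?xml\s+(.*?)\?>", re.DOTALL)
# name="value" or name='value' pairs inside the declaration.
_PSEUDO_ATTR = re.compile(r"(\w+)\s*=\s*([\"'])(.*?)\2")


def read_standalone(document: str) -> Optional[str]:
    """Return the standalone value from the XML declaration, or None if absent."""
    match = _XML_DECL.match(document)
    if match is None:
        return None
    attrs = {name: value for name, _quote, value in _PSEUDO_ATTR.findall(match.group(1))}
    return attrs.get("standalone")


def is_legal_standalone(value: Optional[str]) -> bool:
    """XML 1.0 permits only 'yes' or 'no'; leaving the declaration out is also legal."""
    return value in (None, "yes", "no")
```

For example, `read_standalone()` applied to a document beginning with `<?xml version="1.0" encoding="UTF-8" standalone="no"?>` yields `"no"`, and `is_legal_standalone("true")` is `False`.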

I'm trying to ask the pycsw community how to configure the GetCapabilities response document, as I can't see that in their documentation. I haven't found how to ask yet! Hence https://inspire.ec.europa.eu/forum/discussion/view/264495/configuring-pycsw-ckan-plugin-to-pass-inspire-discovery-service-validation, asking the INSPIRE community - I'm sure someone is using it!

I've also passed your gzip point on to the server team - thanks for pointing that out.

iuriemaxim commented 4 years ago

Hi Peter, I read the same documents that you mentioned, and my understanding is that, as XML Schema is used and not a DTD, standalone should be neither "no" nor "yes"; it should be absent.

The examples in your links, namely https://xmlwriter.net/xml_guide/entity_declaration.shtml, are quite clear. As there are no entity declarations such as "<!ENTITY name SYSTEM ..." or "<!ENTITY name PUBLIC ...", neither general nor parameter entity declarations, either in the document or externally, my understanding is that the "standalone" pseudo-attribute should not be present at all. If declarations such as "<! ..." existed internally or externally, then the XML would be validated against a DTD.

In INSPIRE, XML (GML) files are validated not against a DTD (https://www.w3schools.com/xml/xml_dtd.asp) but against an XML schema (https://www.w3schools.com/xml/schema_intro.asp). That's why my understanding is that the "standalone" pseudo-attribute should be missing.

I have no idea why PYCSW is including it.

See also https://www.javatpoint.com/dtd-vs-xsd to see differences between DTD and XSD.

iuriemaxim commented 4 years ago

I just found that WMS 1.1.1 is implemented through DTD. Examples of DTD files can be seen at: http://schemas.opengis.net/wms/1.1.1/ It may help to understand the difference between DTDs and XSDs by comparing with WMS 1.3.0, which is implemented through XSD: http://schemas.opengis.net/wms/1.3.0/

For example, a request such as https://inspire.meteoromania.ro/WIGOS/WMS?service=WMS&request=GetCapabilities&version=1.1.1 will return a document starting with:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE WMT_MS_Capabilities SYSTEM "https://inspire.meteoromania.ro/WIGOS/WMS/schemas/wms/1.1.1/WMS_MS_Capabilities.dtd">

At least in GeoServer implementations, the "standalone" pseudo-attribute is not present in this case. The examples provided in the document https://portal.opengeospatial.org/files/?artifact_id=1058 show that standalone="no" is used, as there are external DTD declarations in the document (those starting with <!DOCTYPE).

You may add issues (or ask questions) for PYCSW here: https://github.com/geopython/pycsw/issues

PeterParslow commented 4 years ago

I've been working with XML since the 1990s, so I have some understanding of DTDs - in fact, my earlier schemas were defined that way.

Thanks for the PYCSW link; I haven't had any response by using the e-mail list, so I'll try posting the question there (on behalf of the people who actually run the system I'm trying to interact with & test!)

PeterParslow commented 4 years ago

https://github.com/geopython/pycsw/issues/607

I have now had a response from the pycsw community: that the XML is valid, so why would I want to change it. I'm inclined to agree with them: declaring the namespace locally is certainly valid, and standalone="no" shouldn't be harmful.
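The point about local namespace declarations can be illustrated with a namespace-aware parser. The two fragments below are hypothetical and heavily trimmed (they are not the real service output), and the namespace URI is assumed from the schema location quoted earlier in this thread. A sketch with Python's `xml.etree.ElementTree`:

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI, derived from the inspire_ds schema location in this thread.
INSPIRE_DS = "http://inspire.ec.europa.eu/schemas/inspire_ds/1.0"

# Hypothetical fragment: prefix declared on the root element.
DECLARED_ON_ROOT = f"""<csw:Capabilities xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"
    xmlns:inspire_ds="{INSPIRE_DS}">
  <inspire_ds:ExtendedCapabilities/>
</csw:Capabilities>"""

# Hypothetical fragment: prefix declared only on the element that uses it.
DECLARED_LOCALLY = f"""<csw:Capabilities xmlns:csw="http://www.opengis.net/cat/csw/2.0.2">
  <inspire_ds:ExtendedCapabilities xmlns:inspire_ds="{INSPIRE_DS}"/>
</csw:Capabilities>"""


def has_extended_capabilities(xml_text: str) -> bool:
    """Look the element up by namespace URI, not by prefix or declaration site."""
    root = ET.fromstring(xml_text)
    return root.find(f".//{{{INSPIRE_DS}}}ExtendedCapabilities") is not None
```

Both fragments give `True` here, because the lookup depends on the namespace URI and not on where the prefix is declared. Whether the validator's own lookup behaves the same way is not something this sketch can confirm.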

So perhaps it's "just" the HTTP encoding header that's wrong.

danielnavarrogeo commented 4 years ago

Dear @PeterParslow

We have validated the same capabilities response from your service locally and found no problems with the XML.

The problem seems to be related to the gzip compression applied by the service. See #180.

We are investigating whether this can be solved in the way the INSPIRE Reference Validator performs HTTP GetCapabilities requests.

Regards

INSPIRE-MIF / helpdesk-validator

Is my service in error, or the validator? #211