etf-validator / etf-webapp

:earth_africa: :mag: ETF is an open source testing framework for spatial data and services
https://www.etf-validator.net
European Union Public License 1.2
19 stars 19 forks source link

SRS dimension not detected in gml:posList #214

Open carlospzurita opened 4 years ago

carlospzurita commented 4 years ago

Description

During the execution of the TestSuites of Interoperable data sets in GML (Guidelines for the Encoding of Spatial Data version 3.3), we found that certain GML file were causing the INSPIRE validator to crash. After some debugging, we found that this exceptions appearead all along the GML validation. Log attached etf-issue.zip

Whenever this happened, the JVM started to accumulate memory and the TestRun thread would consume all the resources available in the machine until the application crashed.

After investigating the problem, we found this issues related to this problem

https://github.com/inspire-eu-validation/ets-repository/issues/161 https://github.com/etf-validator/etf-gmlgeox/issues/19

We added the srsDimension and srsName to all the gml:posList attributes and the validation went fine. We are adding on the problematic files and the modified file for comparison. US_govser_testDataINSPIREvalidator.gml.zip CORRECTED-US_govser_testDataINSPIREvalidator.gml.zip

Could the memory overload caused by the deegree exception? And if so, what would be the best approach to fix this? Add an early detection of a missing srsDimension attribute?

Operating systems and browser

This is happening in a Docker container with Debian 8, inside an Amazon Linux EC2 instance.

Steps to Reproduce

  1. Select the ETS for Interoperable data sets in GML (Guidelines for the Encoding of Spatial Data version 3.3
  2. Upload the GML test file
  3. Watch the stats for memory and the log file for the exception

Expected behavior:

Test report correctly generated after parsing geometry.

Actual behavior: TestRun thread gets stuck and doesn't free any resource

jonherrmann commented 4 years ago

This problem is fixed in GmlGeoX version 2, which is under development. You can use this version as soon as it is released. Since this is a new major release, the interface has changed and you need to update the ETSs that used it.

cportele commented 4 years ago

Also note that the data is simply incorrect. It contains geometries that declare http://www.opengis.net/def/crs/EPSG/0/4258 as the CRS, but then have elevation values (and state srsDimension="3" which is inconsistent with the CRS). No software will be able to do anything useful with these geometries.

Add an early detection of a missing srsDimension attribute?

Since srsDimension attributes are optional and hints only, this would not be a fix. Tools are free to ignore the srsDimension values.

carlospzurita commented 4 years ago

Of course, you are right that the dataset is incorrect. However, as far as we have found, deegree doesn't care about the coherence between the srsName and the srsDimension. The only check that they perform is the module between the number of coordinates and the srsDimension, and so the "fixed" dataset that we attached doesn't provoke the exception. Even more, we loaded this dataset in QGIS as a vector layer, and the geometries were processed correctly.

The validation should fail, but what was strange was that this issue escalated directly form deegree and provoked a memory leak. We are working on a hotfix to, at least, prevent the appereance of the exception, at least until you are able to release the new version of etf-gmlgeox

A question related to the release of the module is, what is the extent of the changes and how they will affect the ETSs that rely on GML validation?

cportele commented 4 years ago

A question related to the release of the module is, what is the extent of the changes and how they will affect the ETSs that rely on GML validation?

We will update https://github.com/etf-validator/governance/issues/57 to include that information for review in the coming weeks.

carlospzurita commented 4 years ago

We have included a workaround for this issue on the class GeometryElementHandler . Inside the method validate(), we included a modification on the geometry node to recover the srsDimension declared and add it to the gml:posList tag.

String dimension = element.attributeValue("srsDimension");
String originalElement = element.asXML();
originalElement.replace("<gml:posList>", "<gml:posList srsDimension=\""+dimension+"\">");
SAXReader modifiedReader = new SAXReader();
org.dom4j.Document modifiedDocument = modifiedReader.read(originalElement);
Element modifiedElement = modifiedDocument.getRootElement();

All validations are executed on the modifiedElement. This workaround prevents the apparition of the exception. The problematic GML files still are incoherent between srsName and srsDimension, and we need to solve the memory leak in order to validate correctly the geometries, but at least we can let users execute their validations without crashing the whole service.

cportele commented 4 years ago

If I understand the snippet correctly, this may work for the specific file, but srsDimension is an optional attribute with an informative hint and many files will not include it. In that case

String dimension = element.attributeValue("srsDimension");
originalElement.replace("<gml:posList>", "<gml:posList srsDimension=\""+dimension+"\">");

would lead to invalid XML as dimension is null.

carlospzurita commented 4 years ago

We made a modification to check the existence of the dimension before adding the attribute. We also included this on SecondaryGeometryElementHandler

Element modifiedElement = element;
String dimension = element.attributeValue("srsDimension");
if (!(dimension == null)) {
    String originalElement = element.asXML();
    String modifiedXML =originalElement.replace("<gml:posList>", "<gml:posList srsDimension=\""+dimension+"\">");
    SAXReader modifiedReader = new SAXReader();
    org.dom4j.Document modifiedDocument = modifiedReader.read(new InputSource( new StringReader( modifiedXML ) ));
    modifiedElement = modifiedDocument.getRootElement();
}