claeis / ilivalidator

INTERLIS validator

Basic validation of XML-structure? #334

Closed Pierre-de-la-Verre closed 2 years ago

Pierre-de-la-Verre commented 2 years ago

I have an XTF which consists of two "purely concatenated" XTF-files, like this

<?xml version="1.0" encoding="UTF-8"?>
<TRANSFER xmlns="http://www.interlis.ch/INTERLIS2.3">
    <HEADERSECTION ....

... some hundred lines ..

</TRANSFER>
<?xml version="1.0" encoding="UTF-8"?>
<TRANSFER xmlns="http://www.interlis.ch/INTERLIS2.3">
    <HEADERSECTION SE...
... again some hundred lines ..

</TRANSFER>

I tried some standard XML tools and some INTERLIS tools and got different results, ranging from error reports to normal, complete display and counting of the data.

ilivalidator checked and correctly reported only the first "part"; the second one was ignored and not reported.

Then I removed the three lines "transfer - xml-version - transfer" and used

....... first part of data    
</DATASECTION>
    <HEADERSECTION ...
... second part of data

Now ilivalidator checked everything, reported "already existing" things and reported the number of objects separately for part 1 and part 2.

Because of the different results with different tools, I would like to know: What's the best solution for files like this? Check and report forbidden or strange XML usage? Ignore and report? Be more tolerant and check everything? Check the file in two parts because it is allowed? (ignore this post because it is a crazy single problem??)

beistehen commented 2 years ago

What's the best solution for files like this?

Files with more than one root element are not well-formed according to the XML specification. So if you have

<TRANSFER ...>
</TRANSFER>
<TRANSFER ...>
</TRANSFER>

in one file you should split the file in two separate files.
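To see why a second root element is a problem, here is a minimal sketch of a well-formedness check using Python's standard `xml.etree.ElementTree` parser. The helper name `is_well_formed` and the two sample strings are made up for illustration; this only shows how a generic XML parser reacts and says nothing about how ilivalidator parses files internally.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> tuple[bool, str]:
    """Return (True, "") if xml_text parses, else (False, parser message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return False, str(err)
    return True, ""


# Hypothetical sample data: one TRANSFER root, and the same root repeated twice.
single = '<TRANSFER xmlns="http://www.interlis.ch/INTERLIS2.3"></TRANSFER>'
doubled = single + "\n" + single

print(is_well_formed(single))   # parses: exactly one root element
print(is_well_formed(doubled))  # rejected: content after the first root element
```

A strict parser stops at the end of the first root element and reports everything after it as an error, which is consistent with the "XML tools report an error" side of the observations above.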

Concerning ilivalidator you may afterwards add both files to the same command line like this: ilivalidator.jar [options] file1.xtf file2.xtf
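If the file really is a plain concatenation, the split could be sketched like this in Python. It assumes that every part starts with its own `<?xml ...?>` declaration, as in the example from the first post; the function name `split_concatenated` and the sample text are hypothetical, and writing the parts to `file1.xtf` and `file2.xtf` is left out.

```python
import re


def split_concatenated(text: str) -> list[str]:
    """Split text in front of each XML declaration and drop empty pieces.

    Assumption: each concatenated document begins with '<?xml ...?>'.
    """
    parts = re.split(r"(?=<\?xml\s)", text)
    return [part.strip() for part in parts if part.strip()]


# Hypothetical sample: two tiny documents glued together.
decl = '<?xml version="1.0" encoding="UTF-8"?>'
root = '<TRANSFER xmlns="http://www.interlis.ch/INTERLIS2.3"></TRANSFER>'
concatenated = f"{decl}\n{root}\n{decl}\n{root}\n"

for number, part in enumerate(split_concatenated(concatenated), start=1):
    print(number, part.splitlines()[0])
```

Each returned string can then be written to its own file and checked on its own. A file without repeated XML declarations would need a different splitting rule.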

(ignore this post because it is a crazy single problem??)

No way! If people are unfamiliar with XML (e.g. because they used .itf in the past), they value posts like this 💯 👍

Pierre-de-la-Verre commented 2 years ago

Thanks, @beistehen

ilivalidator.jar [options] file1.xtf file2.xtf is not an option in this particular situation because the data inside has to be merged - in a "well-formed" way.

edigonzales commented 2 years ago

If you want to concatenate XTF files, why not import them into a single GPKG database and then export the data into a single file?

beistehen commented 2 years ago

ilivalidator.jar [options] file1.xtf file2.xtf is not an option in this particular situation because the data inside has to be merged - in a "well-formed" way.

I see two ways to handle this:

  1. Depending on the data, you might want to use multiple baskets inside a <TRANSFER> element. E.g.

    <TRANSFER ...>
    <HEADERSECTION ...></HEADERSECTION>
    <DATASECTION>
    
    <modelname.topic_a BID="basket1">
      <!-- your data part 1 -->
    </modelname.topic_a>
    
    <modelname.topic_b BID="basket2">
      <!-- your data part 2 -->
    </modelname.topic_b>
    
    </DATASECTION>
    </TRANSFER>
  2. Use an XML tool or even a plain XSLT processor (like Saxon) to prepare the .xtf before feeding it to ilivalidator. But think twice before going down the XSLT rabbit hole ... 😏
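As a rough illustration of the first option, the baskets of a second file can be moved into the `<DATASECTION>` of the first one. The sketch below uses Python's `xml.etree.ElementTree` rather than XSLT, and the function name `merge_datasections`, the sample documents and the basket names are all hypothetical. It keeps only the first file's `<HEADERSECTION>` and does nothing about duplicate BID or TID values, so the merged file still has to go through ilivalidator.

```python
import xml.etree.ElementTree as ET

NS = "http://www.interlis.ch/INTERLIS2.3"
ET.register_namespace("", NS)  # write the namespace as the default one, without a prefix


def merge_datasections(first_xml: str, second_xml: str) -> str:
    """Append every basket of second_xml to the DATASECTION of first_xml."""
    first = ET.fromstring(first_xml)
    second = ET.fromstring(second_xml)
    target = first.find(f"{{{NS}}}DATASECTION")
    source = second.find(f"{{{NS}}}DATASECTION")
    if target is None or source is None:
        raise ValueError("both inputs need a DATASECTION element")
    for basket in list(source):  # copy the list so appending is safe
        target.append(basket)
    return ET.tostring(first, encoding="unicode")


# Hypothetical sample documents with one basket each.
part1 = (f'<TRANSFER xmlns="{NS}"><HEADERSECTION/><DATASECTION>'
         '<modelname.topic_a BID="basket1"/></DATASECTION></TRANSFER>')
part2 = (f'<TRANSFER xmlns="{NS}"><HEADERSECTION/><DATASECTION>'
         '<modelname.topic_b BID="basket2"/></DATASECTION></TRANSFER>')

print(merge_datasections(part1, part2))
```

The result has a single `<TRANSFER>` root with both baskets inside one `<DATASECTION>`, which matches the structure shown in option 1. Whether the baskets belong in one transfer at all depends on the data and the model.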

Pierre-de-la-Verre commented 2 years ago

Thanks. The solution for merging really seems to be an XSLT file.

But the other question does not only concern ilivalidator: shouldn't all INTERLIS tools apply the same requirements, accepting well-formed XML files and rejecting ill-formed ones? Is anybody looking into this?

edigonzales commented 2 years ago

As far as I understand: if the file is not well-formed, it cannot be a valid INTERLIS file. You probably want to open a new issue concerning this specific problem. And even better: sponsor the fix for the issue.