KorAP / KorAP-XML-TEI

Conversion of TEI P5 based formats to KorAP-XML
BSD 2-Clause "Simplified" License

Failed KorAP-Tokenizer start is not detected #6

Closed kupietz closed 1 year ago

kupietz commented 1 year ago

The error occurs, e.g., if there is not enough memory to start the Java VM. The JVM's error message is then apparently read as if it were tokenizer output, so the result can be something like:


```xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-model href="span.rng"
            type="application/xml"
            schematypens="http://relaxng.org/ns/structure/1.0"?>
<layer docid="AAZ22_JAN.00001"
       xmlns="http://ids-mannheim.de/ns/KorAP"
       version="KorAP-0.4">
  <spanList>
    <span id="t_0" from="Error" to="occurred" />
    <span id="t_1" from="during" to="initialization" />
    <span id="t_2" from="of" to="VM" />
  </spanList>
</layer>
```

Akron commented 1 year ago

Other errors occur during processing, so maybe it's better to catch a failure whenever it occurs rather than only checking the VM startup? See Gerrit #6652. This would also help with other tokenizers that produce wrong results, since the check would not be specific to KorAP-Tokenizer.
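
To illustrate the idea of catching a failure wherever it occurs, here is a minimal Python sketch. It is not the code from Gerrit #6652 or from this project. It assumes the tokenizer emits lines of whitespace-separated character offsets that are consumed in (from, to) pairs, which is what the sample above suggests. The names `parse_offsets` and `TokenizerOutputError` are hypothetical.

```python
import re
from typing import List, Tuple

# Assumption: a valid offset is a plain non-negative decimal integer.
_OFFSET = re.compile(r"[0-9]+")


class TokenizerOutputError(ValueError):
    """Raised when a tokenizer output line is not a list of offset pairs."""


def parse_offsets(line: str) -> List[Tuple[int, int]]:
    """Parse one output line into (from, to) pairs, failing on bad input."""
    fields = line.split()
    if len(fields) % 2 != 0:
        raise TokenizerOutputError(f"odd number of offsets: {line!r}")
    pairs = []
    for start, end in zip(fields[0::2], fields[1::2]):
        # Reject anything that is not a number, e.g. words from a JVM message.
        if not (_OFFSET.fullmatch(start) and _OFFSET.fullmatch(end)):
            raise TokenizerOutputError(f"non-numeric offset in: {line!r}")
        # Assumption: a span may be empty, but may not end before it starts.
        if int(start) > int(end):
            raise TokenizerOutputError(f"from > to in: {line!r}")
        pairs.append((int(start), int(end)))
    return pairs
```

With a check like this, a line such as `Error occurred during initialization of VM` raises an error instead of being written out as three spans, regardless of which tokenizer produced it.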