LanguageMachines / foliautils

Command-line utilities for working with the Format for Linguistic Annotation (FoLiA), powered by libfolia (C++), written by Ko van der Sloot (CLST, Radboud University)
https://proycon.github.io/folia
GNU General Public License v3.0

FoLiA-2text aborts on a metadata issue #37

Closed martinreynaert closed 4 years ago

martinreynaert commented 4 years ago

This happened on the Staten-Generaal Digitaal FoLiA, which has been processed by just about every other FoLiA tool before.

[1]+ Aborted nohup /exp/sloot/usr/local/bin/FoLiA-2text -t 120 --class=Ticcl -e ticcl.xml -o /reddata/POLMASH/FOLIALangCatTICCLTXT/d/nl/proc/sgd/ /reddata/POLMASH/FOLIALangCatTICCL/d/nl/proc/sgd/ > /reddata/POLMASH/FOLIALangCatTICCLTXT.sgd.20191112.stdout 2> /reddata/POLMASH/FOLIALangCatTICCLTXT.sgd.20191112.stderr

reynaert@maize:/reddata/POLMASH$ cat /reddata/POLMASH/FOLIALangCatTICCLTXT.sgd.20191112.stderr
nohup: ignoring input
WARNING: foreign-data found in metadata of type 'native' changing type to 'foreign'
WARNING: foreign-data found in metadata of type 'native' changing type to 'foreign'
WARNING: foreign-data found in metadata of type 'native' changing type to 'foreign'
terminate called recursively
terminate called after throwing an instance of 'folia::NoSuchText'
reynaert@maize:/reddata/POLMASH$

I have no idea.

martinreynaert commented 4 years ago

Addendum:

Before it crashed it created 5 output files, one of which had actual output:

reynaert@maize:~$ ls -l /reddata/POLMASH/FOLIALangCatTICCLTXT/d/nl/proc/sgd/
total 2
-rw-r--r-- 1 reynaert reynaert    0 Nov 12 16:41 nl.proc.sgd.d.186918700000115.folia.lc.ticcl.xml.txt
-rw-r--r-- 1 reynaert reynaert    0 Nov 12 16:41 nl.proc.sgd.d.189618970000236.folia.lc.ticcl.xml.txt
-rw-r--r-- 1 reynaert reynaert    0 Nov 12 16:41 nl.proc.sgd.d.190419050000115.folia.lc.ticcl.xml.txt
-rw-r--r-- 1 reynaert reynaert    0 Nov 12 16:41 nl.proc.sgd.d.195419550000051.2.folia.lc.ticcl.xml.txt
-rw-r--r-- 1 reynaert reynaert 1398 Nov 12 16:41 nl.proc.sgd.d.198819890000898.8.folia.lc.ticcl.xml.txt

Before this run, I tested it on a small number of files and it ran as expected.

kosloot commented 4 years ago

Well, this is caused by running FoLiA-2text on files without any text in them. This was not foreseen :{ I committed a patch to skip such documents.
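For anyone stuck on a version without that patch, one possible workaround is to find the text-less documents up front and leave them out of the input directory. The sketch below is not the patch itself and does not use libfolia. It uses only the Python standard library and assumes that "no text" means no non-empty `<t>` element in the requested text class, and that `--class` selects that text class. The names `has_text` and `files_without_text` are made up for illustration.

```python
import xml.etree.ElementTree as ET

FOLIA_NS = "http://ilk.uvt.nl/folia"  # FoLiA XML namespace


def has_text(root, textclass="current"):
    """Return True if the tree has a non-empty <t> in the given text class.

    Assumption: a <t> without a class attribute counts as class "current".
    """
    for t in root.iter(f"{{{FOLIA_NS}}}t"):
        if t.get("class", "current") != textclass:
            continue
        # itertext() also collects text nested in markup inside <t>.
        if "".join(t.itertext()).strip():
            return True
    return False


def files_without_text(paths, textclass="current"):
    """Hypothetical helper: list the files that contain no text in textclass."""
    return [p for p in paths if not has_text(ET.parse(p).getroot(), textclass)]
```

Calling `files_without_text(paths, "Ticcl")` on the input files would list the candidates to move aside before rerunning FoLiA-2text. Whether this matches exactly the condition under which libfolia throws `folia::NoSuchText` is not confirmed here.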