Closed bheinzerling closed 5 years ago
This is actually a gap in the format documentation. The "start with 1" rule applies to the integer IDs of real words, but not to the decimal IDs of empty nodes. Thus ID=0.1 is OK, while ID=0 would be an error.
Later on, the format documentation says: "It is possible to insert one or more empty nodes indexed i.1, i.2, etc. immediately after a word with index i (where i = 0 for sentence-initial empty nodes)."
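To make the distinction concrete, here is a minimal sketch in Python of how a parser might classify ID values under the rule described above. The function name `classify_id` and the regular expressions are my own illustration, not taken from any particular CoNLL-U library, and the sketch only checks the shape of a single ID (it does not verify that IDs are sequential or that a range's start precedes its end).

```python
import re

# Hypothetical patterns, written from the rule quoted above:
# - real words use integers starting at 1
# - multiword token ranges look like "3-4"
# - empty nodes look like "i.j" where i may be 0 and j starts at 1
WORD_ID = re.compile(r"[1-9][0-9]*")
RANGE_ID = re.compile(r"[1-9][0-9]*-[1-9][0-9]*")
EMPTY_ID = re.compile(r"(0|[1-9][0-9]*)\.[1-9][0-9]*")


def classify_id(token_id: str) -> str:
    """Return 'word', 'range', 'empty', or 'invalid' for one ID field."""
    if WORD_ID.fullmatch(token_id):
        return "word"
    if RANGE_ID.fullmatch(token_id):
        return "range"
    if EMPTY_ID.fullmatch(token_id):
        return "empty"
    return "invalid"


if __name__ == "__main__":
    for example in ["1", "0", "0.1", "5.2", "3-4"]:
        print(example, classify_id(example))
```

With this shape check, `0.1` is accepted as a sentence-initial empty node, while a bare `0` is still rejected.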
I think there should be a short warning about the exception in the beginning of the document. Will look into it.
The CoNLL-U parser I'm using complains about invalid IDs when trying to read fi_tdt-ud-train.conllu.
It looks like the following 5 word indices do not conform to the CoNLL-U format, since word indices should start at 1:
(grep output with line numbers)