Open michmech opened 1 year ago
This line is surprising and I think the part { přidat k tvrdit }
should not be there; nothing similar occurs anywhere else in the treebank.
However, spaces in MISC are not an error in general, so UDPipe should not die on them @foxik. (I think leading or trailing whitespace would trigger a validation error, but there can be a space in the middle of a value, for example if there is a Latin transliteration of a FORM or LEMMA that contains a space.)
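To see which tokens in a treebank are affected, something like the following could be used. It is only a minimal sketch: `find_misc_spaces` is a hypothetical helper, not part of UDPipe or the UD tools, and it assumes standard 10-column, tab-separated CoNLL-U with MISC as the last column.

```python
def find_misc_spaces(lines):
    """Return (line_number, token_id, misc) for every token line whose
    MISC column (the 10th tab-separated field) contains a space.

    Hypothetical helper for illustration; assumes plain CoNLL-U input.
    """
    hits = []
    for line_no, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        # Skip sentence separators and comment lines such as "# text = ..."
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        # Ignore anything that is not a well-formed 10-column token line
        if len(cols) != 10:
            continue
        if " " in cols[9]:
            hits.append((line_no, cols[0], cols[9]))
    return hits
```

Running it over the lines of a `.conllu` file (opened with `encoding="utf-8"`) lists the places where a UDPipe build without the fix would be expected to stop.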
If I recall correctly, spaces in MISC were not originally allowed in CoNLL-U v2 (maybe only in the proposed version), so the implementation in UDPipe 1 did not originally allow them, only spaces in FORM and LEMMA. Spaces in MISC have been allowed since https://github.com/ufal/udpipe/commit/9df115a6e8c0e71c94819f9007a6cebcbb363150, but we have not made a release since then (yes, it is long planned...). Once the release is made, it will work again; in the meantime, it is possible to compile UDPipe manually from the current sources.
Note that this also affects UDPipe 2 (which uses UDPipe 1 for tokenization).
When I attempt to train a UDPipe model from this treebank, using UDPipe 1.2.0:
I get the following error message:
Does this mean the treebank is broken? Or is there an option in UDPipe that I could use to get over this?
Thank you, Michal