Closed by mdelhoneux 4 years ago
Thank you! I'll have a look when I have time.
More generally: Are you going through all languages and parsing them with conllu? Seems like something I should do myself to ensure compatibility with the format :)
The languages for which there are enhanced dependencies, yes (as part of the iwpt 2020 shared task: https://universaldependencies.org/iwpt20/). Thanks!
@mdelhoneux Thanks again! I managed to fix this in https://github.com/EmilStenstrom/conllu/commit/5d4df35cd1900e42e093f38e5f5d9cb813ad1993
I haven't had the time to parse through all the iwpt 2020 datasets, so feel free to add more failing tests if you find any!
My script was failing for the treebanks ar_padt and ta_ttb. Since installing this latest version, my dataset reader can now parse both of these treebanks. Thanks both for your help!
@Jbar-ry Great to hear! Let me know if there are more issues you encounter!
I found a new case in the Arabic training set where the regular expression for parsing enhanced representations fails; see https://github.com/mdelhoneux/conllu/blob/cfc45fb5e52e4fe714472a2002464db4c6876cec/tests/test_parser.py#L477. I have not yet managed to fix this without breaking other tests.
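For illustration, here is a minimal sketch of one regex-free way to split an enhanced DEPS value. It assumes the UD convention that the field is a `|`-separated list of `head:relation` items, where the head may be a decimal ID for an empty node and the relation may itself contain colons and non-ASCII lemmas. This is not conllu's actual implementation; the name `parse_deps` and the sample values are hypothetical, and the failing case in the linked test may differ.

```python
def parse_deps(field):
    """Split a DEPS value into (head, relation) pairs (hypothetical helper)."""
    # "_" marks an empty DEPS column in CoNLL-U.
    if field == "_":
        return []
    pairs = []
    for item in field.split("|"):
        # Split on the FIRST colon only: everything after it is the
        # relation, which may contain further colons (e.g. "conj:and").
        head, sep, relation = item.partition(":")
        if not sep or not head or not relation:
            raise ValueError(f"malformed DEPS item: {item!r}")
        pairs.append((head, relation))
    return pairs


# Made-up sample values, not taken from any treebank:
print(parse_deps("4:nsubj|8.1:conj:and"))
print(parse_deps("2:obl:في:gen"))
```

Splitting on the first colon avoids having to enumerate which characters a relation label may contain, which is where a character-class-based regular expression can go wrong. Heads are kept as strings here, so converting `8.1` into a structured ID is left to the caller.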