Closed jbrry closed 4 years ago
I was able to override this behaviour by changing the pattern at https://github.com/EmilStenstrom/conllu/blob/68199a7afcbf66660aec09bab5b0a2b995937dd6/conllu/parser.py#L173 to ID_SINGLE = re.compile(r"[0-9][0-9]*")
This enables a match at https://github.com/EmilStenstrom/conllu/blob/68199a7afcbf66660aec09bab5b0a2b995937dd6/conllu/parser.py#L201, which allows the value to be returned in the tuple format.
It seems to have solved the problem but let me know if you would advise against it, thanks!
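To illustrate why the change matters, here is a minimal sketch. It assumes the original pattern required a non-zero leading digit; that pattern is an assumption for illustration and is not copied from parser.py. Only the patched pattern is the one quoted above.

```python
import re

# ASSUMPTION: the original pattern rejected a leading zero.
# This is a guess for illustration, not the actual line from parser.py.
ASSUMED_ORIGINAL = re.compile(r"^[1-9][0-9]*$")

# The patched pattern quoted in this comment (no anchors).
PATCHED = re.compile(r"[0-9][0-9]*")

for head in ["0", "5", "12"]:
    # Show whether each pattern accepts the head id.
    print(head, bool(ASSUMED_ORIGINAL.match(head)), bool(PATCHED.match(head)))
```

One caveat: because the patched pattern has no anchors, match() also accepts a prefix, such as the 8 in 8.1, so fullmatch() or explicit ^...$ anchors may be the safer choice.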
@Jbar-ry Excellent catch, thanks for reporting it! It's definitely a bug, I'll think of a good way to solve it and release a new version soon!
Thank you very much @EmilStenstrom!
@Jbar-ry Thank you! I just released 2.2.1, which fixes this bug! Install it with pip install -U conllu.
Thanks a lot @EmilStenstrom!
I recently upgraded from conllu 1.3.1 to 2.2 due to the latter version's ability to deal with elided tokens/copy nodes (e.g. token 8.1 below) which was addressed in https://github.com/EmilStenstrom/conllu/issues/27.
I am parsing the deps column and have a loop which iterates over the deps tuples to put the heads into a heads list and the relations into a relations list. The upgrade now includes the copy nodes, which is good, but now all 0:root labels are returned as a string and not a tuple, which breaks my loop.
I'm just wondering, is this the desired behaviour? E.g. the output of deps looks like:
Is there any particular reason why '0:root' shouldn't be [('root', 0)]?
Thanks!
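In the meantime, a loop like the one described could normalise both shapes before splitting them. This is only a sketch: split_deps is a hypothetical helper, not part of conllu, and it assumes deps is either a list of (relation, head) tuples, a raw string such as '0:root', the placeholder '_', or None.

```python
def split_deps(deps):
    """Hypothetical helper: return (heads, relations) from a deps value.

    ASSUMPTION: deps is a list of (relation, head) tuples, a raw string
    like "0:root" or "2:nsubj|4:conj", the placeholder "_", or None.
    """
    heads, relations = [], []
    if deps is None or deps == "_":
        return heads, relations
    if isinstance(deps, str):
        pairs = []
        for item in deps.split("|"):
            # Split on the first colon only, so "4:nmod:poss" keeps its subtype.
            head, relation = item.split(":", 1)
            # Plain integer heads become ints; anything else (e.g. "8.1") stays a string.
            pairs.append((relation, int(head) if head.isdigit() else head))
        deps = pairs
    for relation, head in deps:
        heads.append(head)
        relations.append(relation)
    return heads, relations


print(split_deps("0:root"))        # ([0], ['root'])
print(split_deps([("nsubj", 2)]))  # ([2], ['nsubj'])
```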