dkpro / dkpro-cassis

UIMA CAS processing library written in Python
https://pypi.org/project/dkpro-cassis/
Apache License 2.0

Parsing Error with WebAnno UIMA XMI format #10

Closed karmalet closed 6 years ago

karmalet commented 6 years ago

Hello, cassis team,

I need to parse, in Python, an XMI file produced by WebAnno. I'm totally new to UIMA and the XMI format, so I was lucky to discover cassis. Thank you, developers.

Unfortunately, the code snippet provided (https://github.com/dkpro/dkpro-cassis#selecting-annotations) doesn't work for the file attached below.

When I load the file with the 'load_cas_from_xmi()' method, the 'Cas' class initializes 'Cas._sofas' with a dictionary key of 1. However, the correct dictionary key for the attached example is 12.

How can I make the 'Cas' class use the correct sofa key of 12?

Also, can I retrieve the POS tags attached to tokens in the same way as in 'selecting annotations'? If cassis is not the best option for parsing the attached file, please recommend alternatives.

Thank you so much.

webanno629617483446633113export.zip

jcklie commented 6 years ago

The problem was that I assumed an annotation references its sofa by the sofa's sofaNum, but it looks like it actually references the sofa's xmi:id. Long story short: I fixed it. Thank you for the lovely message, luckily my Chinese was good enough for that.
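To illustrate the difference between the two identifiers, here is a minimal sketch that uses only the Python standard library. It is not the cassis implementation or the actual fix; the XMI snippet, the `type` namespace URI, and the variable names are made up for illustration and only mimic the shape of a WebAnno export, where the sofa has sofaNum 1 but xmi:id 12.

```python
import xml.etree.ElementTree as ET

XMI_NS = "http://www.omg.org/XMI"
CAS_NS = "http:///uima/cas.ecore"

# Hypothetical, heavily trimmed XMI document (assumption, not the attached file).
# Note that the sofa has sofaNum="1" but xmi:id="12".
XMI = """<xmi:XMI xmlns:xmi="http://www.omg.org/XMI"
         xmlns:cas="http:///uima/cas.ecore"
         xmlns:type="http:///example/type.ecore"
         xmi:version="2.0">
  <type:Token xmi:id="19" sofa="12" begin="0" end="5"/>
  <type:Token xmi:id="25" sofa="12" begin="6" end="11"/>
  <cas:Sofa xmi:id="12" sofaNum="1" sofaID="_InitialView"
            mimeType="text" sofaString="Hello world"/>
</xmi:XMI>
"""

root = ET.fromstring(XMI)

# Index sofas by their xmi:id, because that is the value the
# annotations' "sofa" attribute points to (not sofaNum).
sofas_by_xmi_id = {
    int(sofa.get(f"{{{XMI_NS}}}id")): sofa
    for sofa in root.iter(f"{{{CAS_NS}}}Sofa")
}

# Resolve each token's sofa reference and slice out the covered text.
covered = []
for element in root:
    if not element.tag.endswith("}Token"):
        continue
    sofa = sofas_by_xmi_id[int(element.get("sofa"))]
    begin, end = int(element.get("begin")), int(element.get("end"))
    covered.append(sofa.get("sofaString")[begin:end])

print(covered)
```

Keying the lookup on sofaNum instead would put the sofa under key 1, so the tokens' `sofa="12"` reference would not resolve, which matches the behaviour described in the report.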

jcklie commented 6 years ago

Fixed in e928fd63e549555a01d8b37bcbefcdfde9502eda.