dkpro / dkpro-cassis

UIMA CAS processing library written in Python
https://pypi.org/project/dkpro-cassis/
Apache License 2.0

XMI Serializer creates invalid XMI #159

Closed Daedo closed 3 years ago

Daedo commented 3 years ago

Describe the bug

Deserializing and then serializing some XMI documents produces multiple elements with the same xmi:id.

To Reproduce

Steps to reproduce the behavior:

  1. Have an XMI document in which a feature structure has xmi:id 1 and the Sofa has xmi:id 2 (with sofaNum 1):
    ...
    <cas:Sofa mimeType="text" sofaID="_InitialView" sofaNum="1" sofaString="..." xmi:id="2"/>
    <type:someFeature xmi:id="1"/>
  2. Deserialize and serialize:
    from cassis import load_typesystem, load_cas_from_xmi
    # ts and xmi are the type system and the XMI document from step 1
    typesystem = load_typesystem(ts)
    cas = load_cas_from_xmi(xmi, typesystem=typesystem)
    xmi = cas.to_xmi(pretty_print=True)
  3. Now both the Sofa and the feature structure have xmi:id 1. This can cause errors when deserializing with UIMA or when resolving references.
    ...
    <cas:Sofa mimeType="text" sofaID="_InitialView" sofaNum="1" sofaString="..." xmi:id="1"/>
    <type:someFeature [...] xmi:id="1"/>
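
The duplicate ids from the last step can be detected without UIMA or cassis. The following is a minimal sketch using only the Python standard library. The helper name `find_duplicate_xmi_ids`, the sample document, and the `type` namespace URI are made up for illustration and are not part of cassis.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Fully qualified name of the xmi:id attribute (XMI namespace).
XMI_ID = "{http://www.omg.org/XMI}id"


def find_duplicate_xmi_ids(xmi_text):
    """Return a sorted list of xmi:id values used by more than one element."""
    root = ET.fromstring(xmi_text)
    counts = Counter(
        elem.attrib[XMI_ID] for elem in root.iter() if XMI_ID in elem.attrib
    )
    return sorted(value for value, n in counts.items() if n > 1)


# Hypothetical document mirroring the broken output described above.
SAMPLE = """<xmi:XMI xmlns:xmi="http://www.omg.org/XMI"
         xmlns:cas="http:///uima/cas.ecore"
         xmlns:type="http:///example/type.ecore"
         xmi:version="2.0">
  <cas:Sofa mimeType="text" sofaID="_InitialView" sofaNum="1" sofaString="abc" xmi:id="1"/>
  <type:someFeature xmi:id="1"/>
</xmi:XMI>"""

print(find_duplicate_xmi_ids(SAMPLE))  # ['1']
```

An empty list means every xmi:id in the document is unique.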

Expected behavior

A deserialize/serialize round trip should not change any xmi:id.
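
One way to check this expectation is to compare which element carries which xmi:id before and after the round trip. The sketch below is only an illustration with the standard library. The names `id_assignments` and `changed_ids` and the two sample documents are hypothetical, and elements are matched by tag and id rather than by position, since serialization may reorder them.

```python
import xml.etree.ElementTree as ET
from collections import Counter

XMI_ID = "{http://www.omg.org/XMI}id"


def id_assignments(xmi_text):
    """Count (element tag, xmi:id) pairs in an XMI document."""
    root = ET.fromstring(xmi_text)
    return Counter(
        (elem.tag, elem.attrib[XMI_ID])
        for elem in root.iter()
        if XMI_ID in elem.attrib
    )


def changed_ids(before, after):
    """Return (pairs lost, pairs gained) between two XMI documents."""
    old, new = id_assignments(before), id_assignments(after)
    return sorted((old - new).keys()), sorted((new - old).keys())


# Hypothetical input and output of a round trip; only the Sofa id differs.
TEMPLATE = """<xmi:XMI xmlns:xmi="http://www.omg.org/XMI"
         xmlns:cas="http:///uima/cas.ecore"
         xmlns:type="http:///example/type.ecore">
  <cas:Sofa sofaID="_InitialView" sofaNum="1" xmi:id="{sofa_id}"/>
  <type:someFeature xmi:id="1"/>
</xmi:XMI>"""
BEFORE = TEMPLATE.format(sofa_id="2")
AFTER = TEMPLATE.format(sofa_id="1")

lost, gained = changed_ids(BEFORE, AFTER)
print(lost)    # [('{http:///uima/cas.ecore}Sofa', '2')]
print(gained)  # [('{http:///uima/cas.ecore}Sofa', '1')]
```

Two empty lists mean the round trip left every id where it was.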

Please complete the following information:

Additional context

I'm building an application that sends serialized CASes between a Python client and a UIMA server. I noticed that my Python client sometimes sent invalid XMI even when it did nothing but deserialize and serialize.

jcklie commented 3 years ago

Thank you for reporting. 0.2.10.dev0 is a pretty old version. Can you try 0.5.1?

Daedo commented 3 years ago

Thank you, I've seen it was fixed in 0.5.1.