Describe the bug
Our pipeline produced empty .xmi files because the import data was not clean and contained empty documents. This is a cleaned example of such an .xmi file:
Expected behavior
I would like cassis to successfully create an empty CAS from such a file instead of raising an error. This is important to us because it avoids downstream problems.
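Until this is handled in cassis, a stopgap on our side could be to detect such files before loading them. The following is only a sketch under assumptions: it assumes the document text lives in the sofaString attribute of cas:Sofa elements in the http:///uima/cas.ecore namespace, and has_document_text, EMPTY_XMI and NON_EMPTY_XMI are hypothetical names and inputs made up for illustration, not the file from this report.

```python
import xml.etree.ElementTree as ET

# Assumption: UIMA XMI stores the document text in the sofaString attribute
# of cas:Sofa elements under this namespace.
CAS_NS = "http:///uima/cas.ecore"


def has_document_text(xmi: str) -> bool:
    """Return True if any Sofa element carries a non-empty sofaString."""
    root = ET.fromstring(xmi)
    sofas = root.iter(f"{{{CAS_NS}}}Sofa")
    # A missing attribute (None) and an empty string both count as "no text".
    return any(sofa.get("sofaString") for sofa in sofas)


# Hypothetical minimal inputs for illustration only.
EMPTY_XMI = (
    '<xmi:XMI xmlns:xmi="http://www.omg.org/XMI" '
    'xmlns:cas="http:///uima/cas.ecore" xmi:version="2.0">'
    '<cas:Sofa xmi:id="1" sofaNum="1" sofaID="_InitialView" mimeType="text"/>'
    "</xmi:XMI>"
)
NON_EMPTY_XMI = EMPTY_XMI.replace(
    'mimeType="text"', 'mimeType="text" sofaString="Hello"'
)

if __name__ == "__main__":
    for name, xmi in [("empty", EMPTY_XMI), ("non-empty", NON_EMPTY_XMI)]:
        print(name, has_document_text(xmi))
```

A pipeline could call has_document_text first and skip or specially handle files for which it returns False. This only works around the problem; it does not replace cassis returning an empty CAS.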
Error message
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
/tmp/ipykernel_32727/2857839666.py in <module>
----> 1 cas = load_cas_from_xmi(xmi)
~/.local/lib/python3.8/site-packages/cassis/xmi.py in load_cas_from_xmi(source, typesystem, lenient, trusted)
41 deserializer = CasXmiDeserializer()
42 if isinstance(source, str):
---> 43 return deserializer.deserialize(
44 BytesIO(source.encode("utf-8")), typesystem=typesystem, lenient=lenient, trusted=trusted
45 )
~/.local/lib/python3.8/site-packages/cassis/xmi.py in deserialize(self, source, typesystem, lenient, trusted)
225 # Map from offsets in UIMA UTF-16 based offsets to Unicode codepoints
226 if typesystem.is_instance_of(fs.type, "uima.tcas.Annotation"):
--> 227 fs.begin = sofa._offset_converter.uima_to_cassis(fs.begin)
228 fs.end = sofa._offset_converter.uima_to_cassis(fs.end)
229
~/.local/lib/python3.8/site-packages/cassis/cas.py in uima_to_cassis(self, idx)
66 if idx is None:
67 return None
---> 68 return self._uima_to_cassis[idx]
69
70 def cassis_to_uima(self, idx: Optional[int]) -> Optional[int]:
KeyError: 0
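The KeyError: 0 suggests that the offset lookup table has no entry for offset 0 when the document text is empty. The sketch below illustrates how that can happen; it is not the cassis implementation. build_offset_map and its add_end_entry flag are hypothetical, and the assumption is that the table is filled by iterating over the characters of the document text.

```python
from typing import Dict


def build_offset_map(text: str, add_end_entry: bool) -> Dict[int, int]:
    """Map UTF-16 code unit offsets to Python code point offsets."""
    mapping: Dict[int, int] = {}
    utf16_pos = 0
    for codepoint_pos, char in enumerate(text):
        mapping[utf16_pos] = codepoint_pos
        # Characters outside the BMP occupy two UTF-16 code units.
        utf16_pos += len(char.encode("utf-16-le")) // 2
    if add_end_entry:
        # Exclusive end offsets point one past the last character, so the
        # position after the text needs its own entry (offset 0 if empty).
        mapping[utf16_pos] = len(text)
    return mapping


if __name__ == "__main__":
    without_end = build_offset_map("", add_end_entry=False)
    with_end = build_offset_map("", add_end_entry=True)
    try:
        without_end[0]
    except KeyError as exc:
        print("lookup failed for offset", exc)
    print(with_end[0])
```

With an empty text the loop never runs, so a table built only inside the loop stays empty and an annotation with begin 0 cannot be converted. Always adding the end-of-text entry is one way such a table could cover the empty case.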
Trying to open such an empty .xmi file with cassis raises the KeyError shown in the traceback.
To Reproduce
Pass the contents of the empty .xmi file to load_cas_from_xmi, i.e. run cas = load_cas_from_xmi(xmi).