dkpro / dkpro-cassis

UIMA CAS processing library written in Python
https://pypi.org/project/dkpro-cassis/
Apache License 2.0

Specific type of array elements in element FS is not retained #285

Open reckart opened 1 year ago

reckart commented 1 year ago

Describe the bug
When cassis parses an array like this:

{
    "%ID" : 11,
    "%TYPE" : "webanno.custom.LinkType",
    "role" : "p1",
    "@target" : 12
  }, {
    "%ID" : 13,
    "%TYPE" : "webanno.custom.LinkType[]",
    "%ELEMENTS" : [ 11, 9 ]
  }

it is serialized again as:

{
    "%ID" : 11,
    "%TYPE" : "webanno.custom.LinkType",
    "role" : "p1",
    "@target" : 12
  }, {
    "%ID" : 13,
    "%TYPE" : "uima.cas.FSArray",
    "%ELEMENTS" : [ 11, 9 ]
  }

This is technically not wrong, and the element type of the array is usually still accessible through the definition of the feature that references the array. However, cassis currently cannot serialize the specific array type.
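To make the difference concrete, the following is a minimal sketch that compares the `%TYPE` of each feature structure before and after a round trip and reports arrays whose specific type was replaced. It uses only the standard library and the two JSON fragments from this report (wrapped in a list so they parse on their own); the helper name `lost_array_types` is hypothetical and not part of cassis.

```python
import json


def lost_array_types(original, reserialized):
    """Map FS id -> (old type, new type) for typed arrays whose %TYPE changed."""
    new_types = {fs["%ID"]: fs["%TYPE"] for fs in reserialized}
    changed = {}
    for fs in original:
        old_type = fs["%TYPE"]
        new_type = new_types.get(fs["%ID"])
        # Typed arrays are written as "<element type>[]" in the JSON format.
        if old_type.endswith("[]") and new_type != old_type:
            changed[fs["%ID"]] = (old_type, new_type)
    return changed


# Input as loaded (fragment from this report, wrapped in a list).
loaded = json.loads("""[
  {"%ID": 11, "%TYPE": "webanno.custom.LinkType", "role": "p1", "@target": 12},
  {"%ID": 13, "%TYPE": "webanno.custom.LinkType[]", "%ELEMENTS": [11, 9]}
]""")

# Output as currently serialized (fragment from this report).
saved = json.loads("""[
  {"%ID": 11, "%TYPE": "webanno.custom.LinkType", "role": "p1", "@target": 12},
  {"%ID": 13, "%TYPE": "uima.cas.FSArray", "%ELEMENTS": [11, 9]}
]""")

print(lost_array_types(loaded, saved))
# {13: ('webanno.custom.LinkType[]', 'uima.cas.FSArray')}
```

A check like this could serve as a round-trip assertion in a test, independent of how the array type is eventually represented internally.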

To Reproduce
See test tsv3-testSimpleSlotFeature.

Expected behavior
Ideally, the array should be serialized in the same way as it was loaded. Java UIMA v3 reifies array types in the type system: instead of only an FSArray, it dynamically creates types like FSArray<LinkType> and binds the array to that type, so the array element type is accessible without having to look at the feature the array is referenced from. This is particularly useful for arrays that are shared and can be referenced from multiple features (with potentially different yet compatible element type definitions).

However, it seems this won't be trivial to fix in the Python implementation and would also require introducing dynamic array types.
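As a rough illustration of what "dynamic array types" could mean, here is a minimal sketch of a registry that creates one array type per element type on demand and reuses it afterwards. The classes `ArrayType` and `ArrayTypeRegistry` are hypothetical names invented for this example; they are not cassis API and not a proposal for the actual design.

```python
from dataclasses import dataclass

FS_ARRAY = "uima.cas.FSArray"  # generic fallback type name


@dataclass(frozen=True)
class ArrayType:
    """Hypothetical reified array type, identified by its element type."""

    element_type: str  # e.g. "webanno.custom.LinkType"

    @property
    def name(self) -> str:
        # Name as it appears in the JSON "%TYPE" field.
        return f"{self.element_type}[]"


class ArrayTypeRegistry:
    """Hypothetical registry that creates array types lazily, once per element type."""

    def __init__(self):
        self._types = {}

    def resolve(self, type_name):
        """Return the ArrayType for names like 'X[]', or None for other names."""
        if not type_name.endswith("[]"):
            return None
        element_type = type_name[:-2]
        if element_type not in self._types:
            self._types[element_type] = ArrayType(element_type)
        return self._types[element_type]


registry = ArrayTypeRegistry()
array_type = registry.resolve("webanno.custom.LinkType[]")

# On serialization, prefer the specific array type and fall back to FSArray.
serialized_type = array_type.name if array_type else FS_ARRAY
print(serialized_type)  # webanno.custom.LinkType[]
```

With the array bound to such a type at load time, the serializer would not need to consult the referencing feature to recover the element type.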

Please complete the following information: