dkpro / dkpro-cassis

UIMA CAS processing library written in Python
https://pypi.org/project/dkpro-cassis/
Apache License 2.0

ValueError when reading a type system file that redefines a feature #77

Closed dmitriydligach closed 5 years ago

dmitriydligach commented 5 years ago

Describe the bug

First of all, thank you very much for creating this useful tool.

Unfortunately, I stumble when I try to load the cTAKES (https://ctakes.apache.org) type system:

https://svn.apache.org/repos/asf/ctakes/trunk/ctakes-type-system/src/main/resources/org/apache/ctakes/typesystem/types/TypeSystem.xml

I get the following error:

ValueError: Feature with name [value] already exists in [org.apache.ctakes.typesystem.type.refsem.LabReferenceRange]!

To Reproduce

Save the contents of https://svn.apache.org/repos/asf/ctakes/trunk/ctakes-type-system/src/main/resources/org/apache/ctakes/typesystem/types/ somewhere on your system and then run:

    from cassis import load_typesystem

    with open('TypeSystem.xml', 'rb') as f:
        typesystem = load_typesystem(f)

Expected behavior

I'm not totally sure why this error happens -- it seems like this file (TypeSystem.xml) is a legitimate type system that has been used by the cTAKES community for many years.

Please complete the following information:

I'm doing this on a Mac

Thank you very much in advance for looking into this.

reckart commented 5 years ago

@dmitriydligach the cTAKES type system declares the following type twice:

    <typeDescription>
      <name>org.apache.ctakes.typesystem.type.refsem.LabReferenceRange</name>
      <description>Holds a narrative (i.e. string) reference range</description>
      <supertypeName>org.apache.ctakes.typesystem.type.refsem.Attribute</supertypeName>
      <features>
        <featureDescription>
          <name>value</name>
          <description/>
          <rangeTypeName>uima.cas.String</rangeTypeName>
        </featureDescription>
      </features>
    </typeDescription>

IMHO this is something cTAKES should have a look at and clean up.

@jcklie UIMA "merges" type systems - so if there are multiple declarations of a type that do not conflict with each other, it just overlays them - including redundantly defined features. Thus, the Java UIMA doesn't choke on this one.
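
To illustrate the merge behaviour described above, here is a minimal sketch. It is not the code used by UIMA or cassis; the dictionary model and the `merge_declaration` helper are hypothetical and only show the idea that an identical redeclaration of a feature is accepted while a conflicting one is rejected.

```python
from typing import Dict

# Hypothetical in-memory model: type name -> {feature name -> range type name}.
TypeTable = Dict[str, Dict[str, str]]


def merge_declaration(types: TypeTable, type_name: str, features: Dict[str, str]) -> None:
    """Overlay one type declaration onto the declarations seen so far."""
    known = types.setdefault(type_name, {})
    for feature_name, range_type in features.items():
        existing = known.get(feature_name)
        if existing is None:
            # First time this feature is declared for the type.
            known[feature_name] = range_type
        elif existing != range_type:
            # Same feature name, different range: a genuine conflict.
            raise ValueError(
                f"Feature [{feature_name}] in [{type_name}] is redefined with "
                f"range [{range_type}], but was declared as [{existing}]"
            )
        # Otherwise the redeclaration is identical and can be ignored.


types: TypeTable = {}
name = "org.apache.ctakes.typesystem.type.refsem.LabReferenceRange"
merge_declaration(types, name, {"value": "uima.cas.String"})
merge_declaration(types, name, {"value": "uima.cas.String"})  # redundant, accepted
```

With this approach the duplicated `LabReferenceRange` declaration collapses into a single type with a single `value` feature.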

reckart commented 5 years ago

@dmitriydligach try removing the duplicate declaration from the cTAKES type system and then please try cassis again.

reckart commented 5 years ago

@dmitriydligach actually, looking at the link you provided to the type system, it seems that quite a few types are redundantly declared - so you'd have to remove all the duplicate declarations for the time being before using cassis.
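
A rough sketch for listing the redundantly declared types with the Python standard library follows. The helper names `local_name` and `find_duplicate_types` are made up for this example, and the function only inspects the XML text it is given: it does not follow imports to other descriptor files, so each file would have to be checked separately or combined first.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Dict


def local_name(tag: str) -> str:
    """Strip an XML namespace prefix such as '{http://...}typeDescription'."""
    return tag.rsplit("}", 1)[-1]


def find_duplicate_types(xml_text: str) -> Dict[str, int]:
    """Return {type name: declaration count} for types declared more than once."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for element in root.iter():
        if local_name(element.tag) != "typeDescription":
            continue
        # Only look at direct children, so feature names are not counted.
        for child in element:
            if local_name(child.tag) == "name" and child.text:
                counts[child.text.strip()] += 1
                break
    return {name: n for name, n in counts.items() if n > 1}


if __name__ == "__main__":
    # Assumes TypeSystem.xml is in the current working directory.
    with open("TypeSystem.xml", encoding="utf-8") as f:
        for name, n in sorted(find_duplicate_types(f.read()).items()):
            print(f"{name}: declared {n} times")
```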

jcklie commented 5 years ago

I already merge redefined types; the problem here is redefined features.

jcklie commented 5 years ago

It should be fixed in master. I will try to release a new version this week or next. Until then, you can install master directly via pip:

    python -m pip install git+https://github.com/dkpro/dkpro-cassis

dmitriydligach commented 5 years ago

@reckart @jcklie Thank you very much for your help! I removed all the duplicate types from the type system file (you're right -- there were quite a few). I am now able to read the type system using load_typesystem(...). Looking forward to exploring this software further. Thanks again.