Closed GregSilverman closed 5 years ago
Can you give me a minimal CAS file that breaks?
This is a full MIMIC (de-identified) CAS and the TypeSystem file for BioMedICUS.
Hi, I also had to add these types:
t = self.create_type(name='org.apache.uima.examples.SourceDocumentInformation', supertypeName='uima.tcas.Annotation')
self.add_feature(t, name='uri', rangeTypeName='uima.cas.String')
self.add_feature(t, name="offsetInSource", rangeTypeName="uima.cas.Integer")
self.add_feature(t, name="documentSize", rangeTypeName="uima.cas.Integer")
self.add_feature(t, name="lastSegment", rangeTypeName="uima.cas.Integer")
t = self.create_type(name='uima.noNamespace.ArtifactID', supertypeName='uima.tcas.Annotation')
self.add_feature(t, name='artifactID', rangeTypeName='uima.cas.Integer')
t = self.create_type(name='uima.noNamespace.ArtifactMetadata', supertypeName='uima.tcas.Annotation')
self.add_feature(t, name='key', rangeTypeName='uima.cas.String')
self.add_feature(t, name='value', rangeTypeName='uima.cas.String')
and now, I am getting the error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-1-458c411012c6> in <module>()
10 # 528715 737-v1
11 with open(dir_test + '528715.txt.xmi', 'rb') as f:
---> 12 cas = load_cas_from_xmi(f, typesystem=typesystem)
/anaconda3/lib/python3.7/site-packages/cassis/xmi.py in load_cas_from_xmi(source, typesystem)
36 return deserializer.deserialize(BytesIO(source.encode("utf-8")), typesystem=typesystem)
37 else:
---> 38 return deserializer.deserialize(source, typesystem=typesystem)
39
40
/anaconda3/lib/python3.7/site-packages/cassis/xmi.py in deserialize(self, source, typesystem)
95 annotation = annotations[member_id]
96
---> 97 view.add_annotation(annotation)
98
99 return cas
/anaconda3/lib/python3.7/site-packages/cassis/cas.py in add_annotation(self, annotation)
167 annotation.xmiID = self._get_next_xmi_id()
168 if isinstance(annotation, AnnotationBase):
--> 169 annotation.sofa = self.get_sofa().xmiID
170
171 self._current_view.add_annotation_to_index(annotation)
AttributeError: 'uima_cas_FSArray' object has no attribute 'sofa'
Please advise.
org.apache.uima.examples.SourceDocumentInformation
The type system file in your archive does not declare this type - consequently cassis cannot know it. Note that this is also not a built-in UIMA type. So in order to have a complete type system definition, you need to add it to your type system definition XML file.
ArtifactID and ArtifactMetadata
These are "special" because they do not have a namespace/package declaration. This is actually not a good idea because it means that these classes cannot be used with the UIMA JCas interface. I would recommend that you move these into a proper namespace/package. See e.g. https://stackoverflow.com/a/283828/2511197
'uima_cas_FSArray' object has no attribute 'sofa'
FSArray inherits directly from TOP, not from AnnotationBase - it has no sofa feature. If cassis believes that FSArray inherits from AnnotationBase, it would seem to be a bug.
@reckart Thanks for the reply. I just added the missing type as per your recommendation.
Regarding the types with no namespace, I have no control over the source code for these - we're using four different NLP annotators to compare the system annotations for various tasks. We just need to extract specific annotations.
And, regarding the last error, yes, there should be no sofa feature, which is why the error seemed strange. So, it would seem to be a bug.
@GregSilverman Thank you for the report and using dkpro-cassis! I will look into the errors tomorrow and then report back.
@Rentier, it seems very promising for our use case. And while I have background in JVM languages, I have become very lazy and would prefer to stay within the python ecosystem.
Figured out the issue with FSArray and the sofa object: these clinical NLP pipelines define FSArray a bit differently (they also have start and end features for annotated data). I am probably going to fork this project going forward and put all changes there.
@GregSilverman uima.cas.FSArray is a feature-final and inheritance-final type in UIMA. It can neither be subclassed nor can additional features be added to it. It seems that something is very odd in the data you have.
@GregSilverman Can you give us a pointer to the source of the data which uses FSArray with begin/end/sofa features?
@GregSilverman In your XMI file, I also don't see FSArrays having begin/end/sofa features:
<cas:FSArray xmi:id="40465" elements="40456"/>
The elements of the array might be annotations and have begin/end/sofa, but not the FSArray itself.
So I'd say, it is more likely a bug in cassis that FSArrays are not properly interpreted if you see FSArray problems with the file you provided.
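To make the point about the XMI concrete, here is a minimal sketch using only the Python standard library. The XMI fragment is a hand-written, heavily trimmed illustration (not the actual file from this issue), and the variable names are made up for the example. It shows that the FSArray element carries only its id and its element references, while the view merely lists ids of indexed feature structures:

```python
import xml.etree.ElementTree as ET

# Hand-written, trimmed XMI fragment for illustration only (not the real file).
XMI = """\
<xmi:XMI xmlns:xmi="http://www.omg.org/XMI" xmlns:cas="http:///uima/cas.ecore" xmi:version="2.0">
  <cas:Sofa xmi:id="1" sofaNum="1" sofaID="_InitialView" mimeType="text" sofaString="example"/>
  <cas:FSArray xmi:id="40465" elements="40456"/>
  <cas:View members="40465" sofa="1"/>
</xmi:XMI>
"""

XMI_NS = "http://www.omg.org/XMI"
CAS_NS = "http:///uima/cas.ecore"

root = ET.fromstring(XMI)
fs_array = root.find(f"{{{CAS_NS}}}FSArray")
view = root.find(f"{{{CAS_NS}}}View")

# The array element has an id and element references, nothing else.
array_attributes = set(fs_array.attrib)

# The view lists ids of feature structures; this is a reference, not inheritance.
members = view.get("members").split()
array_id = fs_array.get(f"{{{XMI_NS}}}id")

print(array_attributes)
print(array_id in members)
```

Being listed in a view's members says nothing about which type a feature structure inherits from; it only records that the feature structure is indexed in that view.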
@reckart, unless I'm interpreting this incorrectly, in the TypeSystem file I sent, there are several annotation types, such as this one, that have a rangeType of FSArray:
<typeDescription>
<name>biomedicus.v2.Historical</name>
<description>Automatically generated type from edu.umn.biomedicus.modification.Historical</description>
<supertypeName>uima.tcas.Annotation</supertypeName>
<features>
<featureDescription>
<name>cueTerms</name>
<description>Automatically generated feature</description>
<rangeTypeName>uima.cas.FSArray</rangeTypeName>
<elementType>biomedicus.v2.ModificationCue</elementType>
</featureDescription>
</features>
</typeDescription>
that have start and end features in the XMI file. So, I changed the supertype for FSArray to uima.tcas.Annotation.
As for sofa, a simple search of the XMI file within this tag: <cas:View members="8 13 15" sofa="1"/> for the id of the FSArray you quoted above (40465) shows that it is in the members list.
Again, I may be interpreting this incorrectly, but since this has a sofa feature and since the id for FSArray is in the XML tag, I assumed there was some inheritance going on.
Disclaimer: I am fairly new to UIMA, so be kind to me! ;-)
@GregSilverman in your example biomedicus.v2.Historical has a feature with the range type FSArray and the elements of that array are biomedicus.v2.ModificationCue.
In a programming language, one might write that approximately as:
package biomedicus.v2;

class ModificationCue extends Annotation {
}

class Historical extends Annotation {
    FSArray<ModificationCue> cueTerms;
}
So here we have two annotation types: Historical and ModificationCue. The FSArray type itself is a built-in UIMA type which does not inherit from Annotation - it could be likened to types such as List or Array in programming languages. FSArray inherits from TOP, which is the root of the UIMA type hierarchy - that is roughly comparable to Object in some programming languages.
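For readers staying in Python, the same hierarchy can be sketched with plain dataclasses. This is only an analogy under the assumptions stated in the comments: the class names mirror the UIMA type names, but none of this is cassis code or a real API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TOP:
    """Stand-in for uima.cas.TOP, the root of the type hierarchy."""


@dataclass
class AnnotationBase(TOP):
    """Stand-in for uima.cas.AnnotationBase, which introduces sofa."""
    sofa: int = 0


@dataclass
class Annotation(AnnotationBase):
    """Stand-in for uima.tcas.Annotation, which adds begin and end."""
    begin: int = 0
    end: int = 0


@dataclass
class FSArray(TOP):
    """Stand-in for uima.cas.FSArray: a container, not an annotation."""
    elements: List[TOP] = field(default_factory=list)


@dataclass
class ModificationCue(Annotation):
    pass


@dataclass
class Historical(Annotation):
    cueTerms: FSArray = field(default_factory=FSArray)


cue = ModificationCue(sofa=1, begin=0, end=2)
historical = Historical(sofa=1, begin=0, end=5, cueTerms=FSArray([cue]))

print(issubclass(FSArray, Annotation))
print(hasattr(historical.cueTerms, "sofa"))
print(hasattr(historical.cueTerms.elements[0], "sofa"))
```

The array held in cueTerms has no sofa, begin or end of its own; only the annotations stored inside it do.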
Again, I may be interpreting this incorrectly, but since this has a sofa feature and since the id for FSArray is in the XML tag, I assumed there was some inheritance going on.
What you see there is a reference from one feature structure to another. The type inheritance hierarchy cannot be determined by looking at the XMI file - you need to look at the type system descriptor XML file for the inheritance.
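As a rough illustration of where the inheritance information lives, the following sketch reads the supertype of each declared type from a type system descriptor using only the standard library. The descriptor string is a trimmed fragment modelled on the one quoted earlier in this thread, and the helper name supertypes is made up for this example; it is not part of cassis.

```python
import xml.etree.ElementTree as ET

# Trimmed descriptor fragment modelled on the one quoted above (illustrative only).
DESCRIPTOR = """\
<typeSystemDescription xmlns="http://uima.apache.org/resourceSpecifier">
  <types>
    <typeDescription>
      <name>biomedicus.v2.ModificationCue</name>
      <supertypeName>uima.tcas.Annotation</supertypeName>
    </typeDescription>
    <typeDescription>
      <name>biomedicus.v2.Historical</name>
      <supertypeName>uima.tcas.Annotation</supertypeName>
      <features>
        <featureDescription>
          <name>cueTerms</name>
          <rangeTypeName>uima.cas.FSArray</rangeTypeName>
          <elementType>biomedicus.v2.ModificationCue</elementType>
        </featureDescription>
      </features>
    </typeDescription>
  </types>
</typeSystemDescription>
"""

NS = {"ts": "http://uima.apache.org/resourceSpecifier"}


def supertypes(descriptor_xml):
    """Map each declared type name to its declared supertype name."""
    root = ET.fromstring(descriptor_xml)
    result = {}
    for type_description in root.iterfind(".//ts:typeDescription", NS):
        name = type_description.findtext("ts:name", namespaces=NS)
        result[name] = type_description.findtext("ts:supertypeName", namespaces=NS)
    return result


hierarchy = supertypes(DESCRIPTOR)
for type_name, supertype_name in hierarchy.items():
    print(type_name, "->", supertype_name)
```

Note that uima.cas.FSArray only appears as the range type of a feature here. It is a built-in type, so the descriptor never declares it and never gives it a supertype.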
@GregSilverman I see three bugs in this issue and will try to address them one by one (I edited your first post to keep track).
@GregSilverman For me, the CAS you posted here loads with the most recent master. Can this issue then be closed?
Yes, definitely. My local fix to the version installed via pip works and I can grab the latest commit of master later. Thanks!
@reckart, yes, I figured that out later after I posted this last night.
Hi there!
Revisiting this, I grabbed the latest commit and am trying to get this working on the enclosed CAS and type system file.
When I run this:
from cassis import *

# placeholders: fill in the actual directories
typesystem_dir = '<path to typesystem file>'
dir_test = '<path to XMI>'

with open(typesystem_dir + 'TypeSystem.xml', 'rb') as f:
    typesystem = load_typesystem(f)

# add missing types
t = typesystem.create_type(name='org.apache.uima.examples.SourceDocumentInformation', supertypeName='uima.tcas.Annotation')
typesystem.add_feature(t, name='uri', rangeTypeName='uima.cas.String')
typesystem.add_feature(t, name="offsetInSource", rangeTypeName="uima.cas.Integer")
typesystem.add_feature(t, name="documentSize", rangeTypeName="uima.cas.Integer")
typesystem.add_feature(t, name="lastSegment", rangeTypeName="uima.cas.Integer")

t = typesystem.create_type(name="uima.tcas.DocumentAnnotation", supertypeName="uima.tcas.Annotation")
typesystem.add_feature(t, name="language", rangeTypeName="uima.cas.String")

t = typesystem.create_type(name='uima.noNamespace.ArtifactID', supertypeName='uima.tcas.Annotation')
typesystem.add_feature(t, name='artifactID', rangeTypeName='uima.cas.Integer')

t = typesystem.create_type(name='uima.noNamespace.ArtifactMetadata', supertypeName='uima.tcas.Annotation')
typesystem.add_feature(t, name='key', rangeTypeName='uima.cas.String')
typesystem.add_feature(t, name='value', rangeTypeName='uima.cas.String')

fname = '0313-v1.txt.xmi'
with open(dir_test + fname, 'rb') as f:
    cas = load_cas_from_xmi(f, typesystem=typesystem)

view = cas.get_view('_InitialView')
print([x for x in view.select_all()])
I get the error AttributeError: 'org_apache_ctakes_typesystem_type_structured_DocumentID' object has no attribute 'sofa', similar to the one above.
I thought this had been fixed as per this issue?
Thanks!
@GregSilverman I did not get the error you describe in master. I encountered an edge case due to your type system redefining a feature called ontologyConceptArr; I think that you should not redefine that. I added code to handle this case, and it is already in master. With the latest master, your file loads for me.
Thanks for looking at this @jcklie. Interesting. Perhaps the fact that I installed this using the pypi version was the issue? Anyway, I will try it later.
@GregSilverman Yes, it is not on pypi but in cassis master. I am waiting for your go-ahead before I release the next version, as I do not want to release something that is broken for you (again).
Got it... I'll need to deserialize some more CAS objects again very soon. For this last one just now, I just used the changes I had previously made locally, since I have a self-imposed deadline.
For me this works in master and 0.2.0-rc1. I will close this now. Please open a new issue if this error still persists.
Removing uima.tcas.DocumentAnnotation from the _types property in the TypeSystem class in typesystem.py breaks load_cas_from_xmi. This code was removed in a previous commit:
DocumentAnnotation as predefined type #41