dkpro / dkpro-pycas

Library for working with UIMA CAS XMI files in Python. This library is deprecated. Use DKPro Cassis instead!
https://github.com/dkpro/dkpro-cassis
Apache License 2.0

Using this for an NLP project #8

Closed by obradovicma 6 years ago

obradovicma commented 6 years ago

Hi, to what extent is your code already usable for an NLP project? We are currently stuck with UIMA JSON because we assumed there was a UIMA JSON reader for re-importing automatically pre-annotated documents into WebAnno. Now we are thinking of switching to XMI. Do you see any disadvantage in switching from JSON to XMI? Thanks in advance!

reckart commented 6 years ago

XMI CAS is the most common serialization of the UIMA CAS data structure. Together with a type system specification, it is capable of (de)serializing all data within the UIMA CAS data structure. There are usually only two reasons not to use it:

1) if for efficiency reasons or technical details a binary format is better suited
2) if the data to be annotated contains characters that cannot be encoded in XML files

In all other cases, XMI CAS is probably the best choice.
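
To judge whether the second reason applies, the document text can be scanned for characters that XML cannot represent. The following is a minimal sketch, assuming the XMI is written as XML 1.0; the function name `find_xml_illegal_chars` is hypothetical and is not part of PyCAS or UIMA.

```python
import re

# Characters allowed by the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
# The negated class below matches everything outside those ranges.
_XML10_ILLEGAL = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_xml_illegal_chars(text):
    """Return (offset, character) pairs for characters XML 1.0 cannot encode.

    Hypothetical helper for illustration only.
    """
    return [(m.start(), m.group()) for m in _XML10_ILLEGAL.finditer(text)]
```

An empty result suggests the text is safe to store in an XML 1.0 file; any hits (for example, a form feed left over from PDF extraction) would have to be replaced or removed before serialization, or a binary format used instead.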

Does that answer your question?

obradovicma commented 6 years ago

Thank you very much! I agree with you, and I'm seriously thinking of switching to UIMA XMI. To what extent is your code already usable for a development project such as ours?

reckart commented 6 years ago

@obradovicma We currently use PyCAS to connect a Python-based neural-network classifier with the INCEpTION annotation tool. INCEpTION is a web-based annotation tool written in Java which can show annotation suggestions to the annotator. On that side we use the Java XMI CAS implementation provided by the Apache UIMA project. The Python-based neural-network classifier produces the annotation suggestions: the Python service receives CAS XMI data from INCEpTION, adds new annotations, and then sends the CAS XMI data back. For us, that's working OK. You'd have to try it yourself to see if it works for you too.
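
The receive, annotate, and send-back round trip described above can be sketched roughly as follows. This is only an illustration using Python's standard library, not the PyCAS API: the `Suggestion` type, its `label` feature, the `http:///org/example.ecore` namespace, and the function name `add_suggestion` are all hypothetical, and a real service would also work against a type system description.

```python
import xml.etree.ElementTree as ET

XMI_NS = "http://www.omg.org/XMI"
CAS_NS = "http:///uima/cas.ecore"
# Hypothetical namespace for a custom annotation type (assumption).
CUSTOM_NS = "http:///org/example.ecore"

# Keep readable prefixes when the tree is serialized again.
for prefix, uri in (("xmi", XMI_NS), ("cas", CAS_NS), ("custom", CUSTOM_NS)):
    ET.register_namespace(prefix, uri)

ID_ATTR = f"{{{XMI_NS}}}id"


def add_suggestion(xmi_text, begin, end, label):
    """Parse CAS XMI, add one annotation, and return the updated XMI string."""
    root = ET.fromstring(xmi_text)
    sofa = root.find(f"{{{CAS_NS}}}Sofa")
    view = root.find(f"{{{CAS_NS}}}View")

    # Choose an xmi:id that no existing top-level element uses.
    used = [int(el.get(ID_ATTR)) for el in root if el.get(ID_ATTR)]
    new_id = max(used, default=0) + 1

    # Hypothetical "Suggestion" annotation with a single "label" feature.
    suggestion = ET.Element(f"{{{CUSTOM_NS}}}Suggestion", {
        ID_ATTR: str(new_id),
        "sofa": sofa.get(ID_ATTR),
        "begin": str(begin),
        "end": str(end),
        "label": label,
    })
    # Place the annotation just before the view element.
    root.insert(list(root).index(view), suggestion)

    # List the new annotation among the view's members so it is indexed.
    members = view.get("members", "").split()
    view.set("members", " ".join(members + [str(new_id)]))
    return ET.tostring(root, encoding="unicode")


# Tiny hand-written XMI document standing in for what the tool would send.
EXAMPLE_XMI = (
    '<xmi:XMI xmlns:xmi="http://www.omg.org/XMI" '
    'xmlns:cas="http:///uima/cas.ecore" xmi:version="2.0">'
    '<cas:NULL xmi:id="0"/>'
    '<cas:Sofa xmi:id="1" sofaNum="1" sofaID="_InitialView" '
    'mimeType="text" sofaString="Berlin is a city."/>'
    '<cas:View sofa="1" members=""/>'
    '</xmi:XMI>'
)

updated_xmi = add_suggestion(EXAMPLE_XMI, 0, 6, "LOC")
```

In a setup like the one described, `updated_xmi` would be what the Python service returns to the annotation tool; a CAS library takes care of the ID bookkeeping and view membership that this sketch does by hand.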

@Rentier @mromanello anything to add here?

obradovicma commented 6 years ago

Sounds very cool. Actually, we are doing something very similar in our NLP project. We will definitely try it out using PyCAS. INCEpTION could be very interesting for our project too...we will check it out! Thanks for your help!

reckart commented 6 years ago

@obradovicma Great :) If you find any bugs in PyCAS, please tell us. And if you make any fixes or improvements to the code, we would be very happy if you would contribute them to the project.