dkpro / dkpro-cassis

UIMA CAS processing library written in Python
https://pypi.org/project/dkpro-cassis/
Apache License 2.0
85 stars 22 forks source link

issue #39; support ann name with digit; add type #46

Closed GregSilverman closed 5 years ago

GregSilverman commented 5 years ago

Still need to add a test (I will do this as time permits), but this worked in all cases.

Also, found bug when digit in annotation type name: added regex for digits to _string_to_valid_classname.

Also, added missing types to type system definitions:

"org.apache.uima.examples.SourceDocumentInformation"

codecov-io commented 5 years ago

Codecov Report

Merging #46 into master will decrease coverage by 1.66%. The diff coverage is 56.41%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #46      +/-   ##
==========================================
- Coverage   97.19%   95.52%   -1.67%     
==========================================
  Files           9        9              
  Lines         927      961      +34     
==========================================
+ Hits          901      918      +17     
- Misses         26       43      +17
Impacted Files Coverage Δ
cassis/typesystem.py 96.21% <100%> (+0.07%) :arrow_up:
cassis/xmi.py 89.71% <48.48%> (-9.61%) :arrow_down:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 93ed4ec...d8e2458. Read the comment docs.

jcklie commented 5 years ago

Thank you very much for the pull request! I will look into it next week, I still am on the road. Sorry again for the delay.

jcklie commented 5 years ago

Thank you again for your pull request and the interest in cassis! I used your code as an inspiration for that feature. I created a pull request on my own #50 . Some questions: why do you want to not strip numbers from type names? SourceDocumentInformation is not exactly a builtin type, it comes from org.apache.uima.examples, therefore I do not want to add it as a builtin type.

GregSilverman commented 5 years ago

We have types with numbers in the name (e.g, biomedicus.v2.UmlsConcept). I found that when the number was being stripped then I could not query for this type, which is why I added that to the regex.

That's fine about the other type. We'll just maintain our own version of the typesystem file.

reckart commented 5 years ago

Some questions: why do you want to not strip numbers from type names?

Why should anything be stripped from type names?

jcklie commented 5 years ago

I convert type names to python class names, so I need to e.g. remove the dots.

reckart commented 5 years ago

Doesn't python have a concept of namespaces/packages?