Closed. ArneDefauw closed this issue 3 years ago
Just to mention: I am working over at the Apache UIMA Java SDK on a test suite for the select API that we have there (part of that work is in this PR). I think it would also be very helpful for cassis to have such a suite.
Basically, what I do in the test suite is:
coveredBy(x, y)
or following(x, y)
by scanning over all annotations and then filtering them using the predicates
cas.select()... call
@ArneDefauw Does that happen in master or the last release? I changed it a bit over the weekend, so I wonder whether that is a fix or the reason for bad things happening now.
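The test-suite idea mentioned earlier in this thread (define predicates such as coveredBy and following, compute the expected result by scanning over all annotations, then compare with the select call) could be sketched in plain Python roughly as follows. This is only an illustration: `Span`, `covered_by`, `following`, `select_by_scan`, `select_covered_indexed` and `fuzz` are hypothetical names, not part of cassis or the UIMA Java SDK, and the predicate definitions are an assumption about the intended semantics.

```python
import bisect
import random
from typing import Callable, List, NamedTuple


class Span(NamedTuple):
    """Hypothetical stand-in for an annotation: just a begin and an end offset."""
    begin: int
    end: int


def covered_by(x: Span, y: Span) -> bool:
    """Assumed semantics: x lies entirely within y."""
    return y.begin <= x.begin and x.end <= y.end


def following(x: Span, y: Span) -> bool:
    """Assumed semantics: x starts at or after the end of y."""
    return x.begin >= y.end


def select_by_scan(spans: List[Span], anchor: Span,
                   predicate: Callable[[Span, Span], bool]) -> List[Span]:
    """Oracle: scan over all spans and keep those matching the predicate."""
    return sorted(s for s in spans if predicate(s, anchor))


def select_covered_indexed(spans: List[Span], anchor: Span) -> List[Span]:
    """Example implementation under test: uses a sorted list and bisect."""
    ordered = sorted(spans)
    # First span whose (begin, end) is >= (anchor.begin, anchor.begin).
    start = bisect.bisect_left(ordered, Span(anchor.begin, anchor.begin))
    result = []
    for span in ordered[start:]:
        if span.begin > anchor.end:
            break
        if span.end <= anchor.end:
            result.append(span)
    return result


def random_span(rng: random.Random, doc_length: int) -> Span:
    begin = rng.randint(0, doc_length)
    return Span(begin, rng.randint(begin, doc_length))


def fuzz(select_impl, predicate, trials: int = 200, seed: int = 0) -> int:
    """Compare select_impl against the scanning oracle on random inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        spans = [random_span(rng, 20) for _ in range(rng.randint(0, 15))]
        anchor = random_span(rng, 20)
        expected = select_by_scan(spans, anchor, predicate)
        actual = select_impl(spans, anchor)
        assert actual == expected, (spans, anchor, actual, expected)
    return trials


if __name__ == "__main__":
    fuzz(select_covered_indexed, covered_by)
```

Short documents and many overlapping spans make edge cases such as shared begin offsets show up quickly; the fixed seed keeps a failing case reproducible.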
It happens both in the latest release (0.4.0) and in 0.3.0.
Then I will check later whether it is still an issue in master. Thanks for reporting! See also #144
I checked, and #144 fixes the issue. Thanks!
It would be nice to publish a new release (on PyPI) with bug #144 fixed. I'm working on some packages that have dkpro-cassis as a dependency and, as far as I know, it's not possible for a Python package to declare a dependency on a GitHub repository.
It is possible: you can also put that into your requirements.txt or setup.py, e.g.
https://stackoverflow.com/questions/32688688/how-to-write-setup-py-to-include-a-git-repo-as-a-dependency
https://adamj.eu/tech/2019/03/11/pip-install-from-a-git-repository/
I will release this weekend, though.
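For illustration, a Git dependency of the kind discussed here can be written as a PEP 508 direct reference. The sketch below only builds the requirement string; `git_requirement` is a hypothetical helper, and the branch name is the one mentioned elsewhere in this thread. It assumes a reasonably recent pip and uses the https form of the repository URL, since GitHub has since turned off the unauthenticated git:// protocol.

```python
def git_requirement(package: str, repo_url: str, ref: str) -> str:
    """Build a PEP 508 direct reference to a Git branch, tag or commit.

    Hypothetical helper for illustration; not part of pip or setuptools.
    """
    return f"{package} @ git+{repo_url}@{ref}"


requirement = git_requirement(
    "dkpro-cassis",
    "https://github.com/dkpro/dkpro-cassis.git",
    "bugfix/144-overlapping-select-covered",
)

# The resulting string can be used as a line in requirements.txt or as an
# entry in install_requires=[...] in setup.py.
print(requirement)
```

As far as I know, PyPI rejects uploads whose metadata contains such direct references, so this is mainly useful for internal packages or as a stopgap until a release is available.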
It would be nice to publish a new release (on PyPI) with bug #144 fixed. I'm working on some packages that have dkpro-cassis as a dependency and, as far as I know, it's not possible for a Python package to declare a dependency on a GitHub repository.
pip install -e git://github.com/dkpro/dkpro-cassis.git@bugfix/144-overlapping-select-covered#egg=dkpro-cassis worked for me
Ok thanks for the link.
On Tue, 24 Nov 2020 at 11:12, ArneD notifications@github.com wrote:
It would be nice to publish a new release (on PyPI) with bug #144 https://github.com/dkpro/dkpro-cassis/issues/144 fixed. I'm working on some packages that have dkpro-cassis as a dependency and, as far as I know, it's not possible for a Python package to declare a dependency on a GitHub repository.
pip install -e git://github.com/dkpro/dkpro-cassis.git@bugfix/144-overlapping-select-covered#egg=dkpro-cassis worked for me
Describe the bug
The cas.select_covered(..., ...) method does not return all covered elements in some situations.
To Reproduce
Small example to reproduce the behavior.
Steps to reproduce the behavior:
Use small_typesystem.xml and small_cas.xml, from https://github.com/dkpro/dkpro-cassis/tree/master/tests/test_files
list( cas.select( 'cassis.Sentence' ) ) is equal to:
[cassis_Sentence(xmiID=14, id='0', begin=0, end=26, type='cassis.Sentence'), cassis_Sentence(xmiID=15, id='1', begin=27, end=47, type='cassis.Sentence')]
and
list( cas.select( 'cassis.Token') ) is:
[cassis_Token(xmiID=3, id='0', pos='NNP', begin=0, end=3, type='cassis.Token'), cassis_Token(xmiID=16, id='11', pos='NNP', begin=0, end=10, type='cassis.Token'), cassis_Token(xmiID=4, id='1', pos='VBD', begin=4, end=10, type='cassis.Token'), cassis_Token(xmiID=5, id='2', pos='IN', begin=11, end=14, type='cassis.Token'), cassis_Token(xmiID=6, id='3', pos='DT', begin=15, end=18, type='cassis.Token'), cassis_Token(xmiID=7, id='4', pos='NN', begin=19, end=24, type='cassis.Token'), cassis_Token(xmiID=8, id='5', pos='.', begin=25, end=26, type='cassis.Token'), cassis_Token(xmiID=9, id='6', pos='DT', begin=27, end=30, type='cassis.Token'), cassis_Token(xmiID=10, id='7', pos='NN', begin=31, end=36, type='cassis.Token'), cassis_Token(xmiID=11, id='8', pos='VBD', begin=37, end=40, type='cassis.Token'), cassis_Token(xmiID=12, id='9', pos='JJ', begin=41, end=45, type='cassis.Token'), cassis_Token(xmiID=13, id='10', pos='.', begin=46, end=47, type='cassis.Token')]
while
list( cas.select_covered('cassis.Token', list( cas.select( 'cassis.Sentence' ))[0] ) ) is:
[cassis_Token(xmiID=16, id='11', pos='NNP', begin=0, end=10, type='cassis.Token'), cassis_Token(xmiID=4, id='1', pos='VBD', begin=4, end=10, type='cassis.Token'), cassis_Token(xmiID=5, id='2', pos='IN', begin=11, end=14, type='cassis.Token'), cassis_Token(xmiID=6, id='3', pos='DT', begin=15, end=18, type='cassis.Token'), cassis_Token(xmiID=7, id='4', pos='NN', begin=19, end=24, type='cassis.Token'), cassis_Token(xmiID=8, id='5', pos='.', begin=25, end=26, type='cassis.Token')]
Expected behavior
list( cas.select_covered('cassis.Token', list( cas.select( 'cassis.Sentence' ))[0] ) ) should also contain
Token(begin=0, end=3, id='0', pos='NNP')
This problem only seems to occur if the begin index of overlapping Tokens (here the Tokens with id=11 and id=0) coincides with the begin index of a Sentence.
I was annotating multi-words, where such a situation (overlapping (multi-)Tokens) is not uncommon.
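To make the expected result concrete without depending on cassis, the offsets from the output above can be checked by hand in a few lines of Python. This is only a sketch: the tuples copy the id, begin and end values shown in this report, and "covered" is assumed to mean that a Token lies entirely within the Sentence.

```python
# (id, begin, end) of every Token, copied from the cas.select output above.
tokens = [
    ("0", 0, 3), ("11", 0, 10), ("1", 4, 10), ("2", 11, 14),
    ("3", 15, 18), ("4", 19, 24), ("5", 25, 26), ("6", 27, 30),
    ("7", 31, 36), ("8", 37, 40), ("9", 41, 45), ("10", 46, 47),
]

# First Sentence from the report: begin=0, end=26.
sentence_begin, sentence_end = 0, 26

# Assumed meaning of "covered": the Token lies entirely within the Sentence.
expected_ids = [
    token_id
    for token_id, begin, end in tokens
    if sentence_begin <= begin and end <= sentence_end
]

# Token ids actually returned by select_covered in this report.
reported_ids = ["11", "1", "2", "3", "4", "5"]

missing = [token_id for token_id in expected_ids if token_id not in reported_ids]
print(missing)  # prints ['0']
```

A check like this could serve as a regression test once the fix is in place: the ids returned for the first Sentence should match `expected_ids`.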