Closed zesch closed 9 months ago
I removed two instances where a CAS was redundantly created and then not used in tests that used these fixtures.
But there seem to be more instances where these fixtures are used, e.g.:
def test_select(small_typesystem_xml, tokens, sentences):
ts = load_typesystem(small_typesystem_xml)
cas = Cas(typesystem=ts)
cas.add_all(tokens + sentences)
That is a bit of a quirky case because it means that the tokens/sentences that were created on some CAS originally are now transplanted to a new one (in this case an empty one that does not even contain any text) and that makes them a bit of zombie annotations. For testing, it seems ok though and there can be cases where such a transplantation can make sense.
Any suggestion what to do here - if anything?
How about not creating the Cas in the test, but as part of the fixture? That way there is only one place where the annotations are bound to the Cas, so less room for using this wrong. Along those lines:
@pytest.fixture
def cas_simple(typesystem_xml, tokens, sentences):
ts = load_typesystem(typesystem_xml)
cas = Cas(ts)
cas.sofa_string = CAS_TEXT
cas.add_all(tokens)
cas.add_all(sentences)
return cas
If I understand your example correctly, then this is a new cas_simple
fixture which gets injected with some pre-defined tokens
and sentences
which are then added to the CAS. But since the tokens and sentences do not know the CAS text, how could they have sensible offsets?
I think this would only work if the tokens/sentences would not be injected but rather generated using tokenizer/sentence splitter on the CAS text.
I wanted to stay as closely to the existing fixture structure, but you are right, this does not solve the underlying issue.
There seems to be some confusion about the relationship between annotations and Cas in the fixtures, e.g. here https://github.com/dkpro/dkpro-cassis/blob/49a66ee108b66a8f139ec4a5d2207b6884a96a60/tests/fixtures.py#L441-L465 AFAICT, adding the annotations to the Cas does not do anything, as only the tokens are returned later. (the same for sentences fixture)
Adding annotations to the CAS sets the sofa
property inside the annotation. That is used in particular by annotation.get_covered_text()
to retrieve the annotated text. If an annotation has not been added to a CAS, that method is defunct.
ok I see. That makes sense. Thanks for correcting my understanding :)
The tokens() and sentences() fixtures internally create a Cas but it is never returned. Thus the sofa_string is not set. Some test that rely on get_covered_text then individually set the sofa_string, but this could introduce inconsistencies.
Suggested improvement: create a fixture that directly returns a Cas and consistently sets the sofa_string to the actual tokens.