Issue (open), opened by GoogleCodeExporter 9 years ago
Consider a document that contains only stopwords.
A segmenter creates tokens for these words.
A stopword remover removes all tokens.
A parser runs and finds no tokens.
This parser should not fail, it should simply do nothing.
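The stopword scenario can be sketched in Java as a minimal, self-contained illustration. This is not real UIMA code: `Token`, `StopwordPipeline`, `segment`, `removeStopwords` and `parse` are hypothetical stand-ins, and the "parse" step is only a placeholder. The point is that an empty token list produces an empty result instead of an exception.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a token annotation (not a real UIMA type).
record Token(String text) {}

class StopwordPipeline {
    // Simplified segmenter: splits on whitespace and skips empty strings.
    static List<Token> segment(String document) {
        List<Token> tokens = new ArrayList<>();
        for (String t : document.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(new Token(t));
            }
        }
        return tokens;
    }

    // Stopword remover: drops every token whose lowercased text is a stopword.
    static List<Token> removeStopwords(List<Token> tokens, Set<String> stopwords) {
        List<Token> kept = new ArrayList<>();
        for (Token t : tokens) {
            if (!stopwords.contains(t.text().toLowerCase())) {
                kept.add(t);
            }
        }
        return kept;
    }

    // Hypothetical parser: with no tokens there is nothing to parse,
    // so it returns an empty result instead of failing.
    static List<String> parse(List<Token> tokens) {
        List<String> dependencies = new ArrayList<>();
        if (tokens.isEmpty()) {
            return dependencies; // nothing to do
        }
        for (int i = 1; i < tokens.size(); i++) {
            // Placeholder "parse": attach each token to its predecessor.
            dependencies.add(tokens.get(i - 1).text() + " -> " + tokens.get(i).text());
        }
        return dependencies;
    }
}
```

Running `parse(removeStopwords(segment("the of and"), Set.of("the", "of", "and")))` yields an empty list rather than an error.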
Input capabilities do not mean that a component must fail if no annotations of
the given type are present. They only mean that the component may use this
information; at least, that is my understanding given the example above.
We should fail with a proper message if illegal combinations of annotations are
encountered, e.g. if a parser finds a Token that has no POS tag or if it finds
a POS tag that has no value.
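Failing "with a proper message" could look roughly like the following sketch. It assumes hypothetical, simplified `Token` and `PosTag` records (a `null` POS means "no tag") and a hypothetical `PosValidator` helper, not the actual annotation types. The benefit is that the error names the offending token instead of surfacing as an unexplained exception deep inside the parser.

```java
import java.util.List;

// Hypothetical, simplified annotation types; pos == null means "no POS tag".
record PosTag(String value) {}
record Token(int begin, int end, PosTag pos) {}

class PosValidator {
    // Reports the two illegal combinations from the comment above with a
    // message that identifies the token by its character offsets.
    static void validate(List<Token> tokens) {
        for (Token t : tokens) {
            String where = "[" + t.begin() + "," + t.end() + "]";
            if (t.pos() == null) {
                throw new IllegalStateException("Token " + where + " has no POS tag");
            }
            String v = t.pos().value();
            if (v == null || v.isEmpty()) {
                throw new IllegalStateException("POS tag on token " + where + " has no value");
            }
        }
    }
}
```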
We might also want to fail in cases where we know that a model makes use of
some information that is not present in the CAS. For example, MaltParser fails if
it finds that its model needs lemma information but a token has no lemma
(this check can also be turned off by setting
PARAM_IGNORE_MISSING_FEATURES to true).
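The idea behind such a switch can be sketched as follows. This is not MaltParser's implementation: the `ignoreMissingFeatures` argument only mirrors the intent of PARAM_IGNORE_MISSING_FEATURES, and `FeatureCheck`, the `"LEMMA"` feature name and the `Token` record are hypothetical. The model's required features are assumed to be available as a simple set of names.

```java
import java.util.List;
import java.util.Set;

// Hypothetical token: lemma == null means no lemma annotation is present.
record Token(String text, String lemma) {}

class FeatureCheck {
    // If the model needs a feature that the input lacks, fail unless the
    // caller opted to ignore missing features. Returns the names of the
    // missing features that were ignored (empty if nothing was missing).
    static List<String> check(Set<String> modelFeatures, List<Token> tokens,
                              boolean ignoreMissingFeatures) {
        boolean lemmaMissing = tokens.stream().anyMatch(t -> t.lemma() == null);
        if (modelFeatures.contains("LEMMA") && lemmaMissing) {
            if (!ignoreMissingFeatures) {
                throw new IllegalStateException(
                        "Model requires LEMMA, but at least one token has no lemma");
            }
            return List.of("LEMMA");
        }
        return List.of();
    }
}
```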
So yes, we should handle this better.
But I believe using type capabilities is not going to take us anywhere.
Original comment by richard.eckart
on 6 Aug 2014 at 5:06
>> This parser should not fail, it should simply do nothing.
But it would be nice if the parser issued a message saying that it did nothing
and why. I recently had a similar issue with the StanfordParser: I read in an
annotated corpus, but the Sentence annotations were missing.
It took me a while to realize that
1. the parser did nothing, and
2. it did nothing because of the missing Sentence annotations.
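Reporting that nothing was done, and why, might be sketched like this. `Sentence`, `SentenceAwareParser` and `process` are hypothetical names, not the StanfordParser component's API. The sketch uses `java.util.logging` and also returns the message so that the behaviour is easy to check.

```java
import java.util.List;
import java.util.Optional;
import java.util.logging.Logger;

// Hypothetical sentence annotation with character offsets.
record Sentence(int begin, int end) {}

class SentenceAwareParser {
    private static final Logger LOG =
            Logger.getLogger(SentenceAwareParser.class.getName());

    // Returns a message explaining why nothing was parsed, or empty if the
    // parser had sentences to work on. The message is also logged as a
    // warning so that missing Sentence annotations do not go unnoticed.
    static Optional<String> process(String documentId, List<Sentence> sentences) {
        if (sentences.isEmpty()) {
            String msg = "Document " + documentId
                    + ": no Sentence annotations found, parser did nothing";
            LOG.warning(msg);
            return Optional.of(msg);
        }
        // Parsing of each sentence would happen here (omitted in this sketch).
        return Optional.empty();
    }
}
```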
Original comment by eckle.kohler
on 6 Aug 2014 at 5:16
OK, Richard is of course right.
The problem is real; only my solution is not the best.
The parsers should at least output a warning if they rely on POS but no
POS annotations are present.
Currently, the parser does not even silently do nothing; it throws an exception,
which is definitely not how it should behave :)
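One possible reading of the thread as a whole is a three-way decision for a POS-dependent parser: no tokens means nothing to do, tokens without any POS means warn and skip, and a mix of tagged and untagged tokens is an illegal combination that fails with a message. The sketch below assumes a hypothetical `Token` record, `Outcome` enum and `PosGate.decide` method; it is not an existing component API, and the caller is assumed to log the warning.

```java
import java.util.List;

// Hypothetical token: pos == null means no POS annotation on this token.
record Token(String text, String pos) {}

enum Outcome { NOTHING_TO_DO, SKIPPED_NO_POS, PARSE }

class PosGate {
    // Decides what a POS-dependent parser should do instead of throwing
    // whenever POS annotations are absent.
    static Outcome decide(List<Token> tokens) {
        if (tokens.isEmpty()) {
            return Outcome.NOTHING_TO_DO; // e.g. every token was a stopword
        }
        long withPos = tokens.stream().filter(t -> t.pos() != null).count();
        if (withPos == 0) {
            return Outcome.SKIPPED_NO_POS; // caller logs a warning and skips
        }
        if (withPos < tokens.size()) {
            // Some tokens tagged, others not: an illegal combination.
            throw new IllegalStateException((tokens.size() - withPos) + " of "
                    + tokens.size() + " tokens have no POS tag");
        }
        return Outcome.PARSE;
    }
}
```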
Original comment by torsten....@gmail.com
on 6 Aug 2014 at 5:21
Original issue reported on code.google.com by
torsten....@gmail.com
on 6 Aug 2014 at 4:06