google-code-export / dkpro-core-asl

Automatically exported from code.google.com/p/dkpro-core-asl

Some parsers fail when no POS annotations are present #443

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
I haven't checked all of them, but at least MstParser and ClearNlpDependencyParsers 
fail with hard-to-interpret, deeply nested exceptions when no POS annotations are present.

POS annotations are declared as a required input type, but since we never 
actually check that, no warning is issued.

This is especially confusing because StanfordParser works without POS 
annotations being present, so simply swapping one parser for another in a 
setup that worked before fails unexpectedly.

Could all annotators with type capabilities actually check whether these levels 
are present?
Or is this too expensive to enable automatically, e.g. in the implBase?

Original issue reported on code.google.com by torsten....@gmail.com on 6 Aug 2014 at 4:06

GoogleCodeExporter commented 9 years ago
Suppose you have a document containing only stopwords. 
A segmenter creates tokens for these.
A stopword remover removes all tokens.
A parser runs and finds no tokens.

This parser should not fail, it should simply do nothing.

Input capabilities do not mean that a component must fail if no annotations of 
the given type are present. They just mean that the component may use this 
information - at least that is my understanding given the example above.

We should fail with a proper message if illegal combinations of annotations are 
encountered, e.g. if a parser finds a Token that has no POS tag or if it finds 
a POS tag that has no value.
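
The distinction drawn above (do nothing on empty input, fail clearly on illegal input) could look roughly like the sketch below. It is only an illustration under stated assumptions: `Token`, `Pos` and `checkInput` are hypothetical stand-ins written for this example, not the DKPro Core or UIMA types.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-ins for illustration; NOT the DKPro Core / UIMA API.
class PosPrecheckSketch {

    /** A POS annotation; value may be null to model "POS tag without a value". */
    static final class Pos {
        final String value;
        Pos(String value) { this.value = value; }
    }

    /** A token; pos == null models "token without a POS annotation". */
    static final class Token {
        final String text;
        final Pos pos;
        Token(String text, Pos pos) { this.text = text; this.pos = pos; }
    }

    /**
     * Returns false if there is nothing to parse (the parser should do nothing),
     * true if the input is usable. Throws with a readable message if an illegal
     * combination of annotations is found.
     */
    static boolean checkInput(List<Token> tokens) {
        if (tokens.isEmpty()) {
            // e.g. every token was removed as a stopword: skip, do not fail
            return false;
        }
        for (Token t : tokens) {
            if (t.pos == null) {
                throw new IllegalStateException("Token '" + t.text
                        + "' has no POS tag; run a POS tagger before the parser.");
            }
            if (t.pos.value == null || t.pos.value.isEmpty()) {
                throw new IllegalStateException("Token '" + t.text
                        + "' has a POS tag without a value.");
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Empty input is skipped rather than treated as an error.
        System.out.println(checkInput(Collections.<Token>emptyList()));
        // A fully tagged token passes the check.
        System.out.println(checkInput(Arrays.asList(new Token("dogs", new Pos("NNS")))));
    }
}
```

The point of the sketch is that the failure is raised at the component boundary with a message naming the offending token, rather than surfacing later as an exception from deep inside the parser.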

We might also want to fail in cases where we know that a model makes use of 
some information that is not present in the CAS, e.g. MaltParser fails if it 
finds that the model it uses needs lemma information but there is no lemma 
information available on a token (this can be turned off by setting 
PARAM_IGNORE_MISSING_FEATURES to true).
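
A model-driven check of this kind might be sketched as follows. This is not how MaltParser or its DKPro Core wrapper is implemented; the class, the method names and the feature-map representation are all assumptions made for the example, and the `ignoreMissingFeatures` flag only mimics the idea behind PARAM_IGNORE_MISSING_FEATURES.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch; names and data layout are assumptions, not a real API.
class MissingFeatureCheckSketch {

    /** Lists, in sorted order, the features the model needs but the token lacks. */
    static List<String> missingFeatures(Set<String> requiredByModel,
                                        Map<String, String> tokenFeatures) {
        List<String> missing = new ArrayList<>();
        for (String feature : new TreeSet<>(requiredByModel)) {
            String value = tokenFeatures.get(feature);
            if (value == null || value.isEmpty()) {
                missing.add(feature);
            }
        }
        return missing;
    }

    /**
     * Returns true if all required features are present. If some are missing,
     * returns false when ignoreMissingFeatures is set, otherwise throws with a
     * message that names the missing features.
     */
    static boolean validate(Set<String> requiredByModel,
                            Map<String, String> tokenFeatures,
                            boolean ignoreMissingFeatures) {
        List<String> missing = missingFeatures(requiredByModel, tokenFeatures);
        if (missing.isEmpty()) {
            return true;
        }
        if (ignoreMissingFeatures) {
            return false;
        }
        throw new IllegalStateException("Model requires features " + missing
                + " that are not available on the token.");
    }

    public static void main(String[] args) {
        Set<String> required = new TreeSet<>();
        required.add("pos");
        required.add("lemma");
        Map<String, String> token = new HashMap<>();
        token.put("pos", "NN");
        // With the flag set, the missing lemma is tolerated instead of raising.
        System.out.println(validate(required, token, true));
    }
}
```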

So yes, we should handle this better.

But I believe using type capabilities is not going to take us anywhere.

Original comment by richard.eckart on 6 Aug 2014 at 5:06

GoogleCodeExporter commented 9 years ago
>> This parser should not fail, it should simply do nothing.

but it would be nice if the parser issued a message saying that it did nothing 
and why it did nothing - I recently had a similar issue with the StanfordParser 
where I read in an annotated corpus, but the Sentence annotations were missing.
It took me a while to realize that
1. the parser did nothing
2. it did nothing because of missing Sentence annotations
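
One way to surface both points is to compute an explicit reason before skipping and log it. The sketch below assumes annotation counts are already known and uses `java.util.logging`; `skipReason` and `process` are hypothetical names for this example, not existing DKPro Core methods.

```java
import java.util.logging.Logger;

// Hypothetical sketch; not part of DKPro Core or the StanfordParser wrapper.
class SkipReasonSketch {

    private static final Logger LOG = Logger.getLogger(SkipReasonSketch.class.getName());

    /** Returns why the parser has nothing to do, or null if it can proceed. */
    static String skipReason(int sentenceCount, int tokenCount) {
        if (sentenceCount == 0) {
            return "no Sentence annotations found";
        }
        if (tokenCount == 0) {
            return "no Token annotations found";
        }
        return null;
    }

    /** Returns true if the document was parsed, false if it was skipped. */
    static boolean process(String documentId, int sentenceCount, int tokenCount) {
        String reason = skipReason(sentenceCount, tokenCount);
        if (reason != null) {
            // Tell the user that nothing was done and why.
            LOG.warning("Parser skipped document [" + documentId + "]: " + reason);
            return false;
        }
        // Actual parsing would happen here.
        return true;
    }

    public static void main(String[] args) {
        // Logs a warning explaining that Sentence annotations are missing.
        process("corpus-doc-1", 0, 120);
    }
}
```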

Original comment by eckle.kohler on 6 Aug 2014 at 5:16

GoogleCodeExporter commented 9 years ago
OK, Richard is of course right.
The problem is there, only my solution is not the best.

The parsers should at least output a warning if they rely on POS but no POS 
annotations are present.
Currently they do not even silently do nothing; they throw an exception, which 
is definitely not how this should behave :)

Original comment by torsten....@gmail.com on 6 Aug 2014 at 5:21