DKPro does not yet have a reader for the tagged plain-text corpora that come
with the PTB.
Points for discussion:
- The corpora contain noun phrase annotations in addition to the tags. Is there a type
for annotating noun phrases in DKPro?
- Tokens occasionally have two or more possible part-of-speech tags in case of ambiguity.
How should those be handled? Take only the first one?
- The Switchboard corpus in the PTB additionally marks wrongly tagged words. How should
those be handled? Is there a 'no-tag' attribute value for the UIMA POS type?
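
To make the first two points concrete, here is a rough sketch of how one line of the tagged files might be parsed. It assumes the usual layout of those files as I understand it: tokens written as word/TAG, noun phrases enclosed in standalone [ and ] brackets, and ambiguous tags joined with | (e.g. back/JJ|NN). The class and method names (PtbTaggedLineSketch, TaggedToken, parse, firstTag) are made up for illustration and are not existing DKPro or UIMA API. The Switchboard marking of wrongly tagged words is not handled here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch, not DKPro API: parses one line of a PTB tagged plain-text file. */
public class PtbTaggedLineSketch {

    /** One token with all of its candidate tags and a noun-phrase membership flag. */
    public static final class TaggedToken {
        public final String form;
        public final List<String> tags;
        public final boolean inNounPhrase;

        TaggedToken(String form, List<String> tags, boolean inNounPhrase) {
            this.form = form;
            this.tags = tags;
            this.inNounPhrase = inNounPhrase;
        }

        /** The "take only the first one" strategy for ambiguous tags. */
        public String firstTag() {
            return tags.get(0);
        }
    }

    /** Assumed format: word/TAG items, "[" and "]" around noun phrases, "|" between alternatives. */
    public static List<TaggedToken> parse(String line) {
        List<TaggedToken> tokens = new ArrayList<>();
        boolean inNounPhrase = false;
        for (String item : line.trim().split("\\s+")) {
            if (item.isEmpty()) {
                continue; // blank line
            }
            if (item.equals("[")) {
                inNounPhrase = true;
                continue;
            }
            if (item.equals("]")) {
                inNounPhrase = false;
                continue;
            }
            // Split at the last slash so that forms containing a slash stay intact.
            int slash = item.lastIndexOf('/');
            if (slash <= 0 || slash == item.length() - 1) {
                continue; // not a word/TAG item, e.g. a separator line
            }
            String form = item.substring(0, slash);
            List<String> tags = Arrays.asList(item.substring(slash + 1).split("\\|"));
            tokens.add(new TaggedToken(form, tags, inNounPhrase));
        }
        return tokens;
    }
}
```

A reader built along these lines could keep all alternatives in TaggedToken.tags and let a parameter decide whether only firstTag() is written to the POS annotation. The inNounPhrase flag is a simplification: a real reader would need the begin and end offsets of each bracketed span in order to create a chunk annotation.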
Original issue reported on code.google.com by Tobias.Horsmann on 2014-08-01 11:12:50