ClearTK / cleartk

Machine learning components for Apache UIMA
http://cleartk.github.io/cleartk/

Tokenization in the Berkeley parser wrapper may not be compatible with the PTB #420

Open mjlaali opened 8 years ago

mjlaali commented 8 years ago

The Berkeley parser wrapper has the following limitations:

1. The wrapper requires the text to be tokenized and POS-tagged before parsing.
2. The parser does not parse some sentences properly, especially sentences containing tokens that need to be normalized to the Penn Treebank convention (e.g. '(' should be converted to '-LRB-' before the parsing step).
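
To make the normalization in the second point concrete, here is a minimal sketch of mapping bracket tokens to their Penn Treebank escapes before handing them to the parser. The class `PtbTokenNormalizer` and its methods are hypothetical names for illustration and are not part of ClearTK or the Berkeley parser. It covers only the six bracket tokens and assumes Java 9+ for `Map.of`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not a ClearTK API: rewrites bracket tokens
// to the Penn Treebank escape forms (e.g. "(" -> "-LRB-").
final class PtbTokenNormalizer {

    // Standard PTB escapes for round, square, and curly brackets.
    private static final Map<String, String> BRACKETS = Map.of(
            "(", "-LRB-",
            ")", "-RRB-",
            "[", "-LSB-",
            "]", "-RSB-",
            "{", "-LCB-",
            "}", "-RCB-");

    private PtbTokenNormalizer() {
    }

    // Returns the PTB form of a single token, or the token itself
    // when no escape applies.
    static String normalize(String token) {
        return BRACKETS.getOrDefault(token, token);
    }

    // Normalizes an already tokenized sentence, preserving token order.
    static List<String> normalizeAll(List<String> tokens) {
        List<String> normalized = new ArrayList<>(tokens.size());
        for (String token : tokens) {
            normalized.add(normalize(token));
        }
        return normalized;
    }
}
```

In a wrapper, a step like this would presumably run on the token text just before the parser is called, with the original covered text kept on the annotations so that offsets into the document stay valid.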