Closed matgrioni closed 7 years ago
Again, I am using Latin, and my model is the UD_Latin POS DICT and RDR file. There was no " PUNCT
rule in the DICT file, but after I added one the same problem persists.
It is not a problem. I follow the Penn Treebank standard, where two single quotation marks ('') are used instead of a double quotation mark ("). You can post-process the output for your purpose. You might also want to use the UD_Latin-ITTB model, for which we have more training data than for UD_Latin (they are just two different Latin datasets), leading to better results.
UD_Latin 81.72%
UD_Latin-ITTB 96.87%
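For the post-processing mentioned above, something along these lines might work. It is only a rough sketch, not part of the tagger: the function name restore_double_quotes is made up, and it assumes the output is a line of space-separated word/TAG tokens. It also maps the Penn Treebank opening quote (``) back to a double quote, in case that shows up in your output as well.

```python
def restore_double_quotes(tagged_line: str) -> str:
    """Map Penn Treebank style quote tokens back to a plain double quote.

    Assumes (not confirmed by the tagger's docs) that the input is a single
    line of space-separated word/TAG tokens.
    """
    restored = []
    for token in tagged_line.split():
        # Split on the last "/" only, so words that contain "/" stay intact.
        word, sep, tag = token.rpartition("/")
        # Only rewrite tokens whose whole word part is a PTB quote marker.
        if sep and word in ("''", "``"):
            word = '"'
        restored.append(f"{word}{sep}{tag}")
    return " ".join(restored)


if __name__ == "__main__":
    # Illustrative input only; the tags here are not real tagger output.
    print(restore_double_quotes("dubitem/VERB ''/PUNCT"))
```

Tokens without a "/" are passed through unchanged, so it should be safe to run over lines that are not tagged.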
When there is a double quote in the source, the output consists of two single quotes. The line in question is
whose tokens have been space-separated ;). In the output, the double quote after dubitem becomes two single quotes, which is a problem for my purposes.