Closed ChristophLeonhardt closed 3 years ago
We realized that this issue occurs when the token "NA" happens to be in the token stream. The solution is to set the na.strings argument of fread() to NULL when reading in annotated corpus data. Because this is easy to forget, I implemented a corenlp_parse_conll() function (javamultithreading branch).
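For anyone hitting the same problem, here is a minimal sketch of the workaround using data.table directly. The toy input and the column names (word, pos) are made up for illustration and are not the actual CoreNLP output format; quote = "" is an extra assumption on my side that tends to be useful for corpus data and is not part of the fix itself. Note that the data.table argument is spelled na.strings.

```r
library(data.table)

# Toy tab-separated, CoNLL-style input; the column names are illustrative only.
conll_lines <- c("word\tpos", "NA\tNNP", "rocks\tVBZ")

# Default behaviour: na.strings = "NA", so the token "NA" is read as a missing value.
default_read <- fread(text = conll_lines, sep = "\t", quote = "")
is.na(default_read$word[1])

# Workaround: with na.strings = NULL the token is kept as the ordinary string "NA".
fixed_read <- fread(text = conll_lines, sep = "\t", quote = "", na.strings = NULL)
fixed_read$word[1]
```

Comparing the first value of the word column in both results shows whether the token survived the import.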
The token "NA" (both letters capitalized) is passed on as an empty string and is not encoded correctly later on. This might be a cwbtools issue as well.