Closed: fmerhout closed this issue 5 years ago.
Probably related to the other issue. Can you let me know what rdr_add_space_around_punctuations("- what is wrong?") returns?
Here is the result of rdr_add_space_around_punctuations("- what is wrong?"):
" - what is wrong ? "
Interestingly, the leading space does not appear with rdr_add_space_around_punctuations("+ what is wrong?"):
"+ what is wrong ? "
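A possible explanation for why only the "-" input fails (an assumption, not confirmed in this thread): if the tagger splits its input on single spaces, the leading space in " - what is wrong ? " produces an empty first token, and reading the first character of an empty string would raise an index-0 error like the StringIndexOutOfBoundsException reported. The base-R snippet below only illustrates that splitting step on the two outputs shown above; it does not call the tagger.

```r
# The two strings returned by rdr_add_space_around_punctuations() above
with_dash <- " - what is wrong ? "
with_plus <- "+ what is wrong ? "

# Split on single spaces, as a simple whitespace tokeniser might (assumption)
tokens_dash <- strsplit(with_dash, " ", fixed = TRUE)[[1]]
tokens_plus <- strsplit(with_plus, " ", fixed = TRUE)[[1]]

tokens_dash  # c("", "-", "what", "is", "wrong", "?")
tokens_plus  # c("+", "what", "is", "wrong", "?")

# The leading space yields an empty first token of length 0
nchar(tokens_dash[1])  # 0
```

Note that strsplit() drops the empty string after a trailing separator, so only the leading space leaves an empty token behind.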
Yes, that is the problem: rdr_add_space_around_punctuations does not tokenise correctly. It is the same issue as the other one just reported. You need to make sure the first character is not a space, tokenise the text yourself (every token separated by a single space), and set add_space_around_punctuations = FALSE.
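The pre-tokenisation step described above can be sketched in base R. The helper name pretokenise() is hypothetical and not part of RDRPOSTagger; it splits every punctuation character off as its own token, collapses whitespace to single spaces, and trims the ends so the string never starts with a space.

```r
# Hypothetical helper (not part of RDRPOSTagger): separate every token by
# exactly one space and make sure the string does not start or end with a space.
pretokenise <- function(x) {
  # Surround each punctuation character with spaces
  x <- gsub("([[:punct:]])", " \\1 ", x)
  # Collapse runs of whitespace into a single space
  x <- gsub("[[:space:]]+", " ", x)
  # Remove leading and trailing whitespace
  trimws(x)
}

pretokenise("- what is wrong?")  # "- what is wrong ?"
pretokenise("+ what is wrong?")  # "+ what is wrong ?"
```

The result would then be passed to the tagger with add_space_around_punctuations = FALSE, as the maintainer describes; the thread does not say which function takes that argument, so check the package documentation for your installed version. This naive rule also splits contractions and decimals ("don't" becomes "don ' t", "3.5" becomes "3 . 5"), which is why a real tokeniser such as udpipe is the more robust option.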
It is caused by some of the punctuation in the text. Using removePunctuation(text) from the tm package works for me.
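For readers without tm installed, a rough base-R stand-in for this workaround is sketched below. It assumes tm::removePunctuation() with default settings simply deletes runs of [[:punct:]] characters; the name strip_punctuation() is hypothetical. trimws() is added because deleting a leading "-" leaves a leading space behind.

```r
# Hypothetical base-R approximation of tm::removePunctuation() with default
# settings (assumption), followed by trimming so no leading space remains.
strip_punctuation <- function(x) {
  trimws(gsub("[[:punct:]]+", "", x))
}

strip_punctuation("- what is wrong?")  # "what is wrong"
```

The trade-off is that punctuation tokens disappear entirely, so the tagger never sees the "?" and cannot tag it as punctuation; pre-tokenising the text keeps those tokens.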
Closing, as a solution was provided. With this R package, tokenisation is up to the user. If you need a tokeniser, use the udpipe R package.
I came across an error when passing the tagger sentences that start with a symbol such as - or ?. Here is an example:
rdr_pos(rdr_model(language = "English", annotation = "UniversalPOS"), "- what is wrong?")
It returns the following error:
Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, : java.lang.StringIndexOutOfBoundsException: String index out of range: 0
It seems like this is an rJava error, but I thought I'd post it here first.