Closed alanault closed 6 years ago
The solution to this is to do your preprocessing after the annotation.
Ah - that's a shame. My data is already pre-processed, so adding features after POS tagging isn't possible.
Why don't you do the following on your annotated data set:
x$upos <- ifelse(x$token %in% c("your", "list", "of", "brands"), "NOUN", x$upos)
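To make the one-liner above concrete, here is a minimal sketch on a toy data frame. The `token` and `upos` columns follow the snippet above; the toy rows, the `doc_id` column and the `placeholders` vector are made up for illustration and are not output from the real annotation function.

```r
# Toy stand-in for an annotated data set (assumed layout: one row per token,
# with `token` and `upos` columns as in the snippet above).
x <- data.frame(
  doc_id = "doc1",
  token  = c("i", "love", "brand", "_"),
  upos   = c("PRON", "VERB", "SYM", "PUNCT"),
  stringsAsFactors = FALSE
)

# Hypothetical list of placeholder tokens introduced during pre-processing.
placeholders <- c("brand", "product")

# Overwrite the tag only where the token is a known placeholder;
# every other row keeps the tag the annotator assigned.
x$upos <- ifelse(x$token %in% placeholders, "NOUN", x$upos)

print(x[, c("token", "upos")])
```

Because `%in%` does an exact match, the entries in `placeholders` need to be spelled the way they appear in the `token` column after annotation, including case.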
That's a nice (and simple) approach!
Interestingly, the list of generalised tokens (like brand) is actually quite small, so it's not too much of an issue on a large corpus.
The token is also split in two: the brand and a "_", which is labelled as punctuation. It's probably worth clearing these out at the same time.
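Clearing out the leftover pieces could look roughly like the sketch below. It assumes the stray piece is a lone underscore tagged `PUNCT`, which may not match what the annotator actually emits, so it is worth inspecting the rows before dropping them. The toy data and the names `stray` and `x_clean` are illustrative only.

```r
# Toy annotated data after the upos fix; the last row is the leftover piece
# (assumed here to be a lone "_" tagged as PUNCT).
x <- data.frame(
  token = c("i", "love", "brand", "_"),
  upos  = c("PRON", "VERB", "NOUN", "PUNCT"),
  stringsAsFactors = FALSE
)

# Flag rows that are both a bare underscore and tagged as punctuation,
# so ordinary punctuation such as "." or "," is left alone.
stray <- x$token == "_" & x$upos == "PUNCT"

# Keep everything except the flagged rows.
x_clean <- x[!stray, , drop = FALSE]

print(x_clean)
```

If the data set carries per-sentence token ids, they would no longer be consecutive after the rows are removed, so anything that relies on them may need renumbering.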
Thanks for your help and time!
Hi there,
Thanks for the package - it's great!
I'm using the package to annotate upos. However, as part of my pre-processing I replace specific terms with placeholder tokens. They're marked with an underscore so we know they're not the literal word, e.g. I love Nike > i love brand
However, when I run the annotation function, it treats the underscore as a symbol rather than as part of a noun. Is there a way to make it ignore the underscores? I've read through the documentation but couldn't find anything. Many thanks, Alan