Closed KevinGlock closed 3 years ago
I think this is based on a misunderstanding of the argument encoding. You can use it to state the encoding of the corpus (latin-1 in the case of GermaParl). Stating an encoding that differs from the one the corpus "really" has necessarily leads to broken output.
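To make the point concrete, here is a minimal byte-level sketch in plain Python. It does not use polmineR or CWB, and the names in it (QUERY_TERM, matches) are made up for the illustration. It assumes the mismatch comes down to the query bytes not lining up with the bytes stored in a latin-1 corpus, which is one plausible reading of the behaviour reported in this thread, not a description of polmineR internals.

```python
# Illustration only: plain Python, no polmineR involved.
# QUERY_TERM and matches() are hypothetical names for this sketch.
QUERY_TERM = "Ausbürgerung"

# Bytes as a latin-1 encoded corpus would store the token.
stored_latin1 = QUERY_TERM.encode("latin-1")

# The same search term, encoded two different ways.
query_utf8 = QUERY_TERM.encode("utf-8")
query_latin1 = QUERY_TERM.encode("latin-1")


def matches(stored: bytes, query: bytes) -> bool:
    """Return True if the query bytes occur in the stored bytes."""
    return query in stored


# "ü" is one byte (0xFC) in latin-1 but two bytes (0xC3 0xBC) in UTF-8,
# so the UTF-8 query cannot be found in the latin-1 data.
utf8_hit = matches(stored_latin1, query_utf8)
latin1_hit = matches(stored_latin1, query_latin1)

# Declaring the wrong encoding when reading text garbles the umlaut.
garbled = QUERY_TERM.encode("utf-8").decode("latin-1")

print(utf8_hit, latin1_hit, garbled)
```

Under these assumptions, only the latin-1 query finds the token, and text decoded with the wrong encoding shows a garbled umlaut.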
Hi,
I created a partition from GermaParl.
When I used kwic(), R returned a warning message:
... getting corpus positions
... no matches for query (or no matches left after applying stoplist/positivelist)
NULL
Warning message:
In .local(.Object, ...) :
  No hits for query ".*[Aa]us.*bürger.*" (returning NULL)
When I used the latin1 encoding instead of UTF-8, the query returned 73 hits:
... getting corpus positions
... number of hits: 73
... checking that all p-attributes are available
... getting token id for p-attribute: word
... generating contexts
This is a problem for further workflows, such as highlighting the text or reading it, because of the encoding.