Closed by PolMine 4 years ago
This behaviour is a feature. It is explained in the manual.
See stri_enc_tonative() if your native encoding is not UTF-8 and you want the output strings to be marked as natively encoded (e.g., latin1).
Basically you should work with Unicode wherever possible.
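A minimal sketch of that workflow, assuming a latin1-declared input (the sample string is made up for illustration): convert to UTF-8 on the way in, do all processing on the UTF-8 copy, and convert back to the native encoding only when handing results to code that expects it.

```r
library(stringi)

# Hypothetical input: the ISO-8859-1 bytes of "Façade", declared as latin1.
x <- "Fa\xe7ade"
Encoding(x) <- "latin1"

# Convert once, then keep working in UTF-8.
u <- stri_enc_toutf8(x)
stri_enc_mark(u)   # declared encoding of the UTF-8 copy
stri_length(u)     # counts code points, not bytes

# Only at the boundary: re-encode to the session's native encoding.
# What this yields depends on the locale reported by stri_enc_get().
n <- stri_enc_tonative(u)
stri_enc_mark(n)
```

In a UTF-8 locale the last step should leave the string as it is; in a latin1 locale it should return a natively encoded string.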
To ensure that my polmineR package is portable, it needs to process textual data with different encodings in Windows and *nix environments, i.e. with ISO-8859-1 and UTF-8 locales. The behavior of stringi I report here has caused me a few headaches and it looks like a bug to me.
The original issue I encountered was that I had a buggy conversion from "latin1" to "ISO-8859-1" on a Windows server. It does not make sense, but it is a scenario that should work. More generally, we have the same effect when converting from ISO-8859-1 to ISO-8859-1.
Working on the reprex, I realized that conversion to ISO-8859-1 causes problems more generally on Windows. But usually you get a warning - not in this case. Note that I have looked into the issue on macOS too, but it really seems to be a Windows thing.
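A sketch of how the scenario can be inspected, not the original reprex: the sample string is an assumption, and no particular output is claimed here because the result depends on platform and locale. It runs the ISO-8859-1 to ISO-8859-1 conversion and compares the declared encodings and the underlying bytes before and after.

```r
library(stringi)

# Hypothetical input: the ISO-8859-1 bytes of "Façade", declared as latin1.
x <- "Fa\xe7ade"
Encoding(x) <- "latin1"

# The conversion in question: same source and target encoding.
y <- stri_encode(x, from = "ISO-8859-1", to = "ISO-8859-1")

# Declared encodings, as seen by stringi and by base R.
stri_enc_mark(c(x, y))
Encoding(c(x, y))

# Underlying bytes: a same-encoding conversion should not change them.
identical(charToRaw(x), charToRaw(y))

# Native encoding of the session, for comparing platforms.
stri_enc_get()
```

Running the same lines on Windows and on macOS or Linux makes it easy to see whether the declared encoding of `y` differs between platforms even when the bytes match.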
R 4.0.2 (Windows) stringi package version: 1.4.6