Closed vh-d closed 6 years ago
Just to double check: what platform are you on?
Good point, this was Windows 2008. I will check Linux in few hours and report back.
In a way that's good -- Windows seems to be the worst. So if Encoding()
covers it there we should be good on the others (who may already be good).
On Linux:
Parsing character value:
> test_input <- enc2utf8("value='Žluťoučký kůň'")
> Encoding(test_input)
[1] "UTF-8"
> test_output <- RcppTOML::parseTOML(test_input, fromFile = FALSE)$value
> cat(test_output)
Žluťoučký kůň
> Encoding(test_output)
[1] "unknown"
> Encoding(test_output) <- "UTF-8"
> cat(test_output)
Žluťoučký kůň
and parsing files:
> test_file <- RcppTOML::parseTOML("~/Downloads/test.txt")
> test_file$value
[1] "Žluťoučký kůň"
> Encoding(test_file$value)
[1] "unknown"
> Encoding(test_file$value) <- "UTF-8"
> test_file$value
[1] "Žluťoučký kůň"
So the problem persists (lost encoding attribute) but the consequences are negligible because "unknown" encoding on modern linux is "UTF-8" anyway.
I suggest
enc2utf8()
for every input whenever fromFile=FALSE
and The first would solve non-file input. The second would solve both file and non-file inputs given that user is assumed to provide UTF-8 files by TOML spec.
Sounds good. Also see #20 which is pretty much the same, no?
Yeah, sorry about that. I will try to come up with a pull request soon as this bug bites my application quite a bit.
That's ok. You are being careful, and you are constructing good examples. One change at a time...
Ok, I just pushed a PR with a change I had following the 0.1.4 release. I'd like to make one more change in there and properly document your last change -- see ChangeLog which is a standard (older) format well supported by Emacs :) Can you drop me a full name please, either here or if you prefer by email? And if you want an email different from the one used by git log
. Thanks! The 0.1.5 release some time next week will be much improved thanks to your help.
Can you drop me a full name please, either here if you prefer by email? And if you want an email different from the one used by
git log
.
My name is Václav Hausenblas, nice to meet you :-) Big fan of tinyverse, btw.
Thanks! The 0.1.5 release some time next week will be much improved thanks to your help.
My pleasure! I am going to play with encoding now...
Ok, you're in the ChangeLog now :) And I got my other issue taken care of -- the (new in 0.5.0) local_time type comes back to us now too (as a string, there is no real type for it and I don't think I want to pull in hms
just for this).
So if/when you something for encoding feel free to branch or fork again and show it :)
Fixed in #30
I run into encoding issues when parsing files as well as R characters.
Parsing from text (R characters):
The encoding attribute is lost. But may be set again
Parsing from files
Example: test.txt
TOML files are assumed to be UTF-8 Unicode texts. However R characters obtained from parsing via
parseTOML()
are labeled as "unknown" encoding.In case of files, the solution may be relatively easy, I think. We can assume that input is UTF-8 and label every string output as "UTF-8".