Closed: chfleming closed this issue 3 years ago
Encoding is always complicated, especially when it is platform-dependent.
Do you mean that the HTML rendered by R Markdown has problems displaying these Unicode characters?
Is it on Windows?
Which vignettes are you referring to? What is your procedure for rendering them?
Unicode that is explicitly typed into the vignettes' text comes out fine, but Unicode produced by R code in the vignettes comes out as <U+####>, as in the last table here: https://ctmm-initiative.github.io/ctmm/articles/variogram.html#maximum-likelihood-fitting-the-hard-way , even though it comes out fine in the console.
This happens both in the output of devtools::build_vignettes() and pkgdown::build_articles().
I am compiling the vignettes in Windows. I assume this is an issue with the Windows version of R.
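For reference, the <U+####> form is what R writes when a UTF-8 string is converted to an encoding that cannot represent one of its characters. The following is a minimal base-R sketch that reproduces the escape on any platform, assuming a reasonably recent R in which iconv() accepts sub = "Unicode". The variable names are illustrative only, and this is not necessarily the exact code path that knitr takes on Windows.

```r
# A UTF-8 string like the column title produced by summary()
x <- "\u0394AICc"

# Force a conversion to an encoding that has no Greek letters;
# sub = "Unicode" asks iconv() to write <U+xxxx> for unconvertible characters
escaped <- iconv(x, from = "UTF-8", to = "ASCII", sub = "Unicode")
print(escaped)
```

If the printed value matches what appears in the rendered vignette, the output is probably passing through a non-UTF-8 encoding somewhere between evaluation and the HTML file.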
This looks to be the same problem described here
Maybe you can try the asis option?
The HTML comes out misformatted when I try that. Output after the Unicode bleeds into the document text.
Following the knitr::knit_print example, I wrote a custom knit_print S3 method that applies to list output in code chunks. This fixed the Unicode issue, but I'm not sure whether it will adversely affect other list output. Hopefully it will fix all similar problems.
To apply the function everywhere, you can put it at the beginning of the R Markdown document in a chunk with include = FALSE, so that it is not visible in the output. If it interferes with other list output, you can put it in an invisible chunk right before the code chunk that needs it and then remove it afterwards.
knit_print.list <- function(x, ...) {
  res <- paste(c("##", "", x), collapse = "\n")
  knitr::asis_output(res)
}
summary(FITS)
I tried this in the error vignette:
knit_print.list <- function(x, ...) {
  res <- paste(c("##", "", x), collapse = "\n")
  knitr::asis_output(res)
}
summary(list(HDOP=UERE,homo=UERE2))
rm(knit_print.list)
and my output still has <U+0394>.
I also tried putting the print function in the beginning of the document and I was getting the output spilling into the text again.
It seems that you cannot remove the function in the same code chunk. I moved rm to another code chunk and it worked for me.
I also tried putting the function at the beginning, and that worked too (I didn't use rm in this case).
Previously, I tried defining the print function in the beginning, but I would get misformatted HTML then, with code output spilling into the text. (I didn't check the rendering of the Unicode as everything was garbled.)
This time I removed the function in the next code block, like this:
knit_print.list <- function(x, ...) {
  res <- paste(c("##", "", x), collapse = "\n")
  knitr::asis_output(res)
}
summary(list(HDOP=UERE,homo=UERE2))
and then the next code block
rm(knit_print.list)
And the formatting is fine, but the Unicode still renders as <U+0394>.
I found that, with the Unicode summary column titles, all my model summary functions need to be updated too. Converting the summary results into a data.frame was a hack that depends too much on specific column titles and row names. I'm not sure whether there is a better approach to deal with this.
Interesting. I tested on my home PC in Windows and the Unicode was rendered correctly. By the way, I was testing by knitting the R Markdown, not by running build_vignettes. Maybe there is a difference between them?
On that note, the model list summary used to be a data.frame, but data.frames with Unicode column names do not render correctly in Windows, so I changed that to a matrix by removing the single character column (method) that is not very important. For whatever reason, Unicode column names have not worked in data.frame objects on Windows for years.
So not only are the column names updated (now more correct), but the object is not a data.frame any longer.
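For code that still needs a data.frame downstream, one possible approach is sketched below. The matrix m is a hypothetical stand-in for the model-list summary (the model names and values are made up), and check.names = FALSE is used so that data.frame() does not pass the titles through make.names(). Whether this avoids the Windows rendering problem described above is not something this thread confirms.

```r
# Hypothetical stand-in for the model-list summary, now a numeric matrix
m <- matrix(c(0, 2.5, 0, 1.1), nrow = 2,
            dimnames = list(c("OUF anisotropic", "OU anisotropic"),
                            c("\u0394AICc", "\u0394RMSPE")))

# check.names = FALSE stops data.frame() from running make.names() on the titles
df <- data.frame(m, check.names = FALSE)

# Address columns by position so the code does not depend on the exact titles
first_col <- df[[1]]
print(df)
```

Selecting columns by position rather than by title is a trade-off: it survives a rename such as dAICc to ΔAICc, but breaks if the column order changes.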
I just tried knitting directly and still got <U+0394>. I think this bug depends on the locale of Windows (which may or may not natively support Unicode?). I am adding support for a Unicode language to see if that helps. Otherwise I may be stuck only being able to compile the vignettes properly on specific computers.
All the source files are encoded in UTF-8, right? I checked the vignette on my home PC and it was in UTF-8.
The vignettes are all UTF-8. The DESCRIPTION declares Encoding: UTF-8. The vignettes declare %\VignetteEncoding{UTF-8} and \usepackage[utf8]{inputenc}.
The summary of a model list has dAICc and dRMSPE columns. Previously only dAICc was listed in the app's model summary table to compare model fits. Do we need to add the dRMSPE column too?
You don't need that column at the moment.
I just updated the web app to match the changes in the Unicode column titles.
Is the encoding of the summary character strings "UTF-8" on your platform? If it is not "UTF-8", maybe you can set it explicitly here.
> res <- summary(model_list)
> x <- dimnames(res)[[2]][[1]]
> x
[1] "ΔAICc"
> Encoding(x)
[1] "UTF-8"
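A minimal sketch of checking and normalizing the encoding of all column names at once, using only base R. The matrix res here is a hypothetical stand-in for summary(model_list), with made-up row names and values.

```r
# Hypothetical stand-in for summary(model_list): a matrix with a Unicode column name
res <- matrix(c(0, 1.2), nrow = 2,
              dimnames = list(c("model A", "model B"), "\u0394AICc"))

# Inspect the declared encoding of each column name
print(Encoding(colnames(res)))

# enc2utf8() converts to UTF-8 and marks the result; pure ASCII names are left as they are
colnames(res) <- enc2utf8(colnames(res))
print(Encoding(colnames(res)))
```

Note that pure ASCII strings report "unknown" from Encoding(), which is expected and harmless.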
The encoding is UTF-8. I'm going to try to run RStudio under an account with a different language.
Didn't work for me. Back in English the characters looked like unparsed Unicode gibberish.
I will try in Linux next.
Can you send me the generated HTML? My hypothesis is that it uses a font family that doesn't support these Unicode characters, while the console uses another font.
If that's the case, we can first try editing the HTML font to see whether it displays correctly, and then try specifying the font when rendering the Rmd.
This is the HTML I generated. I removed the latter part of the file to shorten the rendering time. What's the difference between your HTML and this one? The encoding? The font family? The source code for the Unicode character?
Here is my HTML produced by Windows 10 in the Greek locale. While you have ΔAICc Z[red]2, I have ÄAICc Z[red]². Technically, they are both wrong. Yours missed the superscript 2 and mine missed the Delta.
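One possible reading of the ÄAICc result, not confirmed anywhere in this thread, is that Delta was written as a single byte in the Greek Windows code page and that byte was later decoded as Western European. The base-R sketch below illustrates that mechanism; it assumes the platform's iconv supports the CP1253 and CP1252 encoding names.

```r
# Delta as a UTF-8 string
delta <- "\u0394"

# Bytes of Delta in the Windows Greek code page (CP1253)
greek_bytes <- iconv(delta, from = "UTF-8", to = "CP1253", toRaw = TRUE)
print(greek_bytes[[1]])

# Decode those same bytes as if they were Western European (CP1252)
misread <- iconv(greek_bytes, from = "CP1252", to = "UTF-8")
print(misread)
```

If this is what happened, the problem would be an encoding mismatch in the written file rather than a missing glyph in a font.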
The superscript 2 is always missing on my platform, so it could be font- or locale-related, i.e. my font doesn't have it and your font doesn't have Delta.
I tried to encode the string as UTF-8 explicitly. It worked in the simplest case but not in the summary.
I think the summary is a matrix, and printing it needs to go through print.data.frame, where the processing of the column names might be tricky. I also tried printing the matrix into a character vector first, but the superscript 2 still does not show up.
This test shows the problem more clearly: note that the superscript 2 becomes a plain 2 in the second slot. If you run the same code, I bet Delta will not show up correctly on your system. This seems to be a difficult problem to work around from our side.
> evaluate::evaluate("'\u0394'")
[[1]]
$`src`
[1] "'Δ'"
attr(,"class")
[1] "source"
[[2]]
[1] "[1] \"Δ\"\n"
> evaluate::evaluate("'\u00B2'")
[[1]]
$`src`
[1] "'²'"
attr(,"class")
[1] "source"
[[2]]
[1] "[1] \"2\"\n"
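A related check can be done in base R alone, as a rough diagnostic for whether a character can pass through the session's native encoding. This is a sketch under the assumption that the captured output is converted to the native encoding at some point; survives_native is a hypothetical helper name, and it may not reproduce the exact substitution (such as ² becoming 2) seen with evaluate.

```r
# TRUE if every character of x survives a round trip through the native encoding
survives_native <- function(x) {
  x <- enc2utf8(x)
  back <- enc2utf8(enc2native(x))
  identical(back, x)
}

print(Sys.getlocale("LC_CTYPE"))   # current character-type locale
print(l10n_info()[["UTF-8"]])      # TRUE when the session runs in a UTF-8 locale
print(survives_native("\u0394"))   # Delta
print(survives_native("\u00B2"))   # superscript 2
```

The results depend on the locale of the machine running the code, so they are useful mainly for comparing two systems side by side.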
When building everything in Linux (which has native Unicode support) everything looks perfect.
Everything is working great in the Windows Subsystem for Linux.
I've upgraded some functions to output unicode characters as appropriate. It works well on the console and in the help files, but does not render in the vignettes.