ctmm-initiative / ctmm

Continuous-Time Movement Modeling. Functions for identifying, fitting, and applying continuous-space, continuous-time stochastic movement models to animal tracking data.
http://biology.umd.edu/movement.html

Unicode not rendering in vignettes #26

Closed chfleming closed 3 years ago

chfleming commented 5 years ago

I've upgraded some functions to output unicode characters as appropriate. It works well on the console and in the help files, but does not render in the vignettes.

xhdong-umd commented 5 years ago

Encoding is always complicated, especially when it is platform-dependent.

Do you mean that the HTML rendered by R Markdown has problems displaying these Unicode characters?

Is this on Windows?

Which vignettes are you referring to? What is your procedure for rendering them?

chfleming commented 5 years ago

Unicode that is explicitly typed into the vignettes' text comes out fine, but Unicode produced by R code in the vignettes comes out as <U+####> as in the last table here: https://ctmm-initiative.github.io/ctmm/articles/variogram.html#maximum-likelihood-fitting-the-hard-way , even though it comes out fine in the console.

This happens both in the output of devtools::build_vignettes() and pkgdown::build_articles().

I am compiling the vignettes in Windows. I assume this is an issue with the Windows version of R.
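The `<U+####>` form is the escape R uses for a character that cannot be represented in the target encoding. As a minimal sketch (an illustration only, not code from the ctmm vignettes, and assuming a version of R whose `iconv()` supports `sub = "Unicode"`), the same escape can be reproduced on any platform by converting a UTF-8 string to an encoding that lacks the character:

```r
# Hypothetical illustration; not taken from the ctmm sources.
delta <- "\u0394"  # Greek capital delta, stored as UTF-8

# Converting to an encoding that lacks the character, with sub = "Unicode",
# substitutes a <U+xxxx> escape instead of returning NA.
escaped <- iconv(delta, from = "UTF-8", to = "ASCII", sub = "Unicode")
print(escaped)

# The original string is still a single character.
print(nchar(delta))
```

If the vignette output passes through the native (non-UTF-8) encoding at some point during knitting, a substitution of this kind would be consistent with the symptom described above.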

xhdong-umd commented 5 years ago

This looks to be the same problem described here

Maybe you can try the asis option?

chfleming commented 5 years ago

The HTML comes out misformatted when I try that. Output after the Unicode bleeds into the document text.

xhdong-umd commented 5 years ago

Following the knitr::knit_print example, I wrote a custom knit_print S3 method that applies to list output in code chunks. This fixed the Unicode issue, but I'm not sure whether it will have a bad effect on other list output. Hopefully it will fix all similar problems.

To apply the function to the whole document, you can put it in a chunk at the beginning of the R Markdown file with include = FALSE, so that it is not visible in the output. If it interferes with other list output, you can put it in an invisible chunk right before the code chunk that needs it and then remove it afterwards.

knit_print.list <- function(x, ...) {
  res = paste(c("##", "", x), collapse = "\n")
  knitr::asis_output(res)
}
summary(FITS)


chfleming commented 5 years ago

I tried this in the error vignette:

knit_print.list <- function(x, ...) {
  res = paste(c("##", "", x), collapse = "\n")
  knitr::asis_output(res)
}
summary(list(HDOP=UERE,homo=UERE2))
rm(knit_print.list)

and my output still has <U+0394>

I also tried putting the print function in the beginning of the document and I was getting the output spilling into the text again.

xhdong-umd commented 5 years ago

It seems that you cannot remove the function in the same code chunk. I put rm in a separate code chunk and it worked for me. I also tried putting the function at the beginning, and that worked too (I didn't use rm in this case).

chfleming commented 5 years ago

Previously, I tried defining the print function in the beginning, but I would get misformatted HTML then, with code output spilling into the text. (I didn't check the rendering of the Unicode as everything was garbled.)

This time I removed the function in the next code block, like this:

knit_print.list <- function(x, ...) {
  res = paste(c("##", "", x), collapse = "\n")
  knitr::asis_output(res)
}
summary(list(HDOP=UERE,homo=UERE2))

and then the next code block

rm(knit_print.list)

And the formatting is fine, but the Unicode still renders as <U+0394>.

xhdong-umd commented 5 years ago

I found that, with the Unicode summary column titles, all my model summary functions need to be updated too. They rely on a hack that converts summary results into a data.frame, which depends too much on specific column titles and row names. I'm not sure if there is a better approach to deal with this.

xhdong-umd commented 5 years ago

Interesting. I tested on my home PC in Windows and the Unicode was rendered correctly. By the way, I was testing by knitting the R Markdown file directly, not by running build_vignettes. Maybe there is a difference between them?

chfleming commented 5 years ago

On that note, the model list summary used to be a data.frame, but data.frames with Unicode column names do not render correctly in Windows, so I changed that to a matrix by removing the single character column (method) that is not very important. For whatever reason, Unicode column names have not worked in data.frame objects on Windows for years.

So not only are the column names updated (now more correct), but the object is not a data.frame any longer.

chfleming commented 5 years ago

I just tried knitting directly and still got <U+0394>. I think this bug depends on the locale of Windows (which may or may not natively support Unicode?). I am adding support for a Unicode language to see if that helps. Otherwise I may be stuck only being able to compile the vignettes properly on specific computers.
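Whether the current R session is running in a UTF-8 locale can be checked directly before knitting. This is a small diagnostic sketch using base R only; the `codepage` element is reported by `l10n_info()` on Windows and is absent elsewhere, so it is guarded here:

```r
# Diagnostic sketch: inspect the locale the R session is running under.
info <- l10n_info()

cat("LC_CTYPE:", Sys.getlocale("LC_CTYPE"), "\n")
cat("UTF-8 locale:", info[["UTF-8"]], "\n")
cat("Multibyte locale:", info[["MBCS"]], "\n")

# The code page is only reported on Windows.
if (!is.null(info[["codepage"]])) {
  cat("Windows code page:", info[["codepage"]], "\n")
}
```

If `UTF-8 locale` is `FALSE`, any character outside the native code page may be escaped or substituted when output is converted to the native encoding.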

xhdong-umd commented 5 years ago

All the source files are encoded in UTF-8, right? I checked the vignette on my home PC and it was in UTF-8.

chfleming commented 5 years ago

The vignettes are all UTF-8. The DESCRIPTION declares Encoding: UTF-8. The vignettes declare %\VignetteEncoding{UTF-8} and \usepackage[utf8]{inputenc}.

xhdong-umd commented 5 years ago

The summary of a model list has dAICc and dRMSPE columns. Previously, only dAICc was listed in the app's model summary table for comparing model fits. Do we need to add the dRMSPE column too?

chfleming commented 5 years ago

You don't need that column at the moment.

xhdong-umd commented 5 years ago

I just updated the web app to meet the changes in unicode column titles.

xhdong-umd commented 5 years ago

Are the character strings in the summary encoded as "UTF-8" on your platform? If they are not "UTF-8", maybe you can set the encoding explicitly here.

> res <- summary(model_list)
> x <- dimnames(res)[[2]][[1]]
> x
[1] "ΔAICc"
> Encoding(x)
[1] "UTF-8"

chfleming commented 5 years ago

The encoding is UTF-8. I'm going to try to run R-Studio under a different language account.

chfleming commented 5 years ago

Didn't work for me. Back in English the characters looked like unparsed Unicode gibberish.

I will try in Linux next.

xhdong-umd commented 5 years ago

Can you send me the generated HTML? My hypothesis is that it uses a font family that doesn't support these Unicode characters, while the console uses a different font.

If that's the case, we can first try editing the font in the HTML to see if it displays correctly, then try specifying the font when rendering the Rmd.

xhdong-umd commented 5 years ago

This is the HTML I generated. I removed the latter part of the file to shorten the rendering time. What's the difference between your HTML and this one? The encoding? The font family? The source code for the Unicode character?

chfleming commented 5 years ago

Here is my html produced by Windows 10 in Greek locale. While you have ΔAICc Z[red]2, I have ÄAICc Z[red]². Technically, they are both wrong. Yours missed the superscript 2 and mine missed the Delta.
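One possible reading of the Ä, which is not confirmed anywhere in this thread, is that the Delta was written as a single byte in the Greek Windows code page and later interpreted as Latin-1. The sketch below only illustrates that mechanism; it assumes the platform's iconv supports the `CP1253` encoding name:

```r
# Hypothetical illustration of a code page mismatch; not from the ctmm sources.
delta <- "\u0394"  # Greek capital delta

# Encode Delta in the Greek Windows code page (assumes iconv knows "CP1253").
bytes <- iconv(delta, from = "UTF-8", to = "CP1253", toRaw = TRUE)
print(bytes[[1]])  # the single byte that represents Delta in CP1253

# Reinterpret that same byte as Latin-1 and convert back to UTF-8.
misread <- iconv(bytes, from = "latin1", to = "UTF-8")
print(misread)
print(utf8ToInt(misread))  # code point of the misread character
```

The same byte value that means Delta in CP1253 is A with diaeresis in Latin-1, which would match the ÄAICc column title if the file was labelled or read with the wrong single-byte encoding.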

xhdong-umd commented 5 years ago

The superscript 2 is always missing on my platform, so it could be font- or locale-related, i.e. my font doesn't have it and your font doesn't have Delta.

I tried encoding the string as UTF-8 explicitly. That worked in the simplest case but not in the summary.


I think the summary is a matrix, and printing it needs to go through print.data.frame, so the processing of the column names might be tricky. I also tried printing the matrix into a character vector first, but the superscript 2 still does not show up.

This test shows the problem more clearly. Note that the superscript 2 becomes a plain 2 in the second slot. If you run the same code, I bet Delta will not show up correctly on your system. This seems to be a difficult problem to work around from our side.

> evaluate::evaluate("'\u0394'")
[[1]]
$`src`
[1] "'Δ'"

attr(,"class")
[1] "source"

[[2]]
[1] "[1] \"Δ\"\n"

> evaluate::evaluate("'\u00B2'")
[[1]]
$`src`
[1] "'²'"

attr(,"class")
[1] "source"

[[2]]
[1] "[1] \"2\"\n"

chfleming commented 5 years ago

When building everything in Linux (which has native Unicode support) everything looks perfect.

chfleming commented 3 years ago

Everything is working great in the Windows Subsystem for Linux.