andrewheiss closed this issue 10 years ago.
Try adding the `env` variable to the `.sublime-build`. @randy3k suggested it to me, along with other possible solutions, here regarding my own, quite related, encoding issues involving knitr and Sublime builds, and it has worked like a charm. My own `.sublime-build` variant now looks like this:

```
"variants":
[
    {
        "name": "Run",
        "working_dir": "$file_path",
        "env": { "LANG": "en_US.UTF-8" },
        "shell_cmd": "Rscript -e \"rmarkdown::render(input = '$file')\""
    }
]
```
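For anyone wondering why this works: according to Sublime Text's build-system documentation, the `"env"` dictionary is merged into the environment the build command inherits, so R starts with `LANG` set instead of the bare `"C"` locale that ST reports further down this thread. Here is a minimal Python sketch of that merge. The helper `run_with_env` is hypothetical, and a Python child process stands in for `Rscript` so the sketch runs without R installed.

```python
import os
import subprocess
import sys


def run_with_env(cmd, overrides):
    """Run cmd with os.environ merged with overrides, like the "env" key does."""
    env = dict(os.environ)  # start from the parent's environment
    env.update(overrides)   # values from the build file win over inherited ones
    result = subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)
    return result.stdout.strip()


# Stand-in for Rscript: a child process that reports the LANG it inherited.
child = [sys.executable, "-c", "import os; print(os.environ.get('LANG', ''))"]

print(run_with_env(child, {"LANG": "en_US.UTF-8"}))  # en_US.UTF-8
```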
With this, I am able to successfully `rmarkdown::render()` your example, although I do get a few warnings in the rendered document:

Trying a simple `knit()` after having added the same `env` variable to SublimeKnitr's default `.sublime-build` also seems to sort of work, printing the same warnings in the resulting document:
```
---
title: "Test"
output: html_document
---
Testing
![plot of chunk unnamed-chunk-1](figure/unnamed-chunk-1.png)
```
I guess I should mention that I'm on a Mac; I'm not really sure if this is relevant for folk on Windows.
Ooh, this looks promising. I've been toying around with it for the past hour, trying to get rid of the conversion failure warnings, but to no avail. It's apparently a common problem with R graphics and knitr (see the "Encoding of multibyte characters" section in the knitr manual). It looks like you can take care of the problem by manually specifying an encoding, but there's apparently no UTF-8 encoding, so I don't know how best to generalize it. I'd love to know how RStudio does it.
Try adding `"env": { "LANG": "en_US.UTF-8" }` to the default `.sublime-build`, and adding the following chunk before the chunk included in your test document:

```{r, echo = FALSE}
pdf.options(encoding = 'CP1250')
```

How does that work? It seems to have gotten rid of the conversion warnings for me. Cf. [this question](http://stackoverflow.com/questions/13251665/rhtml-warning-conversion-failure-on-var-in-mbcstosbcs-dot-substituted-f) on Stack Overflow.
Using `encoding = 'ISOLatin2'` instead of `encoding = 'CP1250'` also seems to work for me.
Fantastic - that works!
The only downside to this is that the user has to select an encoding that fits all the characters they're using in their document. If they use Chinese, Arabic, or Cyrillic characters, they'll need to change it accordingly.
However, I just tested it in RStudio and it has the same problem (and the same solution: setting `pdf.options()` in a chunk). So RStudio doesn't have a magic way to make this work; it's subject to the same encoding wonkiness in PDF images.
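To check which single-byte encoding covers a document's characters before picking one for `pdf.options(encoding = ...)`, here is a small Python sketch. It uses Python's codec names (`latin-1`, `iso8859_2`, `cp1250`, `cp1251`) as rough stand-ins for R's encoding files; that they cover exactly the same characters as R's `ISOLatin2`, `CP1250`, etc. is an assumption, and `fits`/`pick_encoding` are hypothetical helpers, not part of R or knitr.

```python
def fits(text, codec):
    """True if every character of text can be encoded with the given codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


def pick_encoding(text, candidates):
    """Return the first candidate codec that covers all of text, or None."""
    for codec in candidates:
        if fits(text, codec):
            return codec
    return None


# Python codec names, roughly standing in for R's ISOLatin1, ISOLatin2, CP1250, CP1251.
candidates = ["latin-1", "iso8859_2", "cp1250", "cp1251"]
print(pick_encoding("pučina", candidates))  # iso8859_2
print(pick_encoding("пучина", candidates))  # cp1251
print(pick_encoding("中文", candidates))    # None
```

Chinese text has no single-byte code page among the candidates at all, so the helper returns `None` for it.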
So, for future reference, adding a separate block with `pdf.options()` will work. Here's a minimal working example:

````
---
title: "Test"
output: html_document
---

Testing

```{r, echo=FALSE}
pdf.options(encoding='ISOLatin2')
plot(cars, main="pučina")
```
````
Maybe this should be a separate issue, or maybe this even falls more within the jurisdiction of @LaTeXing, but it is directly related to the foregoing discussion, so I'll just add it here for the moment.

The solution above for Rmd documents does not seem to work for Rtex/Rnw/etc., where "č" and other non-English characters are rendered as `..` or as Unicode codes; admittedly, I have yet to manage to successfully incorporate the `env` variable into the `.sublime-build`.
Input:

```latex
\documentclass{article}
\title{Test}
\date{}
%% begin.rcode, 'set-up', include = FALSE
% pdf.options(encoding = 'ISOLatin2')
%% end.rcode
\begin{document}
\maketitle
Testing
%% begin.rcode, 'test_1', echo = FALSE
% plot(cars, main = "pučina")
%% end.rcode
%% begin.rcode, 'test_2'
% print('¡Qué tranza o qué!')
%% end.rcode
\end{document}
```
Output:
Yes, this.
I've been working with another person (not on GitHub) on this exact issue (`..`s in `.Rnw` files). He asked a SO question and got an answer that said he should use Cairo, but it's a clunky solution and renders PDFs differently.

However, I don't know if this is a knitr issue. When he runs knitr from the Terminal, everything works great and all characters show up as expected. Building the `.Rnw` file from ST is where the encoding gets messed up. Perhaps adding `"env": { "LANG": "en_US.UTF-8" },` to the LaTeXTools or LaTeXing build systems will make it work right?
I think you may be right about the issue being due to ST rather than to knitr, although I don't know much about it at all. In my encoding-related question on SO, @randy3k suggests in a comment that I run

```python
import subprocess; print(subprocess.check_output("R -q -e 'Sys.getlocale()'", shell=True).decode('utf8'))
```

in ST's console and compare the results with those from running, in the terminal:

```
R -q -e 'Sys.getlocale()'
```
It seems that, for me at least, there is some sort of disconnect (but, again, I don't know much on the subject): ST yields `"C"`, while my terminal gives me `"C/UTF-8/C/C/C/C"`.
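To compare the strings from ST and from the terminal category by category, here is a Python sketch that splits R's `Sys.getlocale()` result. It assumes the macOS format seen in this thread: either a single value such as `"C"`, or six slash-separated values in the order LC_COLLATE, LC_CTYPE, LC_MONETARY, LC_NUMERIC, LC_TIME, LC_MESSAGES (that order is an assumption). R on Linux reports `LC_CTYPE=...;LC_NUMERIC=...` instead, which this sketch rejects. The function names are hypothetical.

```python
# Assumed order of the slash-separated locale string R reports on macOS.
CATEGORIES = ["LC_COLLATE", "LC_CTYPE", "LC_MONETARY",
              "LC_NUMERIC", "LC_TIME", "LC_MESSAGES"]


def parse_getlocale(value):
    """Map each locale category to its setting from a Sys.getlocale() string."""
    if "=" in value:
        raise ValueError("name=value locale strings (as on Linux) are not handled")
    parts = value.split("/")
    if len(parts) == 1:  # e.g. "C": one setting shared by every category
        parts = parts * len(CATEGORIES)
    if len(parts) != len(CATEGORIES):
        raise ValueError(f"unexpected locale string: {value!r}")
    return dict(zip(CATEGORIES, parts))


def ctype_is_utf8(value):
    """True if LC_CTYPE, the category that decides the character encoding, is UTF-8."""
    ctype = parse_getlocale(value)["LC_CTYPE"]
    return ctype.upper().replace("-", "").endswith("UTF8")


for reported in ["C", "C/UTF-8/C/C/C/C",
                 "en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8"]:
    print(f"{reported!r}: LC_CTYPE is UTF-8? {ctype_is_utf8(reported)}")
```

With the strings from this thread, ST's `"C"` comes out `False` and both terminal results come out `True`, which lines up with the terminal build working and the ST build not.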
Adding `"env": { "LANG": "en_US.UTF-8" },` to my `.sublime-build` variant for `.Rmd` subsequently fixed that issue, which is why I have indeed tried repeatedly to add it to @LaTeXing's `.sublime-build`. However, probably due to my own ineptitude, doing so has only resulted in a broken `.sublime-build`, i.e., one that does nothing but save the open file (no compile, no knit, etc.).
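One possible cause of a build file that silently stops working is a JSON syntax slip, such as a missing comma next to the new key. A Python sketch that makes the edit programmatically: the hypothetical helper `add_env` merges `env` into the top level and into every entry of `"variants"`, and a syntax error is reported with its position. Sublime Text's own parser is more lenient than Python's `json` module (it tolerates comments, for example), so this assumes the build file is strict JSON.

```python
import json


def add_env(build_text, env):
    """Merge env into the top level and every variant of a .sublime-build file."""
    build = json.loads(build_text)  # raises json.JSONDecodeError on a syntax slip
    for target in [build] + build.get("variants", []):
        # Merge rather than overwrite, so variables already set are kept.
        target.setdefault("env", {}).update(env)
    return json.dumps(build, indent=4)


original = json.dumps({
    "variants": [{
        "name": "Run",
        "working_dir": "$file_path",
        "shell_cmd": "Rscript -e \"rmarkdown::render(input = '$file')\"",
    }]
})
print(add_env(original, {"LANG": "en_US.UTF-8"}))

# A missing comma is reported with its line and column instead of failing silently.
try:
    add_env('{"name": "Run" "shell_cmd": "Rscript"}', {"LANG": "en_US.UTF-8"})
except json.JSONDecodeError as err:
    print("broken build file:", err)
```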
My terminal gives me `[1] "en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8"`, while ST gives just `[1] "C"`.

But after creating `~/.Renviron` and adding `LANG=en_US.UTF-8`, ST gives `[1] "en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8"`. Try doing that and see if the `..` problem persists.
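The `~/.Renviron` step can be scripted as well. A Python sketch with a hypothetical helper `ensure_renviron_lang` that appends `LANG=en_US.UTF-8` only if the file doesn't already set `LANG`; the demo writes to a temporary file rather than to the real `~/.Renviron`.

```python
import tempfile
from pathlib import Path


def ensure_renviron_lang(path, value="en_US.UTF-8"):
    """Append LANG=<value> to an .Renviron file unless LANG is already set there.

    Returns True if the file was changed, False if it already set LANG.
    """
    path = Path(path)
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    if any(line.strip().startswith("LANG=") for line in text.splitlines()):
        return False  # keep whatever LANG the user already chose
    if text and not text.endswith("\n"):
        text += "\n"  # don't glue the new line onto the last existing one
    path.write_text(text + f"LANG={value}\n", encoding="utf-8")
    return True


# Demo on a throwaway file rather than the real ~/.Renviron.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / ".Renviron"
    print(ensure_renviron_lang(demo))  # True: file created with the LANG line
    print(ensure_renviron_lang(demo))  # False: LANG is already set
    print(demo.read_text(encoding="utf-8"), end="")  # LANG=en_US.UTF-8
```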
Adding `LANG=en_US.UTF-8` to `~/.Renviron` seems to have mixed results for me¹: "č" is rendered nicely without any warnings in the output `.pdf`, while "¡" and "é" are simply omitted; the Unicode code is no longer printed.
Input:

```latex
\documentclass{article}
\title{Test}
\date{}
%% begin.rcode, 'set-up', include = FALSE
% pdf.options(encoding = 'ISOLatin2')
%% end.rcode
\begin{document}
\maketitle
Testing. !` \'e
%% begin.rcode, 'test_1', echo = FALSE
% plot(cars, main = "pučina")
%% end.rcode
%% begin.rcode, 'test_2'
% print('¡Qué tranza o qué!')
%% end.rcode
\end{document}
```
Output:
¹ Running the bit of Python in ST gives me `[1] "en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8"` as well.
Compare with the rendered `.html` from `.Rmd`:

````
---
title: "Test"
output: html_document
---

Testing. ¡ é

```{r, 'set-up', include = FALSE}
pdf.options(encoding='ISOLatin2')
plot(cars, main="pučina")
print('¡Qué tranza o qué!')
```
````
![Screenshot 2014-07-02 at 13 42 06](https://cloud.githubusercontent.com/assets/6853773/3462066/95a2ab4c-0229-11e4-8723-3afb6cdee769.png)
Oh, we're so close :)
The missing characters in the actual body of the PDF are probably due to LaTeX. Add this to the preamble: `\usepackage[utf8]{inputenc}`
That did it! Thanks very much.
```latex
\documentclass{article}
\usepackage[utf8]{inputenc}
\title{Test}
\date{}
%% begin.rcode, 'set-up', include = FALSE
% pdf.options(encoding = 'ISOLatin2')
%% end.rcode
\begin{document}
\maketitle
Testing. !`¡\'eé
%% begin.rcode, 'test_1', echo = FALSE
% plot(cars, main = "pučina")
%% end.rcode
%% begin.rcode, 'test_2'
% print('¡Qué tranza o qué!')
%% end.rcode
\end{document}
```
In summary:

- `.Rmd`, `.Rnw`/`.Rtex`:
  - adding `"env": { "LANG": "en_US.UTF-8" }` to the `.sublime-build`;
  - in `.Rmd` or `.Rnw`, a separate, preliminary chunk with `pdf.options(encoding = '<encoding>')` allows for error- and warning-free use of multibyte characters in graphics; run `list.files(system.file('enc', package = 'grDevices'))` in R for the available encodings.
- `.Rnw`/`.Rtex` exclusively:
  - `\usepackage[utf8]{inputenc}` in the document header is necessary in order to render multibyte characters in the text, be it knitted R output (which one doesn't necessarily see in the source document) or other text;
  - adding `LANG=en_US.UTF-8` to `~/.Renviron` may be necessary for knitting done through Sublime Text to be free of encoding errors.

Thanks so much for your help!
Very interesting discussion.
Another possible way to suppress the warnings is to use another graphics device, e.g.,

```
<<include = FALSE>>=
options(device = "cairo_pdf")
@
```
Yes, though I had someone else complain that the Cairo output wasn't as clear or nice-looking as whatever R's default is.
@mmarascio it is strange that `"env": { "LANG": "en_US.UTF-8" }` in the `.sublime-build` doesn't work for you, but adding `LANG=en_US.UTF-8` to `~/.Renviron` does. I believe they should be equivalent, at least within the Sublime environment. Maybe I am wrong.
@randy3k: I'm not sure I understand; both alternatives do seem to work for me (see this relevant comment). It's just that, in addition, for non-ASCII characters in R plots I need the preliminary chunk that sets `pdf.options`, and for non-ASCII characters in knitr output in `.Rnw` (as should have been evident to me) I need `\usepackage[utf8]{inputenc}` in the document's preamble.
I see. Thx for the clarification.
Knitr is apparently really picky about the encoding of the files it builds. If you try to build a file with Unicode characters in a plot using this plugin, R will choke on the characters and render them either as `..` or as their Unicode code. Here's a minimal working example (`.Rmd`):

```
Warning message:
In native_encode(text) : some characters may not work under the current locale
```