Support unicode characters.

leonbaum commented 12 years ago

Even when using LuaTeX or XeTeX, pgfSweave appears to remove some Unicode characters. For example, the chunk

<<>>=
"α"
@

produces the output

[1] ""

There also sometimes seems to be a problem with detecting the encoding. For example, the chunk

<<>>=
α <- 1
@

produces an error:

Error: chunk 1
Error in parse(text = chunk) : 1:0: unexpected input
1: Î
   ^
In addition: Warning messages:
1: âa.Rnw’ has unknown encoding: assuming Latin-1
2: In strsplit(msg, "\n") : input string 1 is invalid in this locale
Execution halted
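The stray "Î" is consistent with the warning about assuming Latin-1: the two UTF-8 bytes of α, read as Latin-1, decode to "Î±". A minimal R sketch of that misreading follows; it only illustrates the byte-level effect and is not taken from Sweave or pgfSweave.

```r
# The UTF-8 encoding of U+03B1 (Greek small letter alpha) is the two bytes CE B1.
alpha_bytes <- as.raw(c(0xCE, 0xB1))

# Treat those same bytes as Latin-1 and convert to UTF-8. Each byte becomes
# its own character: 0xCE is "Î" and 0xB1 is "±".
misread <- iconv(rawToChar(alpha_bytes), from = "latin1", to = "UTF-8")

# Code points of the misread string.
misread_points <- utf8ToInt(misread)
print(misread_points)
```

The parser then sees "Î" where it expected the start of an identifier, which would explain the "unexpected input" error.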

My locale is en_US.UTF8, and I have no problems using these Unicode characters directly in R and in LaTeX.

sessionInfo()
R version 2.13.1 (2011-07-08)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
[1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C
[3] LC_TIME=en_US.utf8 LC_COLLATE=en_US.utf8
[5] LC_MONETARY=C LC_MESSAGES=en_US.utf8
[7] LC_PAPER=en_US.utf8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] tools_2.13.1

α <- 1
α
[1] 1

yihui commented 12 years ago

Do you have \usepackage[utf8]{inputenc} in your Rnw document? Since currently there is no way to pass the encoding argument to Sweave from pgfSweave (see #36), you have to use this LaTeX trick to tell Sweave that your document is UTF-8 encoded.
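The idea behind the trick can be sketched in R as follows. This is only an illustration of scanning a document for an inputenc declaration; `detect_inputenc` is a hypothetical helper, not Sweave's or pgfSweave's actual code, and the pattern handles only the simple one-option form.

```r
# Hypothetical helper: return the encoding named in a
# \usepackage[<enc>]{inputenc} line, or NA if there is none.
detect_inputenc <- function(lines) {
  pattern <- "^[[:space:]]*\\\\usepackage\\[([[:alnum:]]+)\\]\\{inputenc\\}"
  hits <- grep(pattern, lines, value = TRUE)
  if (length(hits) == 0L) {
    return(NA_character_)
  }
  # Keep only the captured option from the first matching line.
  sub(paste0(pattern, ".*"), "\\1", hits[1L])
}

# Example preamble (made up for illustration).
preamble <- c("\\documentclass{article}",
              "\\usepackage[utf8]{inputenc}",
              "\\begin{document}")
declared <- detect_inputenc(preamble)
print(declared)
```

A document without such a line yields NA, which corresponds to the "unknown encoding" case in the warning above.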

leonbaum commented 12 years ago

Thanks for your response, but adding \usepackage[utf8]{inputenc} did not help, and it also caused a warning:

* you should not be loading the inputenc package
* XeTeX expects the source to be in UTF8 encoding
* some features of other encodings may conflict, resulting in poor output.

yihui commented 12 years ago

Oh, you are using XeLaTeX... Then I guess it will be difficult to solve.

This gives me yet another motivation to write a new literate programming engine to replace good old Sweave... In this case it is not the fault of Sweave, but once again I see how difficult it is for add-on packages to stay compatible with Sweave.

hans-ekbrand commented 12 years ago

I get a message that implies that the problem is tex, not pgfSweave:

In getMetricsFromLatex(TeXMetrics) :
  XeLaTeX was unable to calculate metrics for some characters:
     Missing character: There is no α in font ec-lmbx10!

Also, the α character is present in the .tex file that pgfSweave produces. I think this issue should be closed as not a bug.

yihui commented 12 years ago

As I said, this is just the tip of the iceberg, and it reveals fundamental design flaws in Sweave:

- it ties itself to LaTeX too closely (e.g. \usepackage[utf8]{inputenc} is the trick to tell Sweave the encoding of the document, but it does not work with XeLaTeX);
- it is hard to extend without copying a large amount of its source code, which is a headache for package developers who base their packages on Sweave. pgfSweave uses a more recent version of Sweave, but AFAIK its encoding support is lagging behind Sweave, and cacheSweave uses an even older version of Sweave. The design of Sweave makes it difficult to follow all its changes.

For this particular issue, you probably got α because your native locale is UTF-8, so you do not need to tell (pgf)Sweave your encoding.

coquito77 commented 9 years ago

I get the following error "Line 25 Error in parse(text = x, srcfile = src) : :68:1: unexpected input"

when I run the following code to download a PDF:

url <- "http://ccjcc.lacounty.gov/LinkClick.aspx?fileticket=HcUb-Mi2foo%3d&portalid=11"
dest <- tempfile(fileext = ".pdf")
download.file(url, dest, mode = "wb")

Any ideas on how to make it run? Thanks

cameronbracken / pgfSweave

Support unicode characters. #38