davidgohel / rvg

https://davidgohel.github.io/rvg/

Superscripts and Unicode not working in ggplot titles #8

Closed davidgohel closed 6 years ago

davidgohel commented 8 years ago

Issue from @kainhofer

library(ggplot2)

p <- ggplot(data=women, aes(x=height, y=weight)) + 
  scale_y_continuous(name=expression("ÖÄ"))
rvg::write_pptx("test.pptx", code = print(p))
wurzerj commented 7 years ago

Issue Description

First of all, thanks for all your effort! I am currently evaluating a switch from ReporteRs to officer, but charting is a bit of a problem. Since addPlot() does not exist in officer, I use ph_with_vg_at(). I also looked into using ph_with_img_at() with EMF files, but in that case converting the plot into an Office shape removes everything but the axis, at least in my case. Unfortunately, as you already posted, rvg is not capable of handling Unicode. Since the issue has been open for quite a while, I would like to ask whether there are any plans to fix it.

Reproducible Example

require(officer)
require(ggplot2)
require(magrittr)
require(rvg)

my_plot <- ggplot(data=data.frame(x=1:100, y=rnorm(100)), aes(x=x, y=y)) + geom_line() + labs(title="Umlaut: ÄÖÜ äöü; Specials: &%$@€")

my_pres <- read_pptx() %>%
    add_slide(layout = "Title and Content", master = "Office Theme") %>%
    ph_with_text(type = "title", str = "A title") %>%
    ph_with_text(type = "ftr", str = "A footer") %>%
    ph_with_text(type = "dt", str = format(Sys.Date())) %>%
    ph_with_text(type = "sldNum", str = "slide 1") %>%
    ph_with_text(str = "Hello world", type = "body") %>%
    ph_with_vg_at(print(my_plot), left = 12.7/2.54, top = 4.5/2.54, width = 11.4/2.54, height = 12.4/2.54)

print(my_pres, target= "test.pptx")

Session Info

devtools::session_info()
Session info --------------------------------------------------------------------
 setting  value                       
 version  R version 3.3.1 (2016-06-21)
 system   i386, mingw32               
 ui       RStudio (1.1.383)           
 language (EN)                        
 collate  German_Austria.1252         
 tz       Europe/Berlin               
 date     2017-10-11                  

Packages ------------------------------------------------------------------------
 package     * version date       source
 assertthat    0.1     2013-12-06 CRAN (R 3.3.1)
 base64enc     0.1-3   2015-07-28 CRAN (R 3.3.2)
 colorspace    1.3-2   2016-12-14 CRAN (R 3.3.2)
 curl          2.3     2016-11-24 CRAN (R 3.3.2)
 DBI           0.7     2017-06-18 CRAN (R 3.3.3)
 devtools      1.12.0  2016-06-24 CRAN (R 3.3.1)
 digest        0.6.12  2017-01-27 CRAN (R 3.3.3)
 dplyr         0.5.0   2016-06-24 CRAN (R 3.3.1)
 gdtools     * 0.1.6   2017-09-01 CRAN (R 3.3.3)
 ggplot2     * 2.2.1   2016-12-30 CRAN (R 3.3.2)
 gtable        0.2.0   2016-02-26 CRAN (R 3.3.1)
 htmltools     0.3.6   2017-04-28 CRAN (R 3.3.3)
 httr          1.2.1   2016-07-03 CRAN (R 3.3.1)
 labeling      0.3     2014-08-23 CRAN (R 3.3.0)
 lazyeval      0.2.0   2016-06-12 CRAN (R 3.3.1)
 magrittr    * 1.5     2014-11-22 CRAN (R 3.3.1)
 memoise       1.0.0   2016-01-29 CRAN (R 3.3.1)
 munsell       0.4.3   2016-02-13 CRAN (R 3.3.1)
 officer     * 0.1.8   2017-10-05 CRAN (R 3.3.3)
 plyr          1.8.4   2016-06-08 CRAN (R 3.3.1)
 purrr         0.2.2   2016-06-18 CRAN (R 3.3.2)
 R.methodsS3   1.7.1   2016-02-16 CRAN (R 3.3.0)
 R.oo          1.21.0  2016-11-01 CRAN (R 3.3.2)
 R.utils       2.5.0   2016-11-07 CRAN (R 3.3.2)
 R6            2.2.2   2017-06-17 CRAN (R 3.3.3)
 Rcpp          0.12.12 2017-07-15 CRAN (R 3.3.3)
 rvg         * 0.1.6   2017-10-05 CRAN (R 3.3.3)
 scales        0.4.1   2016-11-09 CRAN (R 3.3.2)
 tibble        1.2     2016-08-26 CRAN (R 3.3.1)
 uuid          0.1-2   2015-07-28 CRAN (R 3.3.2)
 withr         1.0.2   2016-06-20 CRAN (R 3.3.1)
 xml2          1.1.0   2017-01-07 CRAN (R 3.3.2)
 yaml          2.1.14  2016-11-12 CRAN (R 3.3.2)
 zip           1.0.0   2017-04-25 CRAN (R 3.3.3)
davidgohel commented 7 years ago

shame on me :)

Yes, I will solve that one day, but no date is planned. It's not a priority because it works in a UTF-8 environment and it's a lot of work for me (maybe only a line of code once I have the solution... but it takes time to find the correct solution). It will be done when I have some quiet time and a Windows computer - unfortunately, paid jobs come first ;)

wurzerj commented 7 years ago

Thanks, yes indeed, paid job comes first ;-) Thanks for the UTF8 hint.

I invested some time but unfortunately did not find an easy way to fix this in your C++ code.

Anyway, I wrote a little workaround function for Windows systems and tested it on two Windows machines ... it seems to work. You could integrate this in ph_with_vg* within the tryCatch(), but of course modifying R options is not the cleanest solution.

Workaround function to replace ph_with_vg_at calls

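funWorkaround <- function(x, code, left, top, width, height, ...) {
    # Remember the current encoding and restore it when the function exits
    sOldEnc <- getOption("encoding")
    on.exit(options(encoding = sOldEnc), add = TRUE)

    # Switch to UTF-8 while the plot is rendered
    options(encoding = "UTF-8")

    # Create the plot; arguments are passed by name so they cannot be
    # matched to the wrong parameter of ph_with_vg_at()
    ph_with_vg_at(x, code = code, left = left, top = top,
                  width = width, height = height, ...)
}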
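
The save-and-restore pattern used in this workaround can be sketched on its own, without officer or rvg. The helper name `with_utf8_encoding()` below is hypothetical and not part of any package; it only relies on base R's `options()` and `on.exit()`:

```r
# Hypothetical helper (not part of officer or rvg): evaluate an expression
# while options(encoding = "UTF-8") is set, then restore the previous value.
with_utf8_encoding <- function(expr) {
  # options() returns the previous values of the options it changes
  old <- options(encoding = "UTF-8")
  # Restore the previous setting when the function exits, even on error
  on.exit(options(old), add = TRUE)
  # expr is a promise; forcing it here evaluates it under the new setting
  force(expr)
}

before <- getOption("encoding")
inside <- with_utf8_encoding(getOption("encoding"))
after  <- getOption("encoding")
```

Because R evaluates arguments lazily, a call such as `with_utf8_encoding(ph_with_vg_at(...))` would run the plotting code while the option is set. Whether that is enough on a given Windows setup is not verified here.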

Happy to test on Windows if you need help.

emiliemillet commented 6 years ago

Hi, on top of the previous comments, I ran into another issue when using Unicode in a ggplot label and exporting to PowerPoint with ph_with_vg_at(). Here is an example:

gg_plot <- ggplot(data = iris ) +
  geom_point(mapping = aes(Sepal.Length, Petal.Length), size = 3) +
  theme_minimal() + 
  xlab(expression("Label of the x axis (m\u207B\u00B2)")) 

my_pres <- read_pptx()
my_pres <- add_slide(my_pres, layout = "Title and Content", master = "Office Theme")
my_pres <- ph_with_vg_at(my_pres, ggobj = gg_plot, left = 0.9, top = 0.9,
                         width = 8, height = 6.4)
print(my_pres, target = "test.pptx")

The error message:

Error in grid.Call.graphics(C_text, as.graphicsAnnot(x$label), x$x, x$y,  : 
  Metric information not available for this family/device

However, when removing the second unicode character, it works:

gg_plot <- ggplot(data = iris ) +
  geom_point(mapping = aes(Sepal.Length, Petal.Length), size = 3) +
  theme_minimal() + 
  xlab(expression("Label of the x axis (m\u207B)")) 

my_pres <- read_pptx()
my_pres <- add_slide(my_pres, layout = "Title and Content", master = "Office Theme")
my_pres <- ph_with_vg_at(my_pres, ggobj = gg_plot, left = 0.9, top = 0.9,
                         width = 8, height = 6.4)
print(my_pres, target = "test.pptx")

It seems that some Unicode characters work and others do not. Is there a way to fix that?

Thanks in advance :)
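
One option that might be worth trying for the superscript itself is plotmath, which draws superscripts from plain ASCII input instead of relying on Unicode superscript glyphs. This is only a sketch; whether rvg renders it correctly on a given platform is an assumption that has not been verified here:

```r
# Build the axis label with plotmath: m^{-2} is drawn as a superscript,
# so no Unicode superscript characters are needed in the string.
x_label <- expression(paste("Label of the x axis (", m^{-2}, ")"))

# The expression can then be passed to ggplot2 in place of the string, e.g.
#   ggplot(iris) + geom_point(aes(Sepal.Length, Petal.Length)) + xlab(x_label)
```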

My session:

R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Linux Mint 18.2

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0

locale:
LC_CTYPE=en_GB.UTF-8
LC_NUMERIC=C
LC_TIME=en_GB.UTF-8
LC_COLLATE=en_GB.UTF-8
LC_MONETARY=nl_NL.UTF-8
LC_MESSAGES=en_GB.UTF-8
LC_PAPER=nl_NL.UTF-8
LC_NAME=C
LC_ADDRESS=C
LC_TELEPHONE=C
LC_MEASUREMENT=nl_NL.UTF-8
LC_IDENTIFICATION=C

attached base packages:
stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
gdtools_0.1.6
rvg_0.1.7
officer_0.2.1
ggpubr_0.1.6.999
magrittr_1.5
ggplot2_2.2.1
emiliemillet commented 6 years ago

Quick but partial solution: remove expression(). This will work for this example but I can't add any super/subscript text anymore.

gg_plot <- ggplot(data = iris ) +
  geom_point(mapping = aes(Sepal.Length, Petal.Length), size = 3) +
  theme_minimal() + 
  xlab("Label of the x axis (m\u207B\u00B2)")

my_pres <- read_pptx()
my_pres <- add_slide(my_pres, layout = "Title and Content", master = "Office Theme")
my_pres <- ph_with_vg_at(my_pres, ggobj = gg_plot, left = 0.9, top = 0.9,
                         width = 8, height = 6.4)
print(my_pres, target = "test.pptx")

It worked on my Linux laptop but not on a Windows computer. I tried options(encoding="UTF-8") on the Windows computer, but the character \u207B still crashes. Moreover, if I try to play around with the fonts on Windows, it still crashes:

my_pres <- ph_with_vg_at(my_pres, ggobj = gg_plot, left = 0.9, top = 0.9,
                   width = 8, height = 6.4, 
                   fonts = list(serif = "Arial Unicode MS"))

This doesn't crash but creates a wrong symbol:

my_pres <- ph_with_vg_at(my_pres, ggobj = gg_plot, left = 0.9, top = 0.9,
                        width = 8, height = 6.4, 
                        fonts = list(serif = "Arial Unicode MS",
                                         symbol="Arial Unicode MS"))

To be continued...

davidgohel commented 6 years ago

There has been progress :)

@wurzerj The new version should fix your issue.

require(officer)
require(ggplot2)
require(magrittr)
require(rvg)

my_plot <- ggplot(data=data.frame(x=1:100, y=rnorm(100)), aes(x=x, y=y)) + geom_line() + labs(title="Umlaut: ÄÖÜ äöü; Specials: &%$@€")
my_pres <- read_pptx() %>% 
  add_slide(layout = "Title and Content", master = "Office Theme") %>% 
  ph_with_vg_at(ggobj = my_plot, left = 0.9, top = 0.9,
                width = 8, height = 6.4, font = list(sans = "Calibri")) %>% 
  print(target = "test.pptx") 

It produces that result on a Windows machine: test.pptx

My sessionInfo():

> devtools::session_info()
Session info --------------------------------------------------------------------------------------------------------------------------------------------
 setting  value                       
 version  R version 3.4.3 (2017-11-30)
 system   i386, mingw32               
 ui       RStudio (1.1.383)           
 language (EN)                        
 collate  French_France.1252          
 tz       Europe/Berlin               
 date     2018-02-06                  

Packages ------------------------------------------------------------------------------------------------------------------------------------------------
 package     * version    date       source        
 base        * 3.4.3      2017-12-06 local         
 base64enc     0.1-3      2015-07-28 CRAN (R 3.4.1)
 colorspace    1.3-2      2016-12-14 CRAN (R 3.4.3)
 compiler      3.4.3      2017-12-06 local         
 datasets    * 3.4.3      2017-12-06 local         
 devtools      1.13.4     2017-11-09 CRAN (R 3.4.3)
 digest        0.6.13     2017-12-14 CRAN (R 3.4.3)
 gdtools     * 0.1.6      2018-01-25 local         
 ggplot2     * 2.2.1      2016-12-30 CRAN (R 3.4.3)
 graphics    * 3.4.3      2017-12-06 local         
 grDevices   * 3.4.3      2017-12-06 local         
 grid          3.4.3      2017-12-06 local         
 gtable        0.2.0      2016-02-26 CRAN (R 3.4.3)
 htmltools     0.3.6      2017-04-28 CRAN (R 3.4.3)
 labeling      0.3        2014-08-23 CRAN (R 3.4.1)
 lazyeval      0.2.1      2017-10-29 CRAN (R 3.4.3)
 magrittr    * 1.5        2014-11-22 CRAN (R 3.4.3)
 memoise       1.1.0      2017-04-21 CRAN (R 3.4.3)
 methods     * 3.4.3      2017-12-06 local         
 munsell       0.4.3      2016-02-13 CRAN (R 3.4.3)
 officer     * 0.2.0      2017-12-02 CRAN (R 3.4.3)
 pillar        1.0.1      2017-11-27 CRAN (R 3.4.3)
 plyr          1.8.4      2016-06-08 CRAN (R 3.4.3)
 R.methodsS3   1.7.1      2016-02-16 CRAN (R 3.4.1)
 R.oo          1.21.0     2016-11-01 CRAN (R 3.4.1)
 R.utils       2.6.0      2017-11-05 CRAN (R 3.4.3)
 R6            2.2.2      2017-06-17 CRAN (R 3.4.3)
 Rcpp          0.12.14    2017-11-23 CRAN (R 3.4.3)
 rlang         0.1.6      2017-12-21 CRAN (R 3.4.3)
 rstudioapi    0.7        2017-09-07 CRAN (R 3.4.3)
 rvg         * 0.1.8.0001 2018-02-05 local         
 scales        0.5.0      2017-08-24 CRAN (R 3.4.3)
 stats       * 3.4.3      2017-12-06 local         
 tibble        1.4.1      2017-12-25 CRAN (R 3.4.3)
 tools         3.4.3      2017-12-06 local         
 utils       * 3.4.3      2017-12-06 local         
 uuid          0.1-2      2015-07-28 CRAN (R 3.4.1)
 withr         2.1.1      2017-12-19 CRAN (R 3.4.3)
 xml2          1.1.1      2017-01-24 CRAN (R 3.4.3)
 yaml          2.1.16     2017-12-12 CRAN (R 3.4.3)
 zip           1.0.0      2017-04-25 CRAN (R 3.4.3)
davidgohel commented 6 years ago

@emiliemillet The new version is better, but I am still not sure I understand what is needed to make your case work.

Below is a comparison between the pptx, png and pdf devices on a Windows machine, along with the files produced. The new version seems decent compared to png or pdf (the wrong symbol has also been fixed):

gg_plot_noexpr.pptx gg_plot_expr.pdf gg_plot_expr gg_plot_noexpr.pdf gg_plot_noexpr

require(officer)
require(ggplot2)
require(magrittr)
require(rvg)

dir.create("gh")

gg_plot <- ggplot(data = iris ) +
  geom_point(mapping = aes(Sepal.Length, Petal.Length), size = 3) +
  theme_minimal() + 
  xlab(expression("Label of the x axis (m\u207B\u00B2)")) 

my_pres <- read_pptx() %>% 
  add_slide(layout = "Title and Content", master = "Office Theme") %>% 
  ph_with_vg_at(ggobj = gg_plot, left = 0.9, top = 0.9,
                width = 8, height = 6.4) %>% 
  print(target = "gh/gg_plot_expr.pptx")

png(filename = "gh/gg_plot_expr.png", width = 8, height = 6.4, units = "in", res = 200 )
print(gg_plot)
dev.off()

pdf(file = "gh/gg_plot_expr.pdf")
print(gg_plot)
dev.off()

gg_plot <- ggplot(data = iris ) +
  geom_point(mapping = aes(Sepal.Length, Petal.Length), size = 3) +
  theme_minimal() + 
  xlab("Label of the x axis (m\u207B\u00B2)")

my_pres <- read_pptx() %>% 
  add_slide(layout = "Title and Content", master = "Office Theme") %>% 
  ph_with_vg_at(ggobj = gg_plot, left = 0.9, top = 0.9,
                width = 8, height = 6.4) %>% 
  print(target = "gh/gg_plot_noexpr.pptx")

png(filename = "gh/gg_plot_noexpr.png", width = 8, height = 6.4, units = "in", res = 200 )
print(gg_plot)
dev.off()

pdf(file = "gh/gg_plot_noexpr.pdf")
print(gg_plot)
dev.off()

What is happening is that the glyph corresponding to the Unicode character does not exist in the font table specified by the graphical parameter fontname. I'd like to provide a solution, but I am not sure it is possible. pdf, which is a reference R device, complains for the same reason:

1: In grid.Call(C_stringMetric, as.graphicsAnnot(x$label)) :
  font metrics unknown for Unicode character U+207b
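
To see which characters a device has to find glyphs for, the code points in a label can be listed with base R. This is a small diagnostic sketch; it does not check whether a particular font actually contains those glyphs:

```r
# The label from the example, written with Unicode escapes
label <- "Label of the x axis (m\u207B\u00B2)"

# Convert the string to integer code points and keep the non-ASCII ones
code_points <- utf8ToInt(enc2utf8(label))
non_ascii <- code_points[code_points > 127L]

# Format them in the usual U+XXXX notation
labels_hex <- sprintf("U+%04X", non_ascii)
labels_hex
# "U+207B" "U+00B2"
```

U+207B (superscript minus) is the character named in the pdf warning, while U+00B2 (superscript two) is part of Latin-1 and is covered by many more fonts.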

David

davidgohel commented 6 years ago

I am closing this issue; the new version will be submitted to CRAN soon. Give a shout if I missed anything.

wurzerj commented 6 years ago

@davidgohel Sorry, but on my windows machine it still does not work.

How did you execute the code? I tried to execute your example in various ways:

I'll try to dig a bit deeper into it tomorrow/the day after tomorrow. Unfortunately no time for it today.

require(officer)
require(ggplot2)
require(magrittr)
require(rvg)

my_plot <- ggplot(data=data.frame(x=1:100, y=rnorm(100)), aes(x=x, y=y)) + geom_line() + labs(title="Umlaut: ÄÖÜ äöü; Specials: &%$@€")
my_pres <- read_pptx() %>% 
  add_slide(layout = "Title and Content", master = "Office Theme") %>% 
  ph_with_vg_at(ggobj = my_plot, left = 0.9, top = 0.9,
                width = 8, height = 6.4, font = list(sans = "Calibri")) %>% 
  print(target = "test.pptx") 

It produces this file, still showing the weird characters: test.pptx

Here is my session info; it seems to be consistent with yours, except of course for collate:

> devtools::session_info()
Session info --------------------------------------------------------------------------------------------------------------------------------
 setting  value                       
 version  R version 3.3.1 (2016-06-21)
 system   i386, mingw32               
 ui       Rgui                        
 language (EN)                        
 collate  German_Austria.1252         
 tz       Europe/Berlin               
 date     2018-02-07                  

Packages ------------------------------------------------------------------------------------------------------------------------------------
 package     * version    date       source        
 assertthat    0.1        2013-12-06 CRAN (R 3.3.1)
 base64enc     0.1-3      2015-07-28 CRAN (R 3.3.2)
 colorspace    1.3-2      2016-12-14 CRAN (R 3.3.2)
 curl          2.3        2016-11-24 CRAN (R 3.3.2)
 devtools    * 1.12.0     2016-06-24 CRAN (R 3.3.1)
 digest        0.6.12     2017-01-27 CRAN (R 3.3.3)
 gdtools     * 0.1.6      2017-09-01 CRAN (R 3.3.3)
 ggplot2     * 2.2.1      2016-12-30 CRAN (R 3.3.2)
 gtable        0.2.0      2016-02-26 CRAN (R 3.3.1)
 htmltools     0.3.6      2017-04-28 CRAN (R 3.3.3)
 httr          1.2.1      2016-07-03 CRAN (R 3.3.1)
 labeling      0.3        2014-08-23 CRAN (R 3.3.0)
 lazyeval      0.2.0      2016-06-12 CRAN (R 3.3.1)
 magrittr    * 1.5        2014-11-22 CRAN (R 3.3.1)
 memoise       1.0.0      2016-01-29 CRAN (R 3.3.1)
 munsell       0.4.3      2016-02-13 CRAN (R 3.3.1)
 officer     * 0.2.0      2017-12-02 CRAN (R 3.3.3)
 plyr          1.8.4      2016-06-08 CRAN (R 3.3.1)
 R.methodsS3   1.7.1      2016-02-16 CRAN (R 3.3.0)
 R.oo          1.21.0     2016-11-01 CRAN (R 3.3.2)
 R.utils       2.5.0      2016-11-07 CRAN (R 3.3.2)
 R6            2.2.2      2017-06-17 CRAN (R 3.3.3)
 Rcpp          0.12.12    2017-07-15 CRAN (R 3.3.3)
 rvg         * 0.1.8.0001 2018-02-07 local         
 scales        0.4.1      2016-11-09 CRAN (R 3.3.2)
 tibble        1.2        2016-08-26 CRAN (R 3.3.1)
 uuid          0.1-2      2015-07-28 CRAN (R 3.3.2)
 withr         1.0.2      2016-06-20 CRAN (R 3.3.1)
 xml2          1.1.0      2017-01-07 CRAN (R 3.3.2)
 zip           1.0.0      2017-04-25 CRAN (R 3.3.3)
davidgohel commented 6 years ago

Right, I can see I made a mistake during the commit...

davidgohel commented 6 years ago

Can you try again with the version I just committed?

wurzerj commented 6 years ago

0.1.8.0002 looks a lot better. Thanks so much for your effort!

It works for me now in most cases:

The only thing that did not work in my test was sourcing a file (containing your sample code) with UTF-8 encoding. In this specific case I still get weird characters.
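
For the sourcing case, one thing that may be worth checking is whether the file is read with an explicit encoding. `source()` has an `encoding` argument; the sketch below writes a small UTF-8 script to a temporary file and sources it that way. The variable name `plot_title` is made up for the illustration, and it is an assumption that this applies to the failing setup:

```r
# Write a one-line script to a temporary file, encoded as UTF-8
script <- tempfile(fileext = ".R")
con <- file(script, open = "w", encoding = "UTF-8")
writeLines('plot_title <- "Umlaut: \u00C4\u00D6\u00DC"', con)
close(con)

# Tell source() how the file is encoded instead of relying on the
# platform default (which is not UTF-8 on the Windows setups above)
source(script, encoding = "UTF-8")

# Inspect the code points of the string that was read back
title_points <- utf8ToInt(enc2utf8(plot_title))
title_points
```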

davidgohel commented 6 years ago

Thanks for your quick feedback, it helps.

> The only thing that did not work in my test was sourcing a file (containing your sample code) with UTF-8 encoding. In this specific case I still get weird characters.

I will have a look at that and see what's going on