choonghyunryu / dlookr

Tools for Data Diagnosis, Exploration, Transformation
https://choonghyunryu.github.io/dlookr/

diagnose_report() function errors #11

Closed dpolychr closed 4 years ago

dpolychr commented 4 years ago

Hi @choonghyunryu,

First off, many thanks for this package; it is really useful for EDA! I am using the diagnose_report() function, and it gives me errors when I try to generate either HTML or PDF output. For instance:

df %>% diagnose_report(output_format = "html")

gives:

Quitting from lines 109-122 (Diagnosis_Report.Rmd) 
 Error in doc_parse_raw(x, encoding = encoding, base_url = base_url, as_html = as_html,  : 
  Input is not proper UTF-8, indicate encoding !
Bytes: 0xA0 0x4D 0x20 0x3C [9] 

and

df %>% diagnose_report(output_file = "Diagn.pdf")

gives:

output file: /var/folders/54/f4d8z7ps1lx_w4xcj_8p16y0mjrjn3/T//RtmpFCdNMH/Diagn.tex

tlmgr search --file --global '/setspace.sty'
Proxy must be specified as absolute URI; '194.34.82.250:10263' is not at /Users/kkrg658/Library/TinyTeX/tlpkg/TeXLive/TLDownload.pm line 44.
! LaTeX Error: File `setspace.sty' not found.

! Emergency stop.
<read *> 

Error: Failed to compile Diagn.tex. See https://yihui.name/tinytex/r/#debugging for debugging tips. See Diagn.log for more info.
In addition: Warning messages:
1: In dir.create(paste(path, "figure", sep = "/")) :
  '/var/folders/54/f4d8z7ps1lx_w4xcj_8p16y0mjrjn3/T//RtmpFCdNMH/figure' already exists
2: In system2("tlmgr", args, ...) :
  running command ''tlmgr' search --file --global '/setspace.sty'' had status 255

Any idea why this might be happening?

My session info:

─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 setting  value                       
 version  R version 3.6.1 (2019-07-05)
 os       macOS Mojave 10.14.6        
 system   x86_64, darwin15.6.0        
 ui       RStudio                     
 language (EN)                        
 collate  en_GB.UTF-8                 
 ctype    en_GB.UTF-8                 
 tz       Europe/London               
 date     2020-02-14                  

─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 package      * version  date       lib source        
 abind          1.4-5    2016-07-21 [1] CRAN (R 3.6.0)
 acepack        1.4.1    2016-10-29 [1] CRAN (R 3.6.0)
 assertthat     0.2.1    2019-03-21 [1] CRAN (R 3.6.0)
 backports      1.1.5    2019-10-02 [1] CRAN (R 3.6.0)
 base64enc      0.1-3    2015-07-28 [1] CRAN (R 3.6.0)
 bit            1.1-14   2018-05-29 [1] CRAN (R 3.6.0)
 bit64          0.9-7    2017-05-08 [1] CRAN (R 3.6.0)
 bitops         1.0-6    2013-08-17 [1] CRAN (R 3.6.0)
 blob           1.2.0    2019-07-09 [1] CRAN (R 3.6.0)
 boot           1.3-22   2019-04-02 [1] CRAN (R 3.6.1)
 broom          0.5.2    2019-04-07 [1] CRAN (R 3.6.0)
 car            3.0-6    2019-12-23 [1] CRAN (R 3.6.0)
 carData        3.0-3    2019-11-16 [1] CRAN (R 3.6.0)
 caTools        1.17.1.3 2019-11-30 [1] CRAN (R 3.6.0)
 cellranger     1.1.0    2016-07-27 [1] CRAN (R 3.6.0)
 checkmate      1.9.4    2019-07-04 [1] CRAN (R 3.6.0)
 chron          2.3-55   2020-02-02 [1] CRAN (R 3.6.0)
 class          7.3-15   2019-01-01 [1] CRAN (R 3.6.1)
 classInt       0.4-2    2019-10-17 [1] CRAN (R 3.6.0)
 cli            1.1.0    2019-03-19 [1] CRAN (R 3.6.0)
 cluster        2.1.0    2019-06-19 [1] CRAN (R 3.6.1)
 colorspace     1.4-1    2019-03-18 [1] CRAN (R 3.6.0)
 corrplot       0.84     2017-10-16 [1] CRAN (R 3.6.0)
 crayon         1.3.4    2017-09-16 [1] CRAN (R 3.6.0)
 curl           4.2      2019-09-24 [1] CRAN (R 3.6.0)
 data.table     1.12.2   2019-04-07 [1] CRAN (R 3.6.0)
 DBI            1.0.0    2018-05-02 [1] CRAN (R 3.6.0)
 digest         0.6.21   2019-09-20 [1] CRAN (R 3.6.0)
 dlookr       * 0.3.13   2020-01-09 [1] CRAN (R 3.6.0)
 DMwR           0.4.1    2013-08-08 [1] CRAN (R 3.6.0)
 dplyr        * 0.8.3    2019-07-04 [1] CRAN (R 3.6.0)
 e1071          1.7-2    2019-06-05 [1] CRAN (R 3.6.0)
 evaluate       0.14     2019-05-28 [1] CRAN (R 3.6.0)
 fansi          0.4.0    2018-10-05 [1] CRAN (R 3.6.0)
 forcats      * 0.4.0    2019-02-17 [1] CRAN (R 3.6.0)
 foreign        0.8-71   2018-07-20 [1] CRAN (R 3.6.1)
 Formula        1.2-3    2018-05-03 [1] CRAN (R 3.6.0)
 gdata          2.18.0   2017-06-06 [1] CRAN (R 3.6.0)
 generics       0.0.2    2018-11-29 [1] CRAN (R 3.6.0)
 ggplot2      * 3.2.1    2019-08-10 [1] CRAN (R 3.6.0)
 glue           1.3.1    2019-03-12 [1] CRAN (R 3.6.0)
 gplots         3.0.1.2  2020-01-11 [1] CRAN (R 3.6.0)
 gridExtra      2.3      2017-09-09 [1] CRAN (R 3.6.0)
 gsubfn         0.7      2018-03-16 [1] CRAN (R 3.6.0)
 gtable         0.3.0    2019-03-25 [1] CRAN (R 3.6.0)
 gtools         3.8.1    2018-06-26 [1] CRAN (R 3.6.0)
 haven          2.1.1    2019-07-04 [1] CRAN (R 3.6.0)
 highr          0.8      2019-03-20 [1] CRAN (R 3.6.0)
 Hmisc          4.3-1    2020-02-07 [1] CRAN (R 3.6.0)
 hms            0.5.1    2019-08-23 [1] CRAN (R 3.6.0)
 htmlTable      1.13.3   2019-12-04 [1] CRAN (R 3.6.0)
 htmltools      0.4.0    2019-10-04 [1] CRAN (R 3.6.0)
 htmlwidgets    1.5.1    2019-10-08 [1] CRAN (R 3.6.0)
 httr           1.4.1    2019-08-05 [1] CRAN (R 3.6.0)
 inum           1.0-1    2019-04-25 [1] CRAN (R 3.6.0)
 janitor      * 1.2.0    2019-04-21 [1] CRAN (R 3.6.0)
 jomo           2.6-10   2019-10-22 [1] CRAN (R 3.6.0)
 jpeg           0.1-8.1  2019-10-24 [1] CRAN (R 3.6.0)
 jsonlite       1.6      2018-12-07 [1] CRAN (R 3.6.0)
 kableExtra   * 1.1.0    2019-03-16 [1] CRAN (R 3.6.0)
 KernSmooth     2.23-15  2015-06-29 [1] CRAN (R 3.6.1)
 knitr        * 1.25     2019-09-18 [1] CRAN (R 3.6.0)
 lattice      * 0.20-38  2018-11-04 [1] CRAN (R 3.6.1)
 latticeExtra   0.6-29   2019-12-19 [1] CRAN (R 3.6.0)
 lazyeval       0.2.2    2019-03-15 [1] CRAN (R 3.6.0)
 libcoin        1.0-5    2019-08-27 [1] CRAN (R 3.6.0)
 lifecycle      0.1.0    2019-08-01 [1] CRAN (R 3.6.0)
 lme4           1.1-21   2019-03-05 [1] CRAN (R 3.6.0)
 lubridate      1.7.4    2018-04-11 [1] CRAN (R 3.6.0)
 magrittr     * 1.5      2014-11-22 [1] CRAN (R 3.6.0)
 MASS           7.3-51.4 2019-03-31 [1] CRAN (R 3.6.1)
 Matrix         1.2-17   2019-03-22 [1] CRAN (R 3.6.1)
 memoise        1.1.0    2017-04-21 [1] CRAN (R 3.6.0)
 mice         * 3.7.0    2019-12-13 [1] CRAN (R 3.6.0)
 minqa          1.2.4    2014-10-09 [1] CRAN (R 3.6.0)
 mitml          0.3-7    2019-01-07 [1] CRAN (R 3.6.0)
 modelr         0.1.5    2019-08-08 [1] CRAN (R 3.6.0)
 moments        0.14     2015-01-05 [1] CRAN (R 3.6.0)
 munsell        0.5.0    2018-06-12 [1] CRAN (R 3.6.0)
 mvtnorm        1.0-11   2019-06-19 [1] CRAN (R 3.6.0)
 nlme           3.1-140  2019-05-12 [1] CRAN (R 3.6.1)
 nloptr         1.2.1    2018-10-03 [1] CRAN (R 3.6.0)
 nnet           7.3-12   2016-02-02 [1] CRAN (R 3.6.1)
 nortest        1.0-4    2015-07-30 [1] CRAN (R 3.6.0)
 openxlsx       4.1.0.1  2019-05-28 [1] CRAN (R 3.6.0)
 pan            1.6      2018-06-29 [1] CRAN (R 3.6.0)
 partykit       1.2-6    2020-01-30 [1] CRAN (R 3.6.0)
 pillar         1.4.2    2019-06-29 [1] CRAN (R 3.6.0)
 pkgconfig      2.0.3    2019-09-22 [1] CRAN (R 3.6.0)
 png            0.1-7    2013-12-03 [1] CRAN (R 3.6.0)
 prettydoc      0.3.1    2019-11-23 [1] CRAN (R 3.6.0)
 proto          1.0.0    2016-10-29 [1] CRAN (R 3.6.0)
 purrr        * 0.3.2    2019-03-15 [1] CRAN (R 3.6.0)
 quantmod       0.4-15   2019-06-17 [1] CRAN (R 3.6.0)
 R6             2.4.0    2019-02-14 [1] CRAN (R 3.6.0)
 RcmdrMisc      2.7-0    2020-01-14 [1] CRAN (R 3.6.0)
 RColorBrewer   1.1-2    2014-12-07 [1] CRAN (R 3.6.0)
 Rcpp           1.0.2    2019-07-25 [1] CRAN (R 3.6.0)
 readr        * 1.3.1    2018-12-21 [1] CRAN (R 3.6.0)
 readxl       * 1.3.1    2019-03-13 [1] CRAN (R 3.6.0)
 rio            0.5.16   2018-11-26 [1] CRAN (R 3.6.0)
 rlang          0.4.2    2019-11-23 [1] CRAN (R 3.6.0)
 rmarkdown      2.1      2020-01-20 [1] CRAN (R 3.6.0)
 ROCR           1.0-7    2015-03-26 [1] CRAN (R 3.6.0)
 rpart          4.1-15   2019-04-12 [1] CRAN (R 3.6.1)
 RSQLite        2.1.2    2019-07-24 [1] CRAN (R 3.6.0)
 rstudioapi     0.10     2019-03-19 [1] CRAN (R 3.6.0)
 rvest          0.3.4    2019-05-15 [1] CRAN (R 3.6.0)
 sandwich       2.5-1    2019-04-06 [1] CRAN (R 3.6.0)
 scales         1.0.0    2018-08-09 [1] CRAN (R 3.6.0)
 sessioninfo    1.1.1    2018-11-05 [1] CRAN (R 3.6.0)
 smbinning      0.9      2019-04-01 [1] CRAN (R 3.6.0)
 sqldf          0.4-11   2017-06-28 [1] CRAN (R 3.6.0)
 stringi        1.4.3    2019-03-12 [1] CRAN (R 3.6.0)
 stringr      * 1.4.0    2019-02-10 [1] CRAN (R 3.6.0)
 survival       3.1-8    2019-12-03 [1] CRAN (R 3.6.0)
 tibble       * 2.1.3    2019-06-06 [1] CRAN (R 3.6.0)
 tidyr        * 1.0.0    2019-09-11 [1] CRAN (R 3.6.0)
 tidyselect     0.2.5    2018-10-11 [1] CRAN (R 3.6.0)
 tidyverse    * 1.2.1    2017-11-14 [1] CRAN (R 3.6.0)
 tinytex        0.16     2019-09-17 [1] CRAN (R 3.6.0)
 TTR            0.23-6   2019-12-15 [1] CRAN (R 3.6.0)
 utf8           1.1.4    2018-05-24 [1] CRAN (R 3.6.0)
 vctrs          0.2.0    2019-07-05 [1] CRAN (R 3.6.0)
 viridisLite    0.3.0    2018-02-01 [1] CRAN (R 3.6.0)
 webshot        0.5.2    2019-11-22 [1] CRAN (R 3.6.0)
 withr          2.1.2    2018-03-15 [1] CRAN (R 3.6.0)
 xfun           0.10     2019-10-01 [1] CRAN (R 3.6.0)
 xml2           1.2.2    2019-08-09 [1] CRAN (R 3.6.0)
 xtable       * 1.8-4    2019-04-21 [1] CRAN (R 3.6.0)
 xts            0.12-0   2020-01-19 [1] CRAN (R 3.6.0)
 yaml           2.2.0    2018-07-25 [1] CRAN (R 3.6.0)
 zeallot        0.1.0    2018-01-28 [1] CRAN (R 3.6.0)
 zip            2.0.4    2019-09-01 [1] CRAN (R 3.6.0)
 zoo            1.8-7    2020-01-10 [1] CRAN (R 3.6.0)

[1] /Library/Frameworks/R.framework/Versions/3.6/Resources/library

Many thanks, Dimitris

choonghyunryu commented 4 years ago

Hi @dpolychr

Thank you for sharing your error case.

The error in html seems to be related to the encoding.

In the case of PDF, LaTeX is used to generate the document. The error message indicates that LaTeX's setspace package is not installed. If the report is generated normally with other data, the problem may be caused by the data.

Can you share data to determine and resolve the cause more accurately?

Thanks & Best Regards, Choonghyun Ryu

dpolychr commented 4 years ago

Hi @choonghyunryu,

Many thanks for your response! Unfortunately, I am not able to share this dataset, as it contains sensitive data. However, I can confirm that I get the same error with the starwars dataset from the tidyverse:

library(tidyverse)
starwars %>% diagnose_report(output_file = "Diagn.pdf")

and output is:

output file: /var/folders/54/f4d8z7ps1lx_w4xcj_8p16y0mjrjn3/T//RtmpEt7Lkt/Diagn.tex

tlmgr search --file --global '/setspace.sty'
Proxy must be specified as absolute URI; '194.34.82.250:10263' is not at /Users/kkrg658/Library/TinyTeX/tlpkg/TeXLive/TLDownload.pm line 44.
! LaTeX Error: File `setspace.sty' not found.

! Emergency stop.
<read *> 

Error: Failed to compile Diagn.tex. See https://yihui.name/tinytex/r/#debugging for debugging tips. See Diagn.log for more info.
In addition: Warning message:
In system2("tlmgr", args, ...) :
  running command ''tlmgr' search --file --global '/setspace.sty'' had status 255

Many thanks for your help, much appreciated

choonghyunryu commented 4 years ago

Hi @dpolychr

Here are some possible solutions. Please check them and let us know the result.

1. When dlookr generates a PDF report, it uses LaTeX. Perhaps the setspace package (a LaTeX package, not an R package) is not installed in your LaTeX system. Please refer to https://yihui.org/tinytex/faq/ and https://yihui.org/tinytex/r/#debugging to install the setspace package.

To check whether it is installed, create and compile the following LaTeX example.

\documentclass{article}
\usepackage{setspace}

\begin{document}
\doublespacing
body
\end{document}
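A note on the log above: the tlmgr line "Proxy must be specified as absolute URI; '194.34.82.250:10263' is not" suggests that the proxy setting lacks a scheme such as http://, which would stop TinyTeX from downloading setspace automatically. The following is only a minimal sketch under that assumption. The helpers ensure_proxy_scheme() and fix_proxy_env() are hypothetical names introduced here, and which environment variables actually hold the proxy on your machine is also an assumption.

```r
# Hypothetical helper: prepend "http://" when a proxy string has no scheme.
ensure_proxy_scheme <- function(proxy) {
  has_scheme <- grepl("^[A-Za-z][A-Za-z0-9+.-]*://", proxy)
  if (nzchar(proxy) && !has_scheme) paste0("http://", proxy) else proxy
}

# Hypothetical helper: rewrite the proxy environment variables in place.
# The variable names are an assumption; adjust them to your setup.
fix_proxy_env <- function(vars = c("http_proxy", "https_proxy")) {
  for (var in vars) {
    value <- Sys.getenv(var)
    if (nzchar(value)) {
      args <- list(ensure_proxy_scheme(value))
      names(args) <- var
      do.call(Sys.setenv, args)
    }
  }
  invisible(NULL)
}

# The proxy string from the log, with a scheme added.
ensure_proxy_scheme("194.34.82.250:10263")

# Once the proxy is usable, the missing LaTeX package could be installed with:
# fix_proxy_env()
# tinytex::tlmgr_install("setspace")
```

If the proxy is not the cause, installing setspace through your usual TeX Live tooling should have the same effect.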

2. Quitting from lines 109-122 (Diagnosis_Report.Rmd)
 Error in doc_parse_raw(x, encoding = encoding, base_url = base_url, as_html = as_html,  :
   Input is not proper UTF-8, indicate encoding !
 Bytes: 0xA0 0x4D 0x20 0x3C [9]

This is an encoding issue. Your data seems to contain special (non-English) characters that are not valid UTF-8. Could you send me a few of the affected strings?
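If the data cannot be shared, the affected columns can be located locally. The sketch below uses only base R (validUTF8() and iconv()); find_invalid_utf8() and fix_latin1() are hypothetical helper names, not dlookr functions. It assumes the offending bytes are Latin-1 (0xA0 is a non-breaking space in that encoding), which may not hold for your data, and it only looks at character columns, so factor columns would need to be converted first.

```r
# Hypothetical helper: names of character columns holding invalid UTF-8.
find_invalid_utf8 <- function(df) {
  chr_cols <- names(df)[vapply(df, is.character, logical(1))]
  bad <- vapply(df[chr_cols], function(x) any(!validUTF8(x)), logical(1))
  chr_cols[bad]
}

# Hypothetical helper: re-encode those columns, ASSUMING they are Latin-1.
fix_latin1 <- function(df) {
  for (col in find_invalid_utf8(df)) {
    df[[col]] <- iconv(df[[col]], from = "latin1", to = "UTF-8")
  }
  df
}

# Toy data with a raw 0xA0 byte, mimicking the bytes in the error message.
bad_string <- rawToChar(as.raw(c(0x35, 0xA0, 0x4D)))
toy <- data.frame(id = 1:2, label = c("ok", bad_string),
                  stringsAsFactors = FALSE)

find_invalid_utf8(toy)            # columns that need attention
toy_fixed <- fix_latin1(toy)
all(validUTF8(toy_fixed$label))   # check that the re-encoded column is valid
```

Running find_invalid_utf8() on the real data before calling diagnose_report(output_format = "html") would at least show which variables trigger the error.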

3. The starwars data (from dplyr, loaded with the tidyverse) contains list objects in some variables (films, vehicles and starships). However, dlookr does not support list variables.

I was able to get the normal result by removing the list object as follows:

starwars %>%
  select(-films, -vehicles, -starships) %>%
  diagnose_report(output_file = "Diagn.pdf")
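When the list variables are not known in advance, they can be dropped by type instead of by name. This is a minimal base R sketch; drop_list_columns() is a hypothetical helper, not part of dlookr.

```r
# Hypothetical helper: keep only the columns that are not lists.
drop_list_columns <- function(df) {
  keep <- !vapply(df, is.list, logical(1))
  df[, keep, drop = FALSE]
}

# Toy data frame with one list column, standing in for starwars.
toy <- data.frame(name = c("a", "b"), height = c(172, 167),
                  stringsAsFactors = FALSE)
toy$films <- list(c("x", "y"), "z")

names(drop_list_columns(toy))
```

With dlookr and the tidyverse loaded, the same helper could presumably be placed in the pipeline, for example starwars %>% drop_list_columns() %>% diagnose_report(output_file = "Diagn.pdf").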

choonghyunryu commented 4 years ago

There has been no response for a long time, so I am closing this issue.