choonghyunryu / dlookr

Tools for Data Diagnosis, Exploration, Transformation
https://choonghyunryu.github.io/dlookr/

Errors with eda_paged_report [English] [0.6.0] #86

Closed SaintRod closed 9 months ago

SaintRod commented 1 year ago

First, thank you for this great work. I tried two other EDA libraries before landing on dlookr. The others had pandoc issues, while dlookr was able to create an HTML report seamlessly.

Background Info:

Initial call to eda_paged_report()

First error:

Quitting from lines 212-215 (eda_paged_temp.Rmd) 
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 
  NA/NaN/Inf in 'x'

I am able to bypass this error by using the workaround below. Is this expected or unexpected behavior?

dlookr::eda_paged_report(
  .data = na.omit(df),
  target = "y",
  output_format = "html",
  output_file = "EDA.html",
  output_dir = path,
  browse = FALSE
)
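
For context on the first error: lm.fit() stops with "NA/NaN/Inf in 'x'" when its predictor matrix contains non-finite values. The sketch below uses base R only and shows one way to count such values per numeric column before building a report. The helper `count_non_finite()` and the `toy` data frame are hypothetical, made up for illustration; they are not part of dlookr and do not reproduce the reporter's data.

```r
# Hypothetical helper (not part of dlookr): count NA, NaN, and +/-Inf
# values in each numeric column of a data frame.
count_non_finite <- function(data) {
  numeric_cols <- vapply(data, is.numeric, logical(1))
  vapply(data[numeric_cols], function(col) sum(!is.finite(col)), integer(1))
}

# Toy data standing in for `df`; the real data is not shown in this thread.
toy <- data.frame(
  y  = c(1.2, 3.4, NA, 5.6),
  x1 = c(1, Inf, 3, 4),
  x2 = c(10, 20, 30, 40),
  g  = c("a", "b", "a", "b")
)

# Non-finite counts per numeric column before any cleaning.
bad_counts <- count_non_finite(toy)
print(bad_counts)

# na.omit() drops rows containing NA or NaN, but rows with Inf survive it.
cleaned <- na.omit(toy)
remaining <- count_non_finite(cleaned)
print(remaining)
```

Note that na.omit() only removes rows with NA or NaN, so a data set that also contains Inf could still trigger the same lm.fit() error after this workaround.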

Second error:

target variabe is not in  ("ordered", "factor", "character")
Quitting from lines 250-251 (eda_paged_temp.Rmd) 
Error in UseMethod("select") : 
  no applicable method for 'select' applied to an object of class "c('relate', 'lm')"
In addition: Warning message:
'dlookr::plot_correlate' is deprecated.
Use 'plot.correlate' instead.
See help("Deprecated")

I'm able to bypass the second error by removing the target variable, but oddly the report EDA.html was blank after page 2. Please let me know how I can assist and whether there is anything else I can provide. Thank you.

Session Info

R version 3.6.3 (2020-02-29)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows 10 x64 (build 19044)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] magrittr_2.0.3

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.9        Rttf2pt1_1.3.8    Formula_1.2-4     janitor_2.1.0     knitr_1.40        xml2_1.3.3       
 [7] sysfonts_0.8.8    splines_3.6.3     usethis_2.1.6     rvest_1.0.3       tidyselect_1.2.0  bit_4.0.4        
[13] xtable_1.8-4      viridisLite_0.4.1 colorspace_2.0-3  lattice_0.20-38   R6_2.5.1          rlang_1.0.6      
[19] fansi_1.0.3       parallel_3.6.3    vroom_1.6.0       xfun_0.34         withr_2.5.0       extrafontdb_1.0  
[25] ellipsis_0.3.2    systemfonts_1.0.4 htmltools_0.5.3   assertthat_0.2.1  bit64_4.0.5       rprojroot_2.0.3  
[31] digest_0.6.30     tibble_3.1.8      lifecycle_1.0.3   Matrix_1.5-1      shiny_1.7.3       rmarkdown_2.17   
[37] compiler_3.6.3    pillar_1.8.1      forcats_0.5.2     scales_1.2.1      gdtools_0.2.3     generics_0.1.3   
[43] extrafont_0.17    showtext_0.9-5    partykit_1.2-13   lubridate_1.8.0   future_1.28.0     reactable_0.3.0  
[49] svglite_2.0.0     listenv_0.8.0     httpuv_1.6.1      pkgconfig_2.0.3   parallelly_1.32.1 rstudioapi_0.14  
[55] munsell_0.5.0     fastmap_1.1.0     httr_1.4.4        showtextdb_3.0    dplyr_1.0.10      stringr_1.4.1    
[61] globals_0.16.1    tictoc_1.1        hrbrthemes_0.8.0  tools_3.6.3       grid_3.6.3        webshot_0.5.2    
[67] data.table_1.14.4 gtable_0.3.1      utf8_1.2.2        DBI_1.1.3         cli_3.4.1         yaml_2.3.6       
[73] survival_3.1-8    inum_1.0-4        crayon_1.5.2      libcoin_1.0-8     kableExtra_1.3.4  gridExtra_2.3    
[79] tidyr_1.2.1       purrr_0.3.5       ggplot2_3.3.6     dlookr_0.6.0      later_1.2.0       tzdb_0.3.0       
[85] codetools_0.2-16  promises_1.2.0.1  htmlwidgets_1.5.4 fs_1.5.2          vctrs_0.5.0       rpart_4.1-15     
[91] snakecase_0.11.0  glue_1.6.2        evaluate_0.17     mime_0.12         stringi_1.7.8     mvtnorm_1.1-1    
[97] pagedown_0.19    

choonghyunryu commented 1 year ago

@SaintRod, Thank you for your kind and specific feedback.

I'll check the first error. Having to apply na.omit() as you did is not the behavior I intended.

The second error is related to https://github.com/choonghyunryu/dlookr/issues/75.

Try installing the patched development version with the following command:

devtools::install_github("choonghyunryu/dlookr")

I will submit it to CRAN as soon as possible.

Thanks!!!

SaintRod commented 1 year ago

Hello @choonghyunryu . Thanks for the quick response.

I've updated to the latest version on GitHub, and this seems to have resolved the second error, since the code now progresses past line 250. However, there is now a new error.

Quitting from lines 252-253 (eda_paged_temp.Rmd) 
Error in html_paged_target_numerical(reportData, targetVariable, base_family = base_family) : 
  object 'index' not found
In addition: Warning message:
'dlookr::plot_correlate' is deprecated.
Use 'plot.correlate' instead.
See help("Deprecated")

choonghyunryu commented 1 year ago

@SaintRod,

Modify the following script,

dlookr::eda_paged_report(
  .data = na.omit(df),
  target = "y",
  output_format = "html",
  output_file = "EDA.html",
  output_dir = path,
  browse = FALSE
)

as follows and run it.

library(dlookr)

eda_paged_report(
  .data = na.omit(df),
  target = "y",
  output_format = "html",
  output_file = "EDA.html",
  output_dir = path,
  browse = FALSE
)

SaintRod commented 1 year ago

Hi, @choonghyunryu:

Same error still.

Quitting from lines 252-253 (eda_paged_temp.Rmd) 
Error in html_paged_target_numerical(reportData, targetVariable, base_family = base_family) : 
  object 'index' not found
In addition: Warning message:
'dlookr::plot_correlate' is deprecated.
Use 'plot.correlate' instead.
See help("Deprecated")

choonghyunryu commented 9 months ago

@SaintRod,

I have identified the cause of the issue you raised and fixed it. I am very sorry that it took so long to resolve.

fix html_paged_target_categorical()

Quitting from lines 258-259 [group-categorical] (eda_paged_temp.Rmd)
Error in `html_paged_target_categorical()`:
! object 'index' not found
Backtrace:
 1. dlookr:::html_paged_target_categorical(...)
There were 50 or more warnings (use warnings() to see the first 50)

fix html_paged_target_numerical()

Quitting from lines 252-253 [group-numerical] (eda_paged_temp.Rmd)
Error in `html_paged_target_numerical()`:
! object 'index' not found
Backtrace:
 1. dlookr:::html_paged_target_numerical(...)

and

Quitting from lines 252-253 [group-numerical] (eda_paged_temp.Rmd)
Error in `quantile.default()`:
! missing values and NaN's not allowed if 'na.rm' is FALSE
Backtrace:
 1. dlookr:::html_paged_target_numerical(...)
 5. dlookr:::plot.relate(fit_lm, base_family = "NanumSquare")
 6. dlookr (local) bandwidth.nrd(attr(x, "raw")[[xvar]])
      at dlookr/R/target_by.R:648:6
 8. stats:::quantile.default(x, c(0.25, 0.75))
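
The last traceback ends in stats::quantile() rejecting missing values. As a minimal base R sketch of that behavior (independent of dlookr, and not a description of the fix that was actually applied), the toy vector `x` below is an assumption made up for illustration:

```r
# Toy vector with one missing value; not taken from the reporter's data.
x <- c(1, 2, NA, 4)

# With the default na.rm = FALSE, quantile() signals an error on NA input.
# tryCatch() captures the message so the script keeps running.
err_msg <- tryCatch(
  quantile(x, c(0.25, 0.75)),
  error = function(e) conditionMessage(e)
)
print(err_msg)

# With na.rm = TRUE, the missing value is dropped before computing quantiles.
q <- quantile(x, c(0.25, 0.75), na.rm = TRUE)
print(q)
```

Passing na.rm = TRUE, or filtering missing values before the call, are the usual ways to avoid this error in base R.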

SaintRod commented 9 months ago

Hi, @choonghyunryu. Thank you very much for working on this. I am glad you were able to resolve the issue(s).

The codebase I worked on has been refactored. Unfortunately, dlookr has been removed as a dependency. Alas, I can no longer replicate the error and test the update. I will mark this issue as closed based on your comments above. Thank you again!