easystats / performance

:muscle: Models' quality and performance metrics (R2, ICC, LOO, AIC, BF, ...)
https://easystats.github.io/performance/
GNU General Public License v3.0

Error in performance::check_distribution(): in call bw.SJ() #696

Closed (arodionoff closed this issue 3 months ago)

arodionoff commented 3 months ago

Since spring, calling performance::check_distribution() on a logistic regression model has failed with this error:

Error in bw.SJ(x, method = "ste") : sample is too sparse to find TD

# install.packages(c("smbinning", "randomForest", "performance"))
# Load library and its dataset
library(smbinning)
# Sampling
pop <- smbsimdf1 # Population
train <- subset(pop, rnd <= 0.7) # Training sample
# Generate binning objects used to create the binned variables
smbcbs1 <- smbinning(train, x = "cbs1", y = "fgood")
smbcbinq <- smbinning.factor(train, x = "cbinq", y = "fgood")
pop <- smbinning.gen(pop, smbcbs1, "g1cbs1")
pop <- smbinning.factor.gen(pop, smbcbinq, "g1cbinq")
# Resample
train <- subset(pop, rnd <= 0.7) # Training sample
test <- subset(pop, rnd > 0.7) # Testing sample
# Run logistic regression
modlogisticsmb <- glm(fgood ~ ., data = train, family = binomial())
summary(modlogisticsmb)

# Error in performance::check_distribution()
library(performance)
performance::check_distribution(modlogisticsmb)

This produces the error:

#> Error in bw.SJ(x, method = "ste") : sample is too sparse to find TD

However, the same reproduction script runs without error in the following environment:

> utils::sessionInfo()

R version 4.2.2 (2022-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.1 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] performance_0.10.8 smbinning_0.9      Formula_1.2-5      partykit_1.2-20    mvtnorm_1.2-3      libcoin_1.0-10    
 [7] sqldf_0.4-11       RSQLite_2.3.2      gsubfn_0.7         proto_1.0.0       

loaded via a namespace (and not attached):
 [1] rstudioapi_0.15.0    splines_4.2.2        insight_0.19.6       bit_4.0.5            lattice_0.20-45     
 [6] rlang_1.1.1          fastmap_1.1.1        blob_1.2.4           tcltk_4.2.2          tools_4.2.2         
[11] cli_3.6.1            DBI_1.1.3            bayestestR_0.13.1    datawizard_0.9.0     randomForest_4.7-1.1
[16] survival_3.4-0       bit64_4.0.5          inum_1.0-5           Matrix_1.6-1.1       vctrs_0.6.4         
[21] rpart_4.1.21         memoise_2.0.1        cachem_1.0.8         compiler_4.2.2       chron_2.3-61        
[26] pkgconfig_2.0.3 

> performance::check_distribution(modlogisticsmb)
# Distribution of Model Family

Predicted Distribution of Residuals

         Distribution Probability
               normal         62%
               cauchy         34%
 poisson (zero-infl.)          3%

Predicted Distribution of Response

 Distribution Probability
    bernoulli         97%
     binomial          3%
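
For context on the error message itself: it is raised by stats::bw.SJ(), the Sheather-Jones bandwidth selector in base R, not by performance directly. The sketch below is a minimal illustration and makes no claim about which vector check_distribution() passes to bw.SJ() internally. It assumes that a heavily tied sample whose interquartile range is zero is one way to provoke the error. `try_bw_sj`, `sparse_x`, and `smooth_x` are hypothetical names introduced only for this example.

```r
# Hypothetical helper (not part of performance or stats): run the
# Sheather-Jones bandwidth selector and return the error message as a
# string instead of stopping, so both cases can be compared side by side.
try_bw_sj <- function(x) {
  tryCatch(
    stats::bw.SJ(x, method = "ste"),
    error = function(e) conditionMessage(e)
  )
}

# A sample dominated by ties: 90 zeros plus ten distinct values.
# Its interquartile range is 0, which is the assumed trigger here.
sparse_x <- c(rep(0, 90), 1:10)
stats::IQR(sparse_x)

# A continuous sample for comparison; bw.SJ() returns a single bandwidth.
set.seed(1)
smooth_x <- rnorm(100)

res_sparse <- try_bw_sj(sparse_x) # expected to be the "too sparse" message
res_smooth <- try_bw_sj(smooth_x) # a single numeric bandwidth

print(res_sparse)
print(res_smooth)
```

If `res_sparse` comes back as a character string, bw.SJ() rejected the tied sample. Residuals from a binomial model can contain many repeated values, which may be why the failure shows up for logistic regression.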

arodionoff commented 3 months ago

You can restore a working performance::check_distribution() by installing earlier versions of these four packages:

bayestestR - 0.13.1, datawizard - 0.9.0, insight - 0.19.6, performance - 0.10.8
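
One possible way to pin those versions is remotes::install_version(), which fetches a specific release from the CRAN archive. This is a sketch under assumptions: the remotes package must be installed, network access is required, and the dependencies-first install order is my own choice rather than something stated in this thread. `pinned` and `install_pinned` are hypothetical names.

```r
# Versions reported as working in this thread, listed dependencies-first
# (ordering is an assumption: insight is a dependency of the other three).
pinned <- c(
  insight     = "0.19.6",
  datawizard  = "0.9.0",
  bayestestR  = "0.13.1",
  performance = "0.10.8"
)

# Hypothetical helper: install each pinned version in turn.
# upgrade = "never" keeps remotes from upgrading the other pinned packages.
install_pinned <- function(versions) {
  for (pkg in names(versions)) {
    remotes::install_version(pkg, version = versions[[pkg]], upgrade = "never")
  }
  invisible(versions)
}

# Uncomment to run, then restart R so the older versions are loaded:
# install_pinned(pinned)
```

After restarting, utils::packageVersion("performance") should report the pinned version.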

strengejacke commented 3 months ago

Thanks, should be fixed (and included in #643)