easystats / bayestestR

Utilities for analyzing Bayesian models and posterior distributions
https://easystats.github.io/bayestestR/
GNU General Public License v3.0

bayesfactor: make.names error " invalid multibyte string 36" #417

Closed morgan-sparks closed 3 years ago

morgan-sparks commented 3 years ago

Describe the bug I am trying to compute a Bayes factor for an intercept-only (random-effects) meta-analytic model that I ran in brms. I saved the model as an RDS file after running it on our compute cluster, loaded it into my local R environment with readRDS(), and then tried to use bayesfactor() or similar functions to compute statistics for the model.

Just to caveat, this is my first time using the package so I expect this is fully operator error, but I can't seem to figure out the issue. Looks like a great package!

To Reproduce Here's the code:

link to large (~270MB) RDS file, a brms object https://www.dropbox.com/s/7tmr6xf7syqya87/mod_norm_logtrans_trait_2randeff.rds?dl=0

library(bayestestR)

int_mod <- readRDS("path2mod") #took out my personal path to model, attaching link to RDS file
int_mod # look at mod

bayesfactor_parameters(int_mod, null = c(0,0.5))

bayesfactor(int_mod)

describe_posterior(int_mod)

Expected behaviour I expected to get the normal output from those functions. Instead, all of them fail with the same error: Error in make.names(vnames, unique = TRUE) : invalid multibyte string 36

Specifications (please complete the following information): Here is the session info:

> sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] bayestestR_0.9.0

loaded via a namespace (and not attached):
 [1] Brobdingnag_1.2-6    jsonlite_1.7.1       gtools_3.8.2         StanHeaders_2.21.0-6 RcppParallel_5.0.2  
 [6] threejs_0.3.3        shiny_1.5.0          assertthat_0.2.1     stats4_3.6.3         yaml_2.2.1          
[11] backports_1.1.9      pillar_1.4.6         lattice_0.20-38      glue_1.4.2           digest_0.6.25       
[16] promises_1.1.1       colorspace_1.4-1     htmltools_0.5.0      httpuv_1.5.4         Matrix_1.2-18       
[21] plyr_1.8.6           dygraphs_1.1.1.6     pkgconfig_2.0.3      rstan_2.21.2         purrr_0.3.4         
[26] xtable_1.8-4         mvtnorm_1.1-1        scales_1.1.1         processx_3.4.4       later_1.1.0.1       
[31] tibble_3.0.3         bayesplot_1.7.2      generics_0.0.2       ggplot2_3.3.2        ellipsis_0.3.1      
[36] DT_0.15              withr_2.2.0          shinyjs_1.1          cli_2.0.2            magrittr_1.5        
[41] crayon_1.3.4         mime_0.9             ps_1.3.4             fansi_0.4.1          nlme_3.1-144        
[46] xts_0.12-0           pkgbuild_1.1.0       colourpicker_1.0     prettyunits_1.1.1    rsconnect_0.8.16    
[51] tools_3.6.3          loo_2.3.1            lifecycle_0.2.0      matrixStats_0.56.0   stringr_1.4.0       
[56] V8_3.2.0             munsell_0.5.0        callr_3.4.4          compiler_3.6.3       rlang_0.4.7         
[61] grid_3.6.3           ggridges_0.5.2       rstudioapi_0.11      htmlwidgets_1.5.1    crosstalk_1.1.0.1   
[66] igraph_1.2.5         miniUI_0.1.1.1       base64enc_0.1-3      codetools_0.2-16     gtable_0.3.0        
[71] curl_4.3             inline_0.3.16        abind_1.4-5          markdown_1.1         reshape2_1.4.4      
[76] R6_2.4.1             gridExtra_2.3        rstantools_2.1.1     zoo_1.8-8            bridgesampling_1.0-0
[81] dplyr_1.0.2          fastmap_1.0.1        brms_2.14.0          shinystan_2.5.0      shinythemes_1.1.2   
[86] insight_0.13.2.1     stringi_1.5.3        parallel_3.6.3       Rcpp_1.0.5           vctrs_0.3.4         
[91] tidyselect_1.1.0     coda_0.19-3 

mattansb commented 3 years ago

Cannot reproduce:

library(bayestestR)

int_mod <- readRDS(choose.files()) #took out my personal path to model, attaching link to RDS file

describe_posterior(int_mod)
#> Summary of Posterior Distribution 
#> 
#> Parameter   | Median |        95% CI |     pd |          ROPE | % in ROPE |  Rhat |      ESS
#> --------------------------------------------------------------------------------------------
#> (Intercept) |  -0.08 | [-0.77, 0.59] | 61.15% | [-0.10, 0.10] |    28.90% | 1.000 | 59822.00
#> Warning messages:
#> 1: Warning: Following potential variables could not be found in the data: grSpecies, cov = vcv_mat 
#> 2: Could not estimate a good default ROPE range. Using 'c(-0.1, 0.1)'. 

Can you try these two lines and report what you get?

samps <- insight::get_parameters(int_mod)
describe_posterior(samps)

morgan-sparks commented 3 years ago

When I run the insight:: chunk I get another, similar error (it is actually the exact error I was getting until I installed the development version of bayestestR; now I get string 36 instead of 21). Obviously, because of the error, no object is created to pass to describe_posterior().

samps <- insight::get_parameters(int_mod)
Error in make.names(vnames, unique = TRUE) : invalid multibyte string 21

I am assuming this issue must be something local on my machine?

bwiernik commented 3 years ago

Can you try this:

  1. Recode the Paper.Name column to be integers with as.integer(as.factor(Paper.Name)) or similar.
  2. Refit the model with the recoded Paper.Name.
  3. See if you get the same error.
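
The recoding in step 1 can be sketched as follows. This is a minimal illustration on made-up data: the `trait_dat` contents and the `paper_key` lookup table are hypothetical, and only the `Paper.Name` and `paper_ints` column names come from this thread.

```r
# Hypothetical stand-in for the real data set (titles are placeholders)
trait_dat <- data.frame(
  Paper.Name = c("Paper B", "Paper A", "Paper B", "Paper C"),
  stringsAsFactors = FALSE
)

# Map each distinct title to an integer ID; factor levels are sorted,
# so IDs follow the alphabetical order of the titles
trait_dat$paper_ints <- as.integer(as.factor(trait_dat$Paper.Name))

# Keep a lookup table so the IDs can be mapped back to titles later
paper_key <- unique(trait_dat[, c("paper_ints", "Paper.Name")])
paper_key <- paper_key[order(paper_key$paper_ints), ]
```

The model would then use `(1 | paper_ints)` as the grouping term, so the free-text titles never end up in parameter names.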

bwiernik commented 3 years ago

And if that does work, can you send the data file (just the Paper.Name column is fine) you are importing, as well as your script for importing it and a printout of the Paper.Name data.frame after you import it?

morgan-sparks commented 3 years ago

@bwiernik I am assuming this may be an issue with the names of the papers (some have weird symbols). Since my models take a silly long time on the cluster (days), would a reduced complexity model without nested effects, much shorter chains, etc. serve the same purpose?

morgan-sparks commented 3 years ago

@bwiernik I ran a much simplified model with the formula log(temp.mn) | se(std_err) ~ 1 + (1|paper_ints); note paper_ints in the random effect, as you recommended. bayesfactor(), bayesfactor_parameters(), and describe_posterior() all seem to be working. See below; the model did not converge, so the output is basically meaningless other than as a test for this issue.

> describe_posterior(issue_mod)
Summary of Posterior Distribution 

Parameter   | Median |        95% CI |     pd |          ROPE | % in ROPE |  Rhat |  ESS
----------------------------------------------------------------------------------------
(Intercept) |  -0.35 | [-0.66, 0.80] | 70.72% | [-0.10, 0.10] |     3.33% | 4.639 | 2.00
Warning message:
Could not estimate a good default ROPE range. Using 'c(-0.1, 0.1)'. 
> 
> bayesfactor(issue_mod, null = log(c(0, .1)))
Sampling priors, please wait...
Bayes Factor (Null-Interval) 

Parameter   |     BF
--------------------
(Intercept) | > 1000

* Evidence Against The Null: [-Inf, -2.303]
> 
> bayesfactor_parameters(issue_mod, null = log(c(0,0.5)))
Sampling priors, please wait...
Bayes Factor (Null-Interval) 

Parameter   |    BF
-------------------
(Intercept) | 12.06

* Evidence Against The Null: [-Inf, -0.693]

bwiernik commented 3 years ago

Okay, yeah, that works for this purpose. Looks like an issue with the text in the long paper names. Can you do this part?

And if that does work, can you send the data file (just the Paper.Name column is fine) you are importing, as well as your script for importing it and a printout of the Paper.Name data.frame after you import it?

morgan-sparks commented 3 years ago

Got it. Attached is the original column in a .csv, and below are the read-in script and the printout. I also pasted the names as printed by R into the second column of the .csv file: issue_paper.names.csv

trait_dat <- read.csv(file.path(dir, "trait_level_data.csv")) # dir holds the data directory (personal path removed)

trait_dat$Paper.Name

 [1] Adaptation to ice-cover conditions in Atlantic salmon, Salmo salar L.
 [2] Adaptation to ice-cover conditions in Atlantic salmon, Salmo salar L.                                                                                                               
 [3] Adaptive variation in energy acquisition and allocation among latitudinal populations of the Atlantic silverside                                                                    
 [4] Antipredator defenses along a latitudinal gradient in Rana temporaria                                                                                                               
 [5] Antipredator defenses along a latitudinal gradient in Rana temporaria                                                                                                               
 [6] CONVERGENT EVOLUTION OF EMBRYONIC GROWTH AND DEVELOPMENT IN THE EASTERN FENCE LIZARD (SCELOPORUS UNDULATUS)                                                                         
 [7] CONVERGENT EVOLUTION OF EMBRYONIC GROWTH AND DEVELOPMENT IN THE EASTERN FENCE LIZARD (SCELOPORUS UNDULATUS)                                                                         
 [8] Converse Bergmann cline in a Eucalyptusherbivore, Paropsis atomaria Olivier (Coleoptera: Chrysomelidae): phenotypic plasticity or local adaptation?                                 
 [9] Countergradient Variation in Growth Among Newly Hatched Fundulus Heteroclitus: Geographic Differences Revealed by Common-Environment Experiments                                    
[10] Countergradient Variation in Growth Among Newly Hatched Fundulus Heteroclitus: Geographic Differences Revealed by Common-Environment Experiments                                    
[11] Countergradient Variation in Growth Among Newly Hatched Fundulus Heteroclitus: Geographic Differences Revealed by Common-Environment Experiments                                    
[12] Countergradient variation in growth and food conversion efficiency of juvenile turbot                                                                                               
[13] Countergradient variation in growth and food conversion efficiency of juvenile turbot                                                                                               
[14] Countergradient variation in growth rate: compensation for length of the growing season among Atlantic silversides from different latitudes                                         
[15] Countergradient vs. cogradient variation in growth and diapause in a lichen?feeding moth, Eilema depressum (Lepidoptera: Arctiidae)                                                 
[16] Effect of Temperature and Salinity on Growth Performance in Anadromous (Chesapeake Bay) and Nonanadromous (Santee-Cooper) Strains of Striped Bass Morone saxatilis                  
[17] Evidence of countergradient variation in the growth of an intertidal snail in response to water velocity                                                                            
[18] Evidence of countergradient variation in the growth of an intertidal snail in response to water velocity                                                                            
[19] Evidence of countergradient variation in the growth of an intertidal snail in response to water velocity                                                                            
[20] Evidence of countergradient variation in the growth of an intertidal snail in response to water velocity                                                                            
[21] Explaining variation in life-history traits: growth rate, size, and fecundity in a marine snail across an environmental gradient lacking predators                                  
[22] Genetic and environmental components of phenotypic variation in body shape among populations of Atlantic cod (Gadus morhua L.)                                                      
[23] Geographic variation in development rate between populations of the teleost Fundulus heteroclitus                                                                                   
[24] Geographic variation in growth and food conversion efficiency of juvenile Atlantic halibut related to latitude                                                                      
[25] Geographic variation in growth and food conversion efficiency of juvenile Atlantic halibut related to latitude                                                                      
[26] GEOGRAPHIC VARIATION IN LIFE?HISTORY TRAITS OF THE ANT LION,\xa0MYRMELEON IMMACULATUS: EVOLUTIONARY IMPLICATIONS OF BERGMANN'S RULE                                                 
[27] GEOGRAPHIC VARIATION IN LIFE?HISTORY TRAITS OF THE ANT LION,\xa0MYRMELEON IMMACULATUS: EVOLUTIONARY IMPLICATIONS OF BERGMANN'S RULE                                                 
[28] GEOGRAPHIC VARIATION IN LIFE?HISTORY TRAITS OF THE ANT LION,\xa0MYRMELEON IMMACULATUS: EVOLUTIONARY IMPLICATIONS OF BERGMANN'S RULE                                                 
[29] Geographic Variation of Larval Growth in North American Aedes albopictus                                                                                                            
[30] Geographic Variation of Larval Growth in North American Aedes albopictus                                                                                                            
[31] Geographic Variation of Larval Growth in North American Aedes albopictus                                                                                                            
[32] Integrating Genetic and Environmental Forces that Shape the Evolution of Geographic Variation in a Marine Snail                                                                     
[33] Inter?and intrapopulation variation in thermal reaction norms for growth rate: evolution of latitudinal compensation in ectotherms with a genetic constraint                        
[34] Interpopulation differences in growth rates and food conversion efficiencies of young Grand Banks and Gulf of Maine Atlantic cod (Gadus morhua)                                     
[35] INTRA? VS. INTERSPECIFIC LATITUDINAL VARIATION IN GROWTH: ADAPTATION TO TEMPERATURE OR SEASONALITY?                                                                                 
[36] INTRA? VS. INTERSPECIFIC LATITUDINAL VARIATION IN GROWTH: ADAPTATION TO TEMPERATURE OR SEASONALITY?                                                                                 
[37] Intraspecific Differences in Physiological Efficiency of Juvenile Atlantic Halibut\xa0Hippoglossus hippoglossus                                                                     
[38] Intraspecific Differences in Physiological Efficiency of Juvenile Atlantic Halibut\xa0Hippoglossus hippoglossus                                                                     
[39] Intraspecific Differences in Physiological Efficiency of Juvenile Atlantic Halibut\xa0Hippoglossus hippoglossus                                                                     
[40] Larval tolerance, gene flow, and the northern geographic range limit of fiddler crabs                                                                                               
[41] Latitudinal and temperature-dependent variation in embryonic development rate and offspring performance in a freshwater turtle                                                      
[42] Latitudinal and voltinism compensation shape thermal reaction norms for growth rates                                                                                                
[43] Latitudinal compensation in female reproductive rate of a geographically widespread reef fish                                                                                       
[44] Latitudinal compensation in oyster ciliary activity                                                                                                                                 
[45] Latitudinal countergradient variation in the common frog(Rana temporaria) development rates \x96 evidence for localadaptation                                                       
[46] Life history traits associated with body size covary along a latitudinal gradient in a generalist grasshopper                                                                       
[47] Life history traits associated with body size covary along a latitudinal gradient in a generalist grasshopper                                                                       
[48] Life history traits associated with body size covary along a latitudinal gradient in a generalist grasshopper                                                                       
[49] Life-history difference in adjacent water strider populations: phenotypic plasticity or heritable responses to stream temperature?                                                  
[50] Little plant, big city: a test of adaptation to urban environments in common ragweed (Ambrosia artemisiifolia)                                                                      
[51] Microgeographic differentiation in thermal performance curves between rural and urban populations of an aquatic insect                                                              
[52] MIGRATORY COSTS AND THE EVOLUTION OF EGG SIZE AND NUMBER IN INTRODUCED AND INDIGENOUS SALMON POPULATIONS                                                                            
[53] PHENOTYPIC CLINES, PLASTICITY, AND MORPHOLOGICAL TRADE?OFFS IN AN INTERTIDAL SNAIL                                                                                                  
[54] PHENOTYPIC CLINES, PLASTICITY, AND MORPHOLOGICAL TRADE?OFFS IN AN INTERTIDAL SNAIL                                                                                                  
[55] Photosynthesis, respiration, and phosphate absorption by Carex aquatilis ecotypes along latitudinal and local environmental gradients                                               
[56] Phsyiological variation along a geographical gradient: is growth rate correlated with routine metabolic rate in Rana temporaria tadpoles                                            
[57] Post?glacial colonization routes coincide with a life?history breakpoint along a latitudinal gradient                                                                               
[58] Post?glacial colonization routes coincide with a life?history breakpoint along a latitudinal gradient                                                                               
[59] Potential latitudinal variation in egg size and number of a geographically widespread reef fish, revealed by common-environment experiments                                         
[60] Potential latitudinal variation in egg size and number of a geographically widespread reef fish, revealed by common-environment experiments                                         
[61] Proximate causes of adaptive growth rates: growth efficiency variation among latitudinal populations of Rana temporaria                                                             
[62] Proximate causes of adaptive growth rates: growth efficiency variation among latitudinal populations of Rana temporaria                                                             
[63] Quantitative Genetic Analysis of Larval Life History Traits in Two Alpine Populations of Rana temporaria                                                                            
[64] Quantitative genetic approach for assessing invasiveness: geographic and genetic variation in life-history traits                                                                   
[65] Reaction norms for age and size at maturity in Lasiommata butterflies: predictions and tests                                                                                        
[66] Reaction norms for age and size at maturity in Lasiommata butterflies: predictions and tests                                                                                        
[67] Stock?specific changes in growth rates, food conversion efficiencies, and energy allocation in response to temperature change in juvenile Atlantic cod                              
[68] STUDIES ON THE PHYSIOLOGICAL VARIATION BETWEEN TROPICAL AND TEMPERATE ZONE FIDDLER CRABS OF THE GENUS UCA. II. OXYGEN CONSUMPTION OF WHOLE ORGANISMS                                
[69] Studies on the Physiological Variation between Tropical and Temperate-Zone Fiddler Crabs of the Genus Uca. IV. Oxygen Consumption of Larvae and Young Crabs Reared in the Laboratory
[70] Temperature and clinal variation in larval growth efficiency in\xa0Drosophila melanogaster                                                                                          
[71] The physiological basis of geographic variation in rates of embryonic development within a widespread lizard species                                                                
[72] Variation in digestive performance between geographically disjunct populations of Atlantic salmon: countergradient in passage time and digestion rate                               
[73] Variation in food intake, food conversion efficiency and growth of juvenile turbot from different geographic strains                                                                
[74] Variation in food intake, food conversion efficiency and growth of juvenile turbot from different geographic strains                                                                
[75] Variation in Larval Growth Rate among Striped Bass Stocks from Different Latitudes                                                                                                  
[76] Variation in Larval Growth Rate among Striped Bass Stocks from Different Latitudes

bwiernik commented 3 years ago

Okay, yeah, you've got some non-breaking spaces encoded in Latin-1 there (\xa0), even though the file was read in as UTF-8; those are tripping R up (R is pretty bad at handling messy text encoding issues).

You can fix this by changing your input fileEncoding:
trait_dat <- read.csv(file.path(dir, "trait_level_data.csv"), fileEncoding = "latin1")

or by searching for and replacing the non-breaking space character:
trait_dat$Paper.Name <- stringr::str_replace(trait_dat$Paper.Name, "\xa0", " ")

This is some pretty deep R stuff, so not really anything that can be done on the bayestestR side.
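
For anyone hitting the same error, here is a rough base-R sketch of how the offending strings could be located and repaired on data that is already loaded. The `titles` vector is a made-up two-element example, and treating the bytes as Latin-1 is an assumption; the \x96 byte in the printout above hints that the file may really be Windows-1252, in which case the `from` encoding would need adjusting.

```r
# Hypothetical toy vector; "\xa0" is the raw Latin-1 byte for a non-breaking space
titles <- c(
  "Latitudinal compensation in oyster ciliary activity",
  "Temperature and clinal variation in larval growth efficiency in\xa0Drosophila melanogaster"
)

# Flag entries that are not valid UTF-8 (these are what make.names() chokes on)
bad <- !validUTF8(titles)
which(bad)

# Re-encode from Latin-1 (assumed source encoding) to UTF-8
fixed <- iconv(titles, from = "latin1", to = "UTF-8")

# Optionally replace the now-valid non-breaking space (U+00A0) with a plain space
cleaned <- gsub("\u00a0", " ", fixed, fixed = TRUE)
```

After this, `validUTF8(cleaned)` should be TRUE for every element, and the model would need to be refit on the cleaned column for the names stored in it to change.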

morgan-sparks commented 3 years ago

@bwiernik Thanks very much for the help, I really appreciate it! Your factor to integer solution is a better option for my stuff anyway!

bwiernik commented 3 years ago

The approach I take with meta-analyses is to put the BibTeX/pandoc citation key for the paper from my Zotero library in my data sheet. That way, if I drop any cases in the analysis, I can use the reduced column to generate a bibliography.

morgan-sparks commented 3 years ago

Ah, smart. This is my first one and I am learning the many ways I could have made this more efficient. But that's part of the process I suppose.