gesistsa / rio

🐟 A Swiss-Army Knife for Data I/O
http://gesistsa.github.io/rio/
599 stars 76 forks source link

pzfx errors on non-numeric columns #235

Closed bokov closed 4 years ago

bokov commented 4 years ago

Related issue: #211 Please specify whether your issue is about:

If you are reporting (1) a bug or (2) a question about code, please supply:

Put your code here:

## load package
library("rio")
library("pzfx")
## code goes here
export(iris,'iris.pzfx')
# Error in pzfx::write_pzfx(x = x, path = file, ..., row_names = row_names) : 
#  These tables are not all numeric: Data 1

session info for your system

R version 3.4.4 (2018-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.6 LTS

Matrix products: default
BLAS: /usr/lib/openblas-base/libblas.so.3
LAPACK: /usr/lib/lapack/liblapack.so.3.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] pzfx_0.2.0    readODS_1.6.7 rio_0.5.20   

loaded via a namespace (and not attached):
 [1] zip_2.0.4           Rcpp_1.0.2          cellranger_1.1.0   
 [4] pillar_1.4.2        compiler_3.4.4      remotes_2.1.0      
 [7] forcats_0.3.0       prettyunits_1.0.2   tools_3.4.4        
[10] testthat_2.2.1      digest_0.6.21       pkgbuild_1.0.5     
[13] pkgload_1.0.2       memoise_1.1.0       tibble_2.1.3       
[16] pkgconfig_2.0.3     rlang_0.4.0         openxlsx_4.1.0.1   
[19] cli_1.1.0           curl_4.2            haven_2.1.1        
[22] withr_2.1.2         xml2_1.2.2          fs_1.3.1           
[25] desc_1.2.0          devtools_2.2.1.9000 hms_0.4.2          
[28] rprojroot_1.3-2     glue_1.3.1          data.table_1.12.4  
[31] R6_2.4.0            processx_3.4.1      readxl_1.3.1       
[34] foreign_0.8-70      sessioninfo_1.1.1   readr_1.3.1        
[37] callr_3.3.2         magrittr_1.5        usethis_1.5.1      
[40] backports_1.1.2     ps_1.3.0            ellipsis_0.3.0     
[43] assertthat_0.2.1    stringi_1.4.3       crayon_1.3.4       

traceback

5: stop(sprintf("These tables are not all numeric: %s", paste(names(x_lst)[!are_nums], 
       collapse = ", ")))
4: pzfx::write_pzfx(x = x, path = file, ..., row_names = row_names)
3: .export.rio_pzfx(file = file, x = x, ...)
2: .export(file = file, x = x, ...)
1: export(iris, "iris.pzfx")

possible workaround

Caveat: I am unfamiliar with the GraphPad format and use-case, so I don't know whether preventing errors by coercing characters and factors to numeric would be helpful or in fact make matters worse by hiding invalid data. Also, I have not tried this with content other than character and factor and have no idea what will happen. Thoughts, @billdenney ?

.export.rio_pzfx <- function (file, x, ..., row_names = FALSE) 
{
    requireNamespace("pzfx")
   # new code here
   for (ii in names(x)) {
     if (inherits(x[[ii]], c('character', 'factor'))) {
       x[[ii]] <- as.numeric(factor(x[[ii]]))
       warning('Column ',ii,' converted to numeric')
   }
   # end new code
    pzfx::write_pzfx(x = x, path = file, ..., row_names = row_names)
}
billdenney commented 4 years ago

The format is very limited. I have only used it with numeric data, and I don’t think that it can handle anything else (but I don’t know).

I think that this style of data modification is a better fit in the underlying library, but generally this kind of character to integer conversion is a non-obvious transformation in many cases, so I don’t prefer its automation. That said, the decision is of course @leeper’s.

leeper commented 4 years ago

Agree - I don't think we want to do automatic conversion here since it's lossy.

Yue-Jiang commented 4 years ago

Thanks for pointing me to this issue @billdenney. As far as I know prism doesn't really allow factor or characters in the "main" part of the table (except in certain table types you are allowed for a single X column that can be character, e.g. contingency table). In other words, there is no straightforward way of fitting iris into a Graphpad Prism table, unless we are informed about the hierarchy of columns so we can do reshape-like operations on the input data to force it to fit into prism. So raising the error is the intended behavior here.

bokov commented 4 years ago

Withdrawn, I guess?