brunobrr / bdc

Check out the vignettes with detailed documentation on each module of the bdc package
https://brunobrr.github.io/bdc
GNU General Public License v3.0

Error on bdc_coordinates_empty: invalid multibyte string #231

Closed fredtaka closed 2 years ago

fredtaka commented 2 years ago

Hello

I'm having trouble running the bdc_coordinates_empty command on a dataset with special characters (loaded with encoding="Latin-1"). Apparently it is related to the call dplyr::mutate_all(as.numeric) in the code of bdc_coordinates_empty. Below is an example of the problem, including the system settings.

I thought about skipping this step or making small changes to the bdc_coordinates_empty code, but I was wondering if equivalent issues will arise later on so I'd better make a change early in the workflow.

I would appreciate any help.

> data<-fread("data.csv",encoding="Latin-1")
> data<-data[33,c(3,4,6,10)]
> data
     scientificName latitude longitude                                              locality
1: Achirus lineatus -6785496 -34955091 Área De Proteção Ambiental Da Barra Do Rio Mamanguape
> data<-bdc_coordinates_empty(data = data,lat = "latitude",lon = "longitude")
Error in `mutate()`:
! Problem while computing `locality = .Primitive("as.double")(locality)`.
Caused by error in `mask$eval_all_mutate()`:
! invalid multibyte string at '<c1>rea D<65> Prote<e7><e3>o Ambiental Da Barra Do Rio Mamanguape'
Run `rlang::last_error()` to see where the error occurred.

> sessionInfo()
R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0

locale:
 [1] LC_CTYPE=pt_BR.UTF-8       LC_NUMERIC=C               LC_TIME=pt_BR.UTF-8        LC_COLLATE=pt_BR.UTF-8    
 [5] LC_MONETARY=pt_BR.UTF-8    LC_MESSAGES=pt_BR.UTF-8    LC_PAPER=pt_BR.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=pt_BR.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] rnaturalearthdata_0.2.0  rnaturalearth_0.1.0      lubridate_1.8.0          sf_1.0-8                
 [5] cowplot_1.1.1            remotes_2.4.2            forcats_0.5.1            stringr_1.4.0           
 [9] dplyr_1.0.9              purrr_0.3.4              readr_2.1.2              tidyr_1.2.0             
[13] tibble_3.1.7             ggplot2_3.3.6            tidyverse_1.3.2          vegan_2.6-2             
[17] lattice_0.20-45          permute_0.9-7            bdc_1.1.1                taxadb_0.1.5            
[21] data.table_1.14.2        CoordinateCleaner_2.0-20

loaded via a namespace (and not attached):
 [1] googledrive_2.0.0   colorspace_2.0-3    ellipsis_0.3.2      class_7.3-20        rgdal_1.5-32        rprojroot_2.0.3    
 [7] fs_1.5.2            rstudioapi_0.13     proxy_0.4-27        DT_0.23             fansi_1.0.3         rgnparser_0.2.0    
[13] xml2_1.3.3          codetools_0.2-18    splines_4.2.1       contentid_0.0.15    jsonlite_1.8.0      broom_1.0.0        
[19] cluster_2.1.3       dbplyr_2.2.1        rgeos_0.5-9         oai_0.3.2           compiler_4.2.1      httr_1.4.3         
[25] backports_1.4.1     assertthat_0.2.1    Matrix_1.4-1        fastmap_1.1.0       lazyeval_0.2.2      gargle_1.2.0       
[31] cli_3.3.0           htmltools_0.5.3     prettyunits_1.1.1   tools_4.2.1         gtable_0.3.0        glue_1.6.2         
[37] Rcpp_1.0.9          cellranger_1.1.0    raster_3.5-21       vctrs_0.4.1         nlme_3.1-158        conditionz_0.1.0   
[43] iterators_1.0.14    rvest_1.0.2         lifecycle_1.0.1     sys_3.4             googlesheets4_1.0.0 terra_1.5-34       
[49] MASS_7.3-58         scales_1.2.0        hms_1.1.1           qs_0.25.3           curl_4.3.2          geosphere_1.5-14   
[55] stringi_1.7.8       foreach_1.5.2       e1071_1.7-11        rgbif_3.7.2         rlang_1.0.4         pkgconfig_2.0.3    
[61] htmlwidgets_1.5.4   tidyselect_1.1.2    here_1.0.1          plyr_1.8.7          magrittr_2.0.3      R6_2.5.1           
[67] generics_0.1.3      DBI_1.1.3           arkdb_0.0.15        pillar_1.8.0        haven_2.5.0         whisker_0.4        
[73] withr_2.5.0         mgcv_1.8-40         units_0.8-0         sp_1.5-0            modelr_0.1.8        crayon_1.5.1       
[79] uuid_1.1-0          KernSmooth_2.23-20  utf8_1.2.2          RApiSerialize_0.1.0 tzdb_0.3.0          progress_1.2.2     
[85] grid_4.2.1          readxl_1.4.0        reprex_2.0.1        digest_0.6.29       classInt_0.4-7      openssl_2.0.2      
[91] RcppParallel_5.1.5  munsell_0.5.0       stringfish_0.15.7   askpass_1.1    

kguidonimartins commented 2 years ago

Thanks for reporting! Indeed, it seems to be an error from mutate_all in our source code. However, I cannot reproduce your error. Check the report below:

if (!require("bdc")) install.packages("bdc")
#> Loading required package: bdc
if (!require("data.table")) install.packages("data.table")
#> Loading required package: data.table

data <-
  data.frame(
    scientificName = "Achirus lineatus",
    latitude = -6785496,
    longitude = -34955091,
    locality = "Área De Proteção Ambiental Da Barra Do Rio Mamanguape"
  ) |>
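  # Note: `encoding` does not appear to be an as.data.table() argument,
  # so the locality string probably stays UTF-8 in this reprex.
  as.data.table(encoding = "Latin-1")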

bdc::bdc_coordinates_empty(data = data, lon = "longitude", lat = "latitude")
#> 
#> bdc_coordinates_empty:
#> Flagged 0 records.
#> One column was added to the database.
#>      scientificName latitude longitude
#> 1: Achirus lineatus -6785496 -34955091
#>                                                 locality .coordinates_empty
#> 1: Área De Proteção Ambiental Da Barra Do Rio Mamanguape               TRUE

sessionInfo()
#> R version 4.2.1 (2022-06-23)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Arch Linux
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/libblas.so.3.10.1
#> LAPACK: /usr/lib/liblapack.so.3.10.1
#> 
#> locale:
#>  [1] LC_CTYPE=pt_BR.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=pt_BR.UTF-8        LC_COLLATE=en_US.UTF-8    
#>  [5] LC_MONETARY=pt_BR.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=pt_BR.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=pt_BR.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] data.table_1.14.2 bdc_1.1.1        
#> 
#> loaded via a namespace (and not attached):
#>   [1] fs_1.5.2                 sf_1.0-7                 oai_0.3.2               
#>   [4] progress_1.2.2           httr_1.4.3               rprojroot_2.0.3         
#>   [7] R.cache_0.15.0           rgbif_3.7.2              tools_4.2.1             
#>  [10] utf8_1.2.2               rgdal_1.5-32             R6_2.5.1                
#>  [13] DT_0.23                  KernSmooth_2.23-20       rgeos_0.5-9             
#>  [16] DBI_1.1.3                lazyeval_0.2.2           colorspace_2.0-3        
#>  [19] raster_3.5-15            withr_2.5.0              sp_1.5-0                
#>  [22] prettyunits_1.1.1        tidyselect_1.1.2         curl_4.3.2              
#>  [25] compiler_4.2.1           cli_3.3.0                xml2_1.3.3              
#>  [28] stringfish_0.15.7        scales_1.2.0             classInt_0.4-7          
#>  [31] readr_2.1.2              askpass_1.1              proxy_0.4-27            
#>  [34] stringr_1.4.0            digest_0.6.29            rmarkdown_2.14          
#>  [37] R.utils_2.11.0           contentid_0.0.15         pkgconfig_2.0.3         
#>  [40] htmltools_0.5.2          styler_1.7.0             dbplyr_2.2.0            
#>  [43] fastmap_1.1.0            highr_0.9                htmlwidgets_1.5.4       
#>  [46] rlang_1.0.3              generics_0.1.2           RApiSerialize_0.1.0     
#>  [49] jsonlite_1.8.0           dplyr_1.0.9              R.oo_1.25.0             
#>  [52] magrittr_2.0.3           geosphere_1.5-14         Rcpp_1.0.8.3            
#>  [55] munsell_0.5.0            fansi_1.0.3              CoordinateCleaner_2.0-20
#>  [58] rgnparser_0.2.5.91       lifecycle_1.0.1          R.methodsS3_1.8.2       
#>  [61] terra_1.5-34             stringi_1.7.6            whisker_0.4             
#>  [64] yaml_2.3.5               plyr_1.8.7               grid_4.2.1              
#>  [67] parallel_4.2.1           crayon_1.5.1             lattice_0.20-45         
#>  [70] conditionz_0.1.0         hms_1.1.1                sys_3.4                 
#>  [73] knitr_1.39               pillar_1.7.0             uuid_1.1-0              
#>  [76] codetools_0.2-18         reprex_2.0.1             glue_1.6.2              
#>  [79] evaluate_0.15            arkdb_0.0.15             RcppParallel_5.1.5      
#>  [82] vctrs_0.4.1              tzdb_0.3.0               foreach_1.5.2           
#>  [85] openssl_2.0.2            gtable_0.3.0             purrr_0.3.4             
#>  [88] qs_0.25.3                assertthat_0.2.1         ggplot2_3.3.6           
#>  [91] xfun_0.31                e1071_1.7-11             rnaturalearth_0.1.0     
#>  [94] taxadb_0.1.5             class_7.3-20             tibble_3.1.7            
#>  [97] iterators_1.0.14         units_0.8-0              ellipsis_0.3.2          
#> [100] here_1.0.1

Created on 2022-07-22 by the reprex package (v2.0.1)

Could you please try another method to read your data? Maybe there is an inconsistency in the data.table object that we did not anticipate. Reading the file with readr::read_csv() or utils::read.csv() may solve the problem. If the error persists, please report it again. We would be happy to help resolve this issue.
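
If re-reading the file is not convenient, another option may be to re-encode the text columns to UTF-8 before calling any bdc function. The sketch below is only an illustration under the assumption that the character columns really hold Latin-1 bytes. `to_utf8()` is a hypothetical helper, not part of bdc or data.table, and the locality value is shortened to "Área" built from raw bytes so the example does not depend on the session locale.

```r
# Hypothetical helper (not part of bdc): re-encode every character
# column of a data frame from `from` to UTF-8 using base::iconv().
to_utf8 <- function(df, from = "latin1") {
  # Work on a plain data.frame so that `[` selects columns
  # (a data.table would treat a logical index as a row filter).
  df <- as.data.frame(df, stringsAsFactors = FALSE)
  is_chr <- vapply(df, is.character, logical(1))
  if (any(is_chr)) {
    df[is_chr] <- lapply(df[is_chr], iconv, from = from, to = "UTF-8")
  }
  df
}

# "Área" as Latin-1 bytes: 0xC1 is "Á" in Latin-1 and is not valid UTF-8.
locality_latin1 <- rawToChar(as.raw(c(0xC1, 0x72, 0x65, 0x61)))

dat <- data.frame(
  scientificName = "Achirus lineatus",
  latitude = -6785496,
  longitude = -34955091,
  locality = locality_latin1,
  stringsAsFactors = FALSE
)

validUTF8(dat$locality)    # FALSE: the bytes are Latin-1

clean <- to_utf8(dat)

validUTF8(clean$locality)  # TRUE: the column is now UTF-8
```

The numeric columns are left untouched, so `clean` can be passed on to the rest of the workflow, for example `bdc::bdc_coordinates_empty(data = clean, lat = "latitude", lon = "longitude")`.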

kguidonimartins commented 2 years ago

Hi again @fredtaka, forget the previous message. Could you try the latest version of bdc? You can get it with:

if (!require("remotes")) install.packages("remotes")
remotes::install_github("brunobrr/bdc", force = TRUE)

Then, proceed with your query again:

data <-
  data.frame(
    scientificName = "Achirus lineatus",
    latitude = -6785496,
    longitude = -34955091,
    locality = "Área De Proteção Ambiental Da Barra Do Rio Mamanguape"
    )

bdc::bdc_coordinates_empty(data = data, lon = "longitude", lat = "latitude")

fredtaka commented 2 years ago

Hi @kguidonimartins

Thanks for the quick response and the code adaptation. I tested it now with the new version of bdc on GitHub and it ran without problems.

kguidonimartins commented 2 years ago

Great! The new version of bdc will be available on CRAN in the next few weeks. I'm closing this issue for now.