Closed jaum20 closed 4 months ago
I'm having the same issue. I just came here to create a thread.
Here is a minimal example using a data set of only one family:
dados <- readData(file = "0026229-230810091245214.zip",
path = "https://api.gbif.org/v1/occurrence/download/request/")
dados <- dados$occurrence
occs <- formatDwc(gbif_data = dados, drop = TRUE)
occs <- formatOcc(occs) #all good here
occs <- formatLoc(occs) #the same error mentioned by @jaum20
I tried to manually change the encoding but it got me nowhere.
Output of version and sessionInfo() below:
> version
_
platform x86_64-w64-mingw32
arch x86_64
os mingw32
crt ucrt
system x86_64, mingw32
status
major 4
minor 3.1
year 2023
month 06
day 16
svn rev 84548
language R
version.string R version 4.3.1 (2023-06-16 ucrt)
nickname Beagle Scouts
> sessionInfo()
R version 4.3.1 (2023-06-16 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 22621)
Matrix products: default
locale:
[1] LC_COLLATE=Portuguese_Brazil.utf8 LC_CTYPE=Portuguese_Brazil.utf8 LC_MONETARY=Portuguese_Brazil.utf8
[4] LC_NUMERIC=C LC_TIME=Portuguese_Brazil.utf8
time zone: America/Sao_Paulo
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] lubridate_1.9.2 forcats_1.0.0 stringr_1.5.0 dplyr_1.1.2 purrr_1.0.2 readr_2.1.4
[7] tidyr_1.3.0 tibble_3.2.1 ggplot2_3.4.3 tidyverse_2.0.0 plantR_0.1.6
loaded via a namespace (and not attached):
[1] tidyselect_1.2.0 viridisLite_0.4.2 viridis_0.6.4 fastmap_1.1.1 lazyeval_0.2.2
[6] leaflet_2.1.2 spatialrisk_0.7.0 XML_3.99-0.14 digest_0.6.33 timechange_0.2.0
[11] lifecycle_1.0.3 sf_1.0-14 terra_1.7-39 magrittr_2.0.3 compiler_4.3.1
[16] rlang_1.1.1 tools_4.3.1 igraph_1.5.1 utf8_1.2.3 data.table_1.14.8
[21] knitr_1.43 htmlwidgets_1.6.2 bit_4.0.5 sp_2.0-0 classInt_0.4-9
[26] plyr_1.8.8 xml2_1.3.5 RColorBrewer_1.1-3 abind_1.4-5 KernSmooth_2.23-21
[31] withr_2.5.0 leafsync_0.1.0 grid_4.3.1 fansi_1.0.4 e1071_1.7-13
[36] leafem_0.2.0 colorspace_2.1-0 scales_1.2.1 dichromat_2.0-0.1 cli_3.6.1
[41] generics_0.1.3 stringdist_0.9.10 rstudioapi_0.15.0 robustbase_0.99-0 tzdb_0.4.0
[46] rgbif_3.7.7 httr_1.4.7 tmaptools_3.1-1 DBI_1.1.3 pbapply_1.7-2
[51] proxy_0.4-27 stars_0.6-3 RcppProgress_0.4.2 parallel_4.3.1 base64enc_0.1-3
[56] vctrs_0.6.3 jsonlite_1.8.7 flora_0.3.7 hms_1.1.3 bit64_4.0.5
[61] GenSA_1.1.9 crosstalk_1.2.0 units_0.8-3 leafgl_0.1.1 glue_1.6.2
[66] lwgeom_0.2-13 DEoptimR_1.1-1 codetools_0.2-19 stringi_1.7.12 countrycode_1.5.0
[71] gtable_0.3.3 raster_3.6-23 Taxonstand_2.4 munsell_0.5.0 pillar_1.9.0
[76] htmltools_0.5.6 R6_2.5.1 oai_0.4.0 lattice_0.21-8 png_0.1-8
[81] tmap_3.3-3 geohashTools_0.3.2 colourvalues_0.3.9 class_7.3-22 Rcpp_1.0.11
[86] gridExtra_2.3 whisker_0.4.1 xfun_0.40 fs_1.6.3 pkgconfig_2.0.3
The error does not occur on Windows 10, only on Linux (at least on my machine). Maybe related to this
It appears there is an issue related to the unwantedEncoding object imported within the fixLoc function. On my Windows 11 system, it is displayed as follows:
plantR:::unwantedEncoding
\xe3\xa1 \xe3\xa2 \xe3\xa3 á \xe3\xa7 \xe3\xa9 \xe3\xaa \xe3\xb4 \xe3\x8d \xe3\xba
"a" "a" "a" "a" "c" "e" "e" "o" "i" "u"
The error occurs here.
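A quick way to check whether a lookup vector like this contains byte sequences that are not valid UTF-8 is base R's validUTF8(). The sketch below uses a small toy vector (the names and values are invented for illustration, not copied from plantR) and only assumes that invalid names are what trips the pattern matching, which this thread does not confirm.

```r
# Toy stand-in for a named lookup vector such as plantR:::unwantedEncoding.
# The entries are hypothetical and chosen only to illustrate the check.
lookup <- c("a", "a", "c")
names(lookup) <- c("\xe3\xa1", "\u00e1", "\xe3\xa7")

# validUTF8() returns TRUE for each element whose bytes form valid UTF-8.
ok <- validUTF8(names(lookup))
ok

# Positions of the names that would be rejected as patterns in a UTF-8 session.
which(!ok)
```

On a real installation, the same check could be run as validUTF8(names(plantR:::unwantedEncoding)).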
While the problem is not solved in the package, I modified the formatLoc function (now formatLoc2) and it's working.
Another workaround is to change the R locale from UTF-8 to latin1 before running formatLoc().
Dear Jaum20,
I am facing the same error message with this function on my Mac w/ OS14. I've been looking into your suggested workaround, but am not quite sure how to do this. Should I change the encoding for the specific object, the columns used by the function, or rather for the R environment?
Hope you have further suggestions.
Many thanks,
Kasper
for the R environment?
This. For instance, you can run the following (I use Brazilian Portuguese):
Sys.setlocale("LC_COLLATE", "Portuguese_Brazil.1252")
Sys.setlocale("LC_CTYPE", "Portuguese_Brazil.1252")
Sys.setlocale("LC_MONETARY", "Portuguese_Brazil.1252")
Sys.setlocale("LC_TIME", "Portuguese_Brazil.1252")
After you successfully get past the formatLoc line in your script, you can change it back:
Sys.setlocale("LC_COLLATE", "Portuguese_Brazil.utf8")
Sys.setlocale("LC_CTYPE", "Portuguese_Brazil.utf8")
Sys.setlocale("LC_MONETARY", "Portuguese_Brazil.utf8")
Sys.setlocale("LC_TIME", "Portuguese_Brazil.utf8")
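The switch-and-restore steps above can also be wrapped so that the original locale always comes back, even if the call in between fails. This is only a sketch: with_ctype is a hypothetical helper name, and it assumes that LC_CTYPE is the category that matters for this error, which has not been verified here.

```r
# Hypothetical helper: evaluate `expr` under a temporary LC_CTYPE locale
# and restore the previous setting afterwards, even on error.
with_ctype <- function(locale, expr) {
  old <- Sys.getlocale("LC_CTYPE")
  on.exit(Sys.setlocale("LC_CTYPE", old), add = TRUE)
  # Sys.setlocale() returns "" (with a warning) when the OS rejects the name.
  new <- suppressWarnings(Sys.setlocale("LC_CTYPE", locale))
  if (identical(new, "")) {
    stop("locale '", locale, "' is not available on this system")
  }
  force(expr)  # lazy evaluation: expr only runs now, under the new locale
}

# Runnable demonstration with the always-available "C" locale.
with_ctype("C", toupper("abc"))

# Intended use on Windows (requires plantR and the `occs` object from above):
# occs <- with_ctype("Portuguese_Brazil.1252", formatLoc(occs))
```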
Dear Jaum20,
Thanks for the suggestion.
Unfortunately, I seem unable to change the locale settings on my MacBook running on OS 14. E.g.
> Sys.setlocale("LC_COLLATE", "Portuguese_Brazil.1252")
[1] ""
Warning message:
In Sys.setlocale("LC_COLLATE", "Portuguese_Brazil.1252") :
OS reports request to set locale to "Portuguese_Brazil.1252" cannot be honored
I found a possible solution on StackOverflow (https://stackoverflow.com/questions/16347731/how-to-change-the-locale-of-r), where it was suggested to start an R environment with an update of the language settings as such:
LANGUAGE=Portuguese_Brazil.1252 R
This also did not change the locale settings in my R environment.
I am not sure how to continue. I used plantR before on my previous MacBook without any problems and would like to continue using it (with my very same script).
Any further suggestions are much welcome. :-)
Best wishes,
Kasper
The error you faced probably occurs because you do not have the Portuguese language installed on your system. Try using your default language and just change the encoding. You can use the function Sys.getlocale() to see your current settings and then just change from utf8 to 1252 (latin1).
Thanks for the suggestions, again. :-) I tried your suggestion (below) and also added Portuguese (Brazil) to the preferred languages of my MacBook (in the general settings; not sure if that is 'enough'). Neither gave the result I was hoping for. ;-)
If you have any further options, that would be much appreciated. Sorry for all the trouble.
All the best,
Kasper
> Sys.getlocale()
[1] "en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8"
> Sys.setlocale("LC_COLLATE", "en_US.1252")
[1] ""
Warning message:
In Sys.setlocale("LC_COLLATE", "en_US.1252") :
OS reports request to set locale to "en_US.1252" cannot be honored
> Sys.setlocale("LC_CTYPE", "en_US.1252")
[1] ""
Warning message:
In Sys.setlocale("LC_CTYPE", "en_US.1252") :
OS reports request to set locale to "en_US.1252" cannot be honored
> Sys.setlocale("LC_MONETARY", "en_US.1252")
[1] ""
Warning message:
In Sys.setlocale("LC_MONETARY", "en_US.1252") :
OS reports request to set locale to "en_US.1252" cannot be honored
> Sys.setlocale("LC_TIME", "en_US.1252")
[1] ""
Warning message:
In Sys.setlocale("LC_TIME", "en_US.1252") :
OS reports request to set locale to "en_US.1252" cannot be honored
> Sys.getlocale()
[1] "en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8"
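Locale names such as "en_US.1252" or "Portuguese_Brazil.1252" follow the Windows naming scheme. On macOS and Linux, Latin-1 locales are usually named along the lines of "en_US.ISO8859-1", so it may help to list what the system actually offers before calling Sys.setlocale(). A minimal sketch, assuming a Unix-alike where the locale command-line tool is on the PATH:

```r
# Ask the operating system which locale names it knows about.
# Falls back to an empty vector if the `locale` tool cannot be run.
available <- tryCatch(
  system2("locale", "-a", stdout = TRUE, stderr = FALSE),
  error = function(e) character(0)
)

# Keep the Latin-1 (ISO 8859-1) entries; the exact spelling differs by platform.
latin1 <- grep("iso-?8859-?1$", available, value = TRUE, ignore.case = TRUE)
head(latin1)
```

Any name printed by this snippet is a candidate value for Sys.setlocale("LC_CTYPE", ...); an empty result means no Latin-1 locale is installed.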
Try this:
Sys.setlocale(category = "LC_ALL", locale = "English_United States.1252")
From here: https://stackoverflow.com/questions/20577764/set-locale-to-system-default-utf-8
Dear all,
Thanks for opening this useful issue and for the helpful comments and workarounds. I'm sorry, but unfortunately I have had very little time to keep the package up to date.
R has changed the way it deals with the encoding of special characters, and thus some of the functions need fixing.
I will look at it and try to fix the problem at its root asap.
I'm also in great need of using this package, especially because it allows data extraction from both GBIF and SpeciesLink simultaneously. I've been struggling with this for a few months now, and I would be immensely grateful if you could provide some updates to the package in general.
There was indeed an encoding issue in plantR:::unwantedEncoding, as mentioned by @wevertonbio. I think this is solved now (at least I can no longer reproduce the error on my machine or in R CMD check).
I kindly ask you to install the package from the development branch, to which I pushed the new version 0.1.7. Use remotes::install_github("LimaRAF/plantR", ref = "dev") to get this new version.
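After installing, it can be worth confirming that the session really picks up the new version. The sketch below uses a hypothetical helper, has_min_version, built on base R's packageVersion(); the 0.1.7 threshold comes from the message above.

```r
# Hypothetical helper: TRUE if `pkg` is installed with at least version `min`.
has_min_version <- function(pkg, min) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(FALSE)
  }
  packageVersion(pkg) >= min
}

# Check the dev install of plantR (FALSE if it is missing or older).
has_min_version("plantR", "0.1.7")
```

If this returns FALSE after installing, restarting R before re-checking is a reasonable first step, since an older copy may still be loaded.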
Please let me know if everything is OK, so that I can close this issue and merge the dev branch into master.
Working fine now! thanks!
PlantR version: 0.1.6
Error in FUN(X[[i]], ...) : 'pattern' é inválido em UTF-8 (in English: 'pattern' is invalid UTF-8)
occs.all.2.gz
O.S = Ubuntu 22