SugiharaLab / rEDM

Applications of Empirical Dynamic Modeling from Time Series

R 4.2.0 Character Encoding on Windows: Can not load rEDM #61

Closed sebdart closed 2 years ago

sebdart commented 2 years ago

Hello, I am a long-time rEDM user. Thank you all for your hard work and for rEDM. I'd like to report an issue with rEDM that occurred on 6/6/2022 when I installed a fresh copy of R, RStudio and rEDM on my new laptop.

With the latest version of R, 4.2.0 (2022-04-22, "Vigorous Calisthenics"), on Windows 11, with RStudio (RStudio Desktop 2022.02.3+492) and the latest rEDM version (1.12.2), the R session systematically crashes as soon as it tries to load the rEDM library in RStudio. I installed rEDM two ways: install.packages("rEDM") and devtools::install_github("SugiharaLab/rEDM").

Both lead to a complete crash. I also changed the directory of my R libraries from \AppData\Local\R\ to \Documents\R\win-library\, but that does not help either. Among all my time series libraries, only rEDM causes this failure.

I never had this issue before. So I installed an older version of R (4.1.2, 2021-11-01) and reinstalled rEDM (1.12.2); this seems to be stable, as it does not crash. It seems that rEDM is no longer compatible with the latest R.

Thank you so much

Sebastien

SoftwareLiteracy commented 2 years ago

Thank you for the report.

Windows is a bit problematic for several reasons. One is that I do not have development access to Windows, at least not to Windows 11. Limitations of Rcpp and character encodings specific to Windows add more flavor.

There are no reports of problems with the CRAN builds.

According to CRAN, the Windows release package was still being uploaded as of yesterday, so I would suggest making sure you ran install.packages() after the file was built and uploaded. However, you report the same issue with the install from GitHub.

Another wrinkle is that rEDM on CRAN is 1.12.2, while on GitHub it is 1.12.2.1. There is a Windows-specific change in 1.12.2.1 that is not in 1.12.2. You get the same result with install.packages() (version 1.12.2 from CRAN) and devtools::install_github() (version 1.12.2.1), and rEDM 1.12.2 works on R 4.1.2 but not on 4.2.0. Together, these seem to suggest an issue beyond rEDM.

If you would like to continue diagnosing the issue, one idea is to use install.packages('rEDM', keep_outputs = TRUE, ...) to see logs of the installation.
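A minimal sketch of that suggestion, assuming a source install (keep_outputs is ignored for binary packages) and a working build toolchain such as Rtools on Windows. The helper name install_with_logs and the log directory are hypothetical, not part of rEDM or base R.

```r
# Hypothetical helper (not part of rEDM or base R): install a package from
# source and keep the per-package build log in `log_dir`.
install_with_logs <- function(pkg,
                              log_dir = file.path(tempdir(), "install-logs")) {
  dir.create(log_dir, showWarnings = FALSE, recursive = TRUE)
  # keep_outputs only applies to source installs; a character value names
  # the directory that receives the <pkg>.out log file.
  install.packages(pkg, type = "source", keep_outputs = log_dir)
  list.files(log_dir, full.names = TRUE)
}

# Usage (needs network access and, on Windows, Rtools):
# logs <- install_with_logs("rEDM")
# writeLines(readLines(logs[1]))
```

These logs cover compilation and installation only, so they may not show a crash that happens later, when the library is loaded.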

SoftwareLiteracy commented 2 years ago

I reproduced your results on a Windows 10 machine: rEDM 1.12.2 is fine on R 4.1.3, does not load on R 4.2.0.

Here's what I found so far. According to the R NEWS:


CHANGES IN R 4.2.0

Windows users should consult the WINDOWS section below for some profound changes including


While this goes unnoticed in the non-Windows universe (where UTF-8 locales and encodings are natural), Windows encodings are an issue. On top of that, Rcpp (an awesome package), which provides the R interface to the cppEDM core, also has to deal with Windows peculiarities that are absent on other OSes. It is not clear to me where in the chain of Windows wonderland the issue arises.

I was able to get rEDM 1.12.2 to run fine on R 4.2.0 Windows 10 by resetting the locale, viz.:

Sys.setlocale("LC_ALL","English_United States.1252")

Here's the full dialogue:

R version 4.2.0 (2022-04-22 ucrt) -- "Vigorous Calisthenics"
Copyright (C) 2022 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)

> sessionInfo()
R version 4.2.0 (2022-04-22 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.utf8
[2] LC_CTYPE=English_United States.utf8  
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base    

loaded via a namespace (and not attached):
[1] compiler_4.2.0

> Sys.setlocale("LC_ALL","English_United States.1252")
[1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"
Warning message:
In Sys.setlocale("LC_ALL", "English_United States.1252") :
  using locale code page other than 65001 ("UTF-8") may cause problems

> sessionInfo()
R version 4.2.0 (2022-04-22 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252  
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    
system code page: 65001

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base    

loaded via a namespace (and not attached):
[1] compiler_4.2.0
>
> library(rEDM)
>
> sessionInfo()
R version 4.2.0 (2022-04-22 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252  
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    
system code page: 65001

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base    

other attached packages:
[1] rEDM_1.12.2

loaded via a namespace (and not attached):
[1] compiler_4.2.0 Rcpp_1.0.8.3  

> CCM( dataFrame = Lorenz5D, E = 5, target='V5',columns='V1',libSizes=c(50,100,200,300,500,800,950), sample = 50 )
  LibSize     V1:V5     V5:V1
1      50 0.5824790 0.8212737
2     100 0.6477618 0.8668273
3     200 0.7131198 0.8864639
4     300 0.7451864 0.8938021
5     500 0.7762779 0.9050685
6     800 0.8043982 0.9130044
7     950 0.8135639 0.9158710

We notice the R version is R version 4.2.0 (2022-04-22 ucrt) where ucrt refers to the MS Universal C Run Time library, which is where I think the UTF-8 encodings might be handled.

The original locale is English_United States.utf8, and this seems to be the problem. When it is changed to English_United States.1252, rEDM 1.12.2 runs fine on Windows 10 R 4.2.0. So at the moment, no solution, but a workaround... Ugh.
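For a script that has to run on several machines, the workaround could be guarded so it only fires in the configuration described above. This is a sketch under assumptions: the helper needs_locale_workaround is hypothetical, and its three conditions (Windows, R 4.2.0 or later, a UTF-8 session locale) are only my reading of this thread, not a confirmed diagnosis.

```r
# Hypothetical helper (not part of rEDM): TRUE when the session matches the
# configuration reported here (Windows, R >= 4.2.0, UTF-8 locale).
needs_locale_workaround <- function(os_type   = .Platform$OS.type,
                                    r_version = getRversion(),
                                    utf8      = isTRUE(l10n_info()[["UTF-8"]])) {
  os_type == "windows" &&
    as.numeric_version(r_version) >= "4.2.0" &&
    utf8
}

if (needs_locale_workaround()) {
  # Same call as in the transcript above; R warns that a non-UTF-8 code page
  # "may cause problems", so treat this as a stopgap.
  Sys.setlocale("LC_ALL", "English_United States.1252")
}

# Load rEDM only if it is installed.
if (requireNamespace("rEDM", quietly = TRUE)) {
  library(rEDM)
}
```

The locale name "English_United States.1252" is the one used in this thread; other Windows locales would presumably need their own code page.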

sebdart commented 2 years ago

Thank you for your hard work and fast response; I feel bad for causing you headaches. This is the reason I usually stay a version or two behind when updating R. Yes, Linux is my way to go, but this time I had little to no choice when I found this issue with rEDM on R 4.2.0. I will try your workaround as soon as I am finished with my current R session. Thank you all for this awesome rEDM package and your work. Please, by all means, keep up the good work.

SoftwareLiteracy commented 2 years ago

Thank you @sebdart!

Probing this issue with the R package developers mailing list, it appears the issue is from a gcc bug. Here is the R-pkg-devel thread.

This should provide a path forward to avoid the issue by recoding cppEDM. I'll work on it.

SoftwareLiteracy commented 2 years ago

Please note that version 1.12.3, committed to GitHub on 2022-06-12, appears to resolve this issue.
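To check whether an installed copy already includes that version, one option is to compare version numbers without loading the package (loading is the step that crashed). A minimal sketch; the helper has_fix is hypothetical, and 1.12.3 is taken from the comment above.

```r
# Hypothetical helper: TRUE when `installed` is at least the version that
# is reported above to resolve the issue.
has_fix <- function(installed, fixed = "1.12.3") {
  as.numeric_version(installed) >= as.numeric_version(fixed)
}

has_fix("1.12.2")    # FALSE: CRAN version discussed in this thread
has_fix("1.12.2.1")  # FALSE: earlier GitHub version
has_fix("1.12.3")    # TRUE

# packageVersion() reads the installed DESCRIPTION file and does not load
# the package's compiled code.
if ("rEDM" %in% rownames(installed.packages())) {
  if (!has_fix(packageVersion("rEDM"))) {
    message("rEDM is older than 1.12.3; consider ",
            'devtools::install_github("SugiharaLab/rEDM")')
  }
}
```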