Closed cjyetman closed 9 months ago
Cannot Replicate.
txt <- '"Über" text'
txt
#> [1] "\"Über\" text"
win_txt <- iconv(txt, to = "ISO-8859-1")
win_txt
#> [1] "\"\xdcber\" text"
iconv(win_txt, to = "UTF-8")
#> [1] NA
# This does not crash my system.
iconv(win_txt, to = "UTF-8", sub = "")
#> [1] "\"ber\" text"
Created on 2023-11-28 with reprex v2.0.2
weird... posting my session info for comparison
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.3.1 (2023-06-16)
#> os macOS Sonoma 14.1.1
#> system aarch64, darwin20
#> ui X11
#> language (EN)
#> collate en_US.UTF-8
#> ctype en_US.UTF-8
#> tz Europe/Berlin
#> date 2023-11-29
#> pandoc 3.1.8 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/aarch64/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date (UTC) lib source
#> cli 3.6.1 2023-03-23 [1] CRAN (R 4.3.0)
#> digest 0.6.33 2023-07-07 [1] CRAN (R 4.3.0)
#> evaluate 0.23 2023-11-01 [1] RSPM
#> fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.3.0)
#> fs 1.6.3 2023-07-20 [1] CRAN (R 4.3.0)
#> glue 1.6.2 2022-02-24 [1] CRAN (R 4.3.0)
#> htmltools 0.5.7 2023-11-03 [1] RSPM (R 4.3.0)
#> knitr 1.45 2023-10-30 [1] RSPM (R 4.3.0)
#> lifecycle 1.0.4 2023-11-07 [1] RSPM (R 4.3.0)
#> magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.3.0)
#> purrr 1.0.2 2023-08-10 [1] CRAN (R 4.3.0)
#> R.cache 0.16.0 2022-07-21 [2] CRAN (R 4.3.0)
#> R.methodsS3 1.8.2 2022-06-13 [2] CRAN (R 4.3.0)
#> R.oo 1.25.0 2022-06-12 [2] CRAN (R 4.3.0)
#> R.utils 2.12.3 2023-11-18 [2] RSPM (R 4.3.0)
#> reprex 2.0.2 2022-08-17 [2] CRAN (R 4.3.0)
#> rlang 1.1.2 2023-11-04 [1] RSPM (R 4.3.0)
#> rmarkdown 2.25 2023-09-18 [1] CRAN (R 4.3.1)
#> rstudioapi 0.15.0 2023-07-07 [1] CRAN (R 4.3.0)
#> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.3.0)
#> styler 1.10.2 2023-08-29 [2] RSPM (R 4.3.1)
#> vctrs 0.6.4 2023-10-12 [1] RSPM (R 4.3.0)
#> withr 2.5.2 2023-10-30 [1] RSPM (R 4.3.0)
#> xfun 0.41 2023-11-01 [1] RSPM
#> yaml 2.3.7 2023-01-23 [1] CRAN (R 4.3.0)
#>
#> [1] /Users/cjrmi/Library/R/arm64/4.3/library
#> [2] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
#>
#> ──────────────────────────────────────────────────────────────────────────────
seems like the most significant, relevant difference is..
#> os macOS Ventura 13.5.2
#> system aarch64, darwin22.4.0
versus
#> os macOS Sonoma 14.1.1
#> system aarch64, darwin20
curious what will happen once you update
Hmm. Yeah. this looks like it might be a Sonoma issue. https://groups.google.com/g/r-sig-mac/c/pFuJu7F33Eg/m/xdrBmg-PDwAJ
We could (temporarily?) put a testthat::skip_on_os(os = "mac")
in that specific test so I can test locally until this is resolved?
I would suggest making it perhaps a bit more specific than that, and use something like skip_if(Sys.info()$sysname == "Darwin" && Sys.info()$release == "23.2.0")
I'm tempted now not to skip the test, because the function has_newline_at_end()
will still hard crash R on >= macOS 14, which is the real problem here.
I've upgraded to Sonoma, but it looks like R 4.3.2 clears this issue up:
txt <- '"Über" text'
txt
#> [1] "\"Über\" text"
win_txt <- iconv(txt, to = "ISO-8859-1")
win_txt
#> [1] "\"\xdcber\" text"
iconv(win_txt, to = "UTF-8")
#> [1] NA
# This does not crash my system.
iconv(win_txt, to = "UTF-8", sub = "")
#> [1] "\"ber\" text"
Created on 2023-12-20 with reprex v2.0.2
Likewise for me. That's good news, but... raising the bar to >= R v4.3.2 might be a bit extreme.
Yeah. Probably a bit extreme. Maybe we put a check in .onLoad
and emit a warning if R < 4.3.2 on Sonoma or higher?
I think ideally I should find a more robust way of testing for a newline at the end of the file... maybe using system's wc -l
or something?
Hmm. I put #57 in to resolve the particular issue, but if we're only using it to check for newline chars, then yeah, I'm thinking that this might be me building a better mousetrap
The entire point of this function is to check if there's a trailing newline at the end of the CSV?
The entire point of this function is to check if there's a trailing newline at the end of the CSV?
Yes, because there was a time when read.csv()
, readr::read_csv()
, and/or vroom::vroom()
could not read in a CSV file that did not end with a newline without a crash or failure.
This test https://github.com/RMI-PACTA/pacta.portfolio.import/blob/13beda3b52afc6e53ff644b1d454001c596dd79b/tests/testthat/test-read_portfolio_csv.R#L121-L138
causes a hard error and crash of the R session with
I tracked down the root of this to this line https://github.com/RMI-PACTA/pacta.portfolio.import/blob/13beda3b52afc6e53ff644b1d454001c596dd79b/R/has_newline_at_end.R#L26
which intends to convert the text into UTF-8, and if any character is unconvertible replace it with an empty string
That line was added in https://github.com/RMI-PACTA/pacta.portfolio.import/pull/49 to avoid an error with
grepl
not receiving valid UTF-8 textThis appears to affect my machine (macOS 14.1.1 / arm64), but not a RStudio Cloud instance (some linux flavor), or our GitHub Action runners. There seems to be a bug in R base's
iconv()
or the underlying system libraries that it may or may not use, that causes a crash of the R session if in the string being processed a quotation mark is followed by an extended character...