As a dev, I want to add a way to deal with the \uxxxx issue

VincentGuyader commented 3 years ago

Detect the problematic characters and state explicitly which correction to apply.

statnmap commented 3 years ago

Maybe this issue belongs in {checkhelper} after all: https://github.com/ThinkR-open/thinkr/issues/13

statnmap commented 3 years ago

Find a proper way to add this as a function.
Add a parameter to choose whether characters are transformed into hex escapes, so that the accents are kept in the documentation and functions, or into letters without accents, so that the code is directly readable.
Usually, characters in #' comments should be escaped as hex, and characters in simple comments should be transformed into letters without accents.
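A minimal sketch of such a parameter. convert_non_ascii() and its `to` argument are hypothetical names, not an existing {thinkr} or {checkhelper} API. Hex escaping is done in base R (utf8ToInt(), sprintf()); the accent-free option assumes the ICU "Latin-ASCII" transform from {stringi}, and characters beyond U+FFFF are not handled.

```r
# Hypothetical helper (not an existing {thinkr} / {checkhelper} function).
# to = "hex":   non-ASCII characters become \uXXXX escapes (keeps the accents)
# to = "ascii": accents are dropped via the ICU "Latin-ASCII" transform
# Assumes UTF-8 input and code points <= U+FFFF (beyond that, R needs \U escapes).
convert_non_ascii <- function(x, to = c("hex", "ascii")) {
  to <- match.arg(to)
  if (to == "ascii") {
    return(stringi::stri_trans_general(x, "Latin-ASCII"))
  }
  vapply(x, function(s) {
    cp <- utf8ToInt(s)                       # code points of the string
    chars <- intToUtf8(cp, multiple = TRUE)  # one element per character
    non_ascii <- cp > 127L
    chars[non_ascii] <- sprintf("\\u%04X", cp[non_ascii])
    paste(chars, collapse = "")
  }, character(1), USE.NAMES = FALSE)
}

x <- "caract\u00e8re sp\u00e9cial"
convert_non_ascii(x, "hex")    # prints "caract\\u00E8re sp\\u00E9cial"
convert_non_ascii(x, "ascii")  # prints "caractere special"
```

Writing the test strings with \u escapes keeps the script itself ASCII-only, which is the very thing the function is meant to achieve.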

Maybe stringi::stri_trans_general(char, "Latin-ASCII") can help detect special characters, by comparing the text before and after applying it.
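A quick sketch of that before/after comparison, using the "Latin-ASCII" transform id, next to a byte-level check with stringi::stri_enc_isascii(). Variable names are illustrative. The CJK line reflects an assumption: "Latin-ASCII" is expected to leave non-Latin scripts untouched, so the comparison alone may miss some characters.

```r
# Assumes {stringi} is installed and the lines are UTF-8 encoded.
lines <- c(
  "plain ASCII line",
  "#' Des caract\u00e8res sp\u00e9ciaux",
  "x <- \"\u4e2d\u6587\""  # CJK characters, likely left unchanged by Latin-ASCII
)

# Idea from the thread: a line needs cleaning if transliteration changes it.
changed <- stringi::stri_trans_general(lines, "Latin-ASCII") != lines

# Byte-level check: TRUE when the line contains any non-ASCII character.
non_ascii <- !stringi::stri_enc_isascii(lines)

data.frame(lines, changed, non_ascii)
```

The byte-level check is stricter, so it may be the safer test for "does this line need cleaning", while the transliteration tells you what the accent-free version would look like.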

#' Clean non-ASCII characters
#' TODO Add to {thinkr}
#'
#' Add an option to transform as ? if they want
chars <- c(
  "à", "â",
  "é", "è", "ê",
  "î", "ï",
  "ô", "ö", "ø",
  "æ", "œ",
  "ù",
  "ç",
  "’", "²"
)

tempfile1 <- tempfile(fileext = ".txt")
file.copy(system.file("test_files/test_file.txt", package = "thinkr"), tempfile1)

clean_ascii_dir <- function(path, pattern = ".") {

  # Clean every file in `path` whose name matches `pattern`
  purrr::walk(
    list.files(path, full.names = TRUE, pattern = pattern),
    clean_ascii_file
  )

}

# clean_ascii_file(tempfile1)

clean_ascii_file <- function(path) {

  lines <- readr::read_lines(path)

  # Find lines with non-ASCII characters, split into roxygen (#') and other lines
  asc <- iconv(lines, "latin1", "ASCII")
  ind_rox <- which((is.na(asc) | asc != lines) & grepl("^#'", lines))
  ind_no_rox <- which((is.na(asc) | asc != lines) & !grepl("^#'", lines))

  if (length(ind_rox) != 0) {

    # roxygen2 lines: double escape for LaTeX, e.g. \\u00E9
    for (char in chars) {
      lines[ind_rox] <- stringi::stri_replace_all_coll(
        lines[ind_rox],
        char,
        paste0("\\", stringi::stri_trans_general(char, "hex"))
      )
    }

  }
  if (length(ind_no_rox) != 0) {
    # Other lines (R code and plain comments): simple escape, e.g. \u00E9
    for (char in chars) {
      lines[ind_no_rox] <- stringi::stri_replace_all_coll(
        lines[ind_no_rox],
        char,
        stringi::stri_trans_general(char, "hex")
      )
    }
  }

  if (length(c(ind_rox, ind_no_rox)) != 0) {
    readr::write_lines(lines, path)
  }

  # Check again for characters that are not listed in `chars`
  asc <- iconv(lines, "latin1", "ASCII")
  ind_left <- which(is.na(asc) | asc != lines)

  if (length(ind_left) != 0) {
    warning("Some characters of file '", path,
            "' have not been converted, in lines: ",
            paste(ind_left, collapse = ", "))
  } else {
    cat(crayon::green(path, "should be clean\n"))
  }

}

statnmap commented 3 years ago

With the following test files:

#' Random file with non-ascii characters
#' Des caratères spéciaux aussi dans le roxygen

Ce texte peut-être considéré comme un texte qui ne passe pas les tests du CRAN.
En il contient des caractères de type non-ascii avec des accents tels que :

- "à", "â"
- "é", "è", "ê"
- "î", "ï"
- "ô", "ö", "ø"
#' "à", "â" # for roxygen, it is different
#' "é", "è", "ê" # for roxygen, it is different

And

#' A second random file with non-ascii characters

Ce texte peut-être considéré comme un texte qui ne passe pas les tests du CRAN.
En il contient des caractères de type non-ascii avec des accents tels que :

- "æ", "œ"
- "ù"
- "ç"
- "’", "²"
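To try the cleaning on something like these files, base R ships tools::showNonASCIIfile(), which reports the lines containing non-ASCII bytes (it relies on the same iconv(x, "latin1", "ASCII") test as clean_ascii_file()). A sketch that recreates a few lines of the first test file in a temporary file; test_lines and test_file are illustrative names, and the invisible return value (the flagged lines) is as documented in current R versions.

```r
# Recreate a few lines of the first test file in a temporary file.
# Non-ASCII characters are written as \u escapes so this script is ASCII-only.
test_lines <- c(
  "#' Random file with non-ascii characters",
  "#' Des carat\u00e8res sp\u00e9ciaux aussi dans le roxygen",
  "",
  "- \"\u00e0\", \"\u00e2\"",
  "#' \"\u00e9\", \"\u00e8\", \"\u00ea\" # for roxygen, it is different"
)
test_file <- tempfile(fileext = ".txt")
writeLines(test_lines, test_file, useBytes = TRUE)

# Prints "line number: content" for each line with non-ASCII bytes
# and invisibly returns those lines.
flagged <- tools::showNonASCIIfile(test_file)
length(flagged)
```

Running the same check after clean_ascii_file() is a cheap way to confirm that nothing was left behind.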

statnmap commented 2 years ago

And for text in classical R comments, we can remove the accents with stringi::stri_trans_general(char, "Latin-ASCII").

So:

- roxygen2 comments: double escape for LaTeX, é => \\u00E9
- classical R comments: transliterate with stri_trans_general, é => e
- characters in character vectors in R code: simple escape, é => \u00E9
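A minimal sketch of these three rules applied line by line. hex_escape(), clean_line() and clean_lines() are hypothetical names. The line classifier is deliberately simple (a line starting with #' is roxygen2, one starting with # is a plain comment, anything else is code), so trailing comments on code lines are treated as code, and non-ASCII identifiers would be escaped too.

```r
# Hypothetical sketch of the three rules; names are illustrative.
# Assumptions: UTF-8 input, code points <= U+FFFF, simple line classification.

# Replace each non-ASCII character by `prefix` + 4 uppercase hex digits
hex_escape <- function(s, prefix) {
  cp <- utf8ToInt(s)
  chars <- intToUtf8(cp, multiple = TRUE)
  non_ascii <- cp > 127L
  chars[non_ascii] <- sprintf("%s%04X", prefix, cp[non_ascii])
  paste(chars, collapse = "")
}

clean_line <- function(line) {
  if (grepl("^\\s*#'", line)) {
    hex_escape(line, "\\\\u")                         # roxygen2: double escape
  } else if (grepl("^\\s*#", line)) {
    stringi::stri_trans_general(line, "Latin-ASCII")  # plain comment: drop accents
  } else {
    hex_escape(line, "\\u")                           # R code: simple escape
  }
}

clean_lines <- function(lines) {
  vapply(lines, clean_line, character(1), USE.NAMES = FALSE)
}

out <- clean_lines(c(
  "#' Caract\u00e8re",
  "# caract\u00e8re",
  "x <- \"caract\u00e8re\""
))
writeLines(out)
```

writeLines() shows the raw text: the roxygen line gets two backslashes, the comment loses its accent, and the code line gets a single \u00E8 escape.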

VincentGuyader commented 2 years ago

See also stringi::stri_escape_unicode().
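A short sketch of stringi::stri_escape_unicode(), assuming {stringi} is installed. Per its documentation it also escapes quotes and backslashes, so it fits the content of a string literal better than a whole line of R code; the hex letter case of the output is whatever stringi produces, so the check below compares case-insensitively.

```r
# Assumes {stringi} is installed.
x <- "Des caract\u00e8res sp\u00e9ciaux"

# Non-ASCII characters become \uXXXX (or \UXXXXXXXX beyond U+FFFF)
escaped <- stringi::stri_escape_unicode(x)
print(escaped)

# The result is pure ASCII and can be turned back into the original text
stringi::stri_enc_isascii(escaped)
stringi::stri_unescape_unicode(escaped) == x

# Quotes are escaped too: fine inside a string literal, not for a whole code line
stringi::stri_escape_unicode("x <- \"caf\u00e9\"")
```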

ThinkR-open / checkhelper

As a dev, I want to add a way to deal with \uxxxx issue #12