extendr / rextendr

An R package that helps scaffolding extendr-enabled packages or compiling Rust code dynamically
https://extendr.github.io/rextendr/
Other
184 stars 27 forks source link

`rust_source` overhaul #35

Closed Ilia-Kosenkov closed 3 years ago

Ilia-Kosenkov commented 3 years ago

In light of recent PRs (#29, #31, #34) I would like to suggest an overhaul of rust_source function that will simplify its usage, especially in the test environment.

 # Append rtools path to the end of PATH
  if (.Platform$OS.type == "windows" && nzchar(Sys.getenv("RTOOLS40_HOME"))) {
    env_path <- Sys.getenv("PATH")
    r_tools_path <-
      normalizePath(
        file.path(
          Sys.getenv("RTOOLS40_HOME"), # {rextendr} targets R >= 4.0
          paste0("mingw", ifelse(R.version$arch == "i386", "32", "64")),
          "bin"
        )
      )
    Sys.setenv(PATH = paste(env_path, r_tools_path, sep = .Platform$path.sep))
    on.exit(Sys.setenv(PATH = env_path))
  }

Because RTools path is appended, any other mingw system on the path (configured by the user) will get prioritized, but if none is present, RTools will be used. With this patch, I am able to run rcmdcheck with --force-multiarch with a 'clean' system's PATH.

Usage of options should also help to address the usage of environment variables in test configurations -- there will be no need for formal arguments rewriting, the same will be achieved by simply setting options().

clauswilke commented 3 years ago

I don't understand this comment: "very difficult to provide manually"

In general I'm not a big fan of options(), but they're better than rewriting formals during the tests, so that's a good reason to use them.

Also, you're listing three different things here, I believe. Maybe break the issue into three and go one step at a time?

Ilia-Kosenkov commented 3 years ago

@clauswilke, My point is if you want to override patch.crates_io, you need to format a complex string, correctly escape quotes if you use interpolation and so on. Using lists we can offload string formatting onto a dedicated function, which I definitely prefer, but it is up to you to decide.

I personally do not like options because I can never keep track of what controls what, especially if options are retrieved in the body of a library function. If we however use it only as default parameters, it may not be that bad.

This issue was just a compilation of what I think could be done to rust_source to make it friendlier, in light of recent struggles. We can keep it as a reference check list.

clauswilke commented 3 years ago

My point is if you want to override patch.crates_io, you need to format a complex string, correctly escape quotes if you use interpolation and so on. Using lists we can offload string formatting onto a dedicated function, which I definitely prefer, but it is up to you to decide.

Could you make an example of how it currently looks and how you'd want it to look? If we make a change we should do so soon, before the package gains widespread use.

Ilia-Kosenkov commented 3 years ago

@clauswilke , I'll make a reprex.

Ilia-Kosenkov commented 3 years ago

I was able to come up with some primitive tool to serialize list() into toml. There is not up-to-date library available in R (and I am not sure we need to have an extra dependency). So the result is:

git_ref <- list(git = "https://github.com/extendr/extendr", branch = "master")
patch.crates_io <- list(
    `extendr-api` = git_ref,
    `extendr-macros` = git_ref,
    `libR-sys` = "0.3.0"
)
vec_c("[patch.crates_io]", format_toml(patch.crates_io)) %>%
    cat(sep = "\n")
#> [patch.crates_io]
#> extendr-api.git = "https://github.com/extendr/extendr"
#> extendr-api.branch = "master"
#> extendr-macros.git = "https://github.com/extendr/extendr"
#> extendr-macros.branch = "master"
#> libR-sys = "0.3.0"
format_toml() ``` r library(glue) library(vctrs) library(purrr) library(crayon) get_err_msg <- function() "Object cannot be serialzied." get_cross <- function(crayon = TRUE) if (crayon) crayon::red("x") else "x" get_info <- function(crayon = TRUE) if (crayon) crayon::yellow("i") else "i" format_toml <- function(x, ...) UseMethod("format_toml") format_toml.default <- function(x, ...) stop( glue("{get_err_msg()}\n {get_cross()} `{vec_ptype_abbr(x)}` cannot be converted to toml.\n"), call. = FALSE) format_toml.character <- function(x, ...) { if (vec_size(x) == 0L) { stop( glue("{get_err_msg()}\n {get_cross()} `x` has length of `0`.\n {get_info()} Input should be of length > 1.\n"), call. = FALSE ) } else if (vec_size(x) == 1L) { vec_cast(glue("\"{x}\""), character()) } else { purrr::map_chr(x, ~glue("\"{.x}\"")) -> items paste0("[ ", paste0(items, collapse = ", "), " ]") } } format_toml.integer <- function(x, format_int = "%d", ...) { if (vec_size(x) == 0L) { stop( glue("{get_err_msg()}\n {get_cross()} `x` has length of `0`.\n {get_info()} Input should be of length > 1.\n"), call. = FALSE ) } else if (vec_size(x) == 1L) { sprintf(format_int, x) } else { purrr::map_chr(x, ~sprintf(format_int, .x)) -> items paste0("[ ", paste0(items, collapse = ", "), " ]") } } format_toml.double <- function(x, format_dbl = "%g", ...) { if (vec_size(x) == 0L) { stop( glue("{get_err_msg()}\n {get_cross()} `x` has length of `0`.\n {get_info()} Input should be of length > 1.\n"), call. = FALSE ) } else if (vec_size(x) == 1L) { sprintf(format_dbl, x) } else { purrr::map_chr(x, ~sprintf(format_dbl, .x)) -> items paste0("[ ", paste0(items, collapse = ", "), " ]") } } format_toml.list <- function(x, name = "", ...) { if (nzchar(name)) name <- paste0(name, ".") names <- names(x) map2(names, x, function(nm, val) { if (is.atomic(val)) { glue("{paste0(name, nm)} = {format_toml(val)}") } else { nm <- paste0(name, nm) format_toml(val, name = nm, ...) } }) -> result if (is.list(result)) { flatten_chr(result) } else { result } } ``` Created on 2021-02-07 by the [reprex package](https://reprex.tidyverse.org) (v1.0.0)

In my opinion, this approach simplifies usage of rextendr::rust_* funtions. Compare to currently used:

c(
    "extendr-api = { git = \"https://github.com/extendr/extendr\", branch = \"master\"}",
    "extendr-macros = { git = \"https://github.com/extendr/extendr\", branch = \"master\"}",
     "libR-sys = \"0.3.0\""
)

One of the benefits of list() approach is that it provides primitive verification -- if the list is syntactically correct (R does not fail to eval the list creation expression), then the produced TOML will likely be syntactically correct as well. It will also filter out unsupported types (I have implemented only a few, e.g. logical is missing), so if data.frame goes into the list, it will produce a meaningful error message on R side rather than failing with random Cargo.toml is incorrectly formatted coming from Rust itself. This is a prototype, so the number of dependencies can be reduced.

UPD: I did not implement this, but I think if dots are used (somewhat tidy-way), we can do something like format_toml(patch.crates_io = patch.crates_io, fetaures = c("libR-sys/use-bindgen") to produce not only toml but [patch.crates_io] and [features] as well.

clauswilke commented 3 years ago

I'm fine with this. We're already importing glue, vctrs (implicitly), and purrr, so this doesn't create a large number of new dependencies. Not sure if crayon is needed but fine either way.

Ilia-Kosenkov commented 3 years ago

Closing because all features have been implemented & merged.