cynthiahqy / conformr-xmap-project

R Package for harmonising data of different classifications or aggregations into a single dataset
MIT License

Helper for modifying crossmaps #67

Closed cynthiahqy closed 1 year ago

cynthiahqy commented 1 year ago

Modifying Crossmaps

It might also make sense to provide a "modification" function like xmap_modify(xmap, from = NULL, to = NULL, weights = NULL) which:

It would also make for more expressive code when working with existing xmap objects -- e.g. with the iso_xmap object in the making-xmaps.Rmd vignette:

https://github.com/cynthiahqy/conformr-project/blob/3d9c2843d4ed63f106ed8dd34f1ac5643b7a0b00/xmap/vignettes/making-xmaps.Rmd#L299-L317

Instead of coercing an existing xmap "again", you could have something like:

iso_xmap |>
  xmap_modify(col_from = "ISO2", col_to = "ISONumeric", col_weights = "link")

This could also power the inversion function which would just be a validated wrapper around something like:

xmap |>
  dplyr::mutate(new_weights = 1) |>
  modify_xmap_df(col_from = x_attrs$col_to, col_to = x_attrs$col_from, col_weights = "new_weights")
cynthiahqy commented 1 year ago

Missing/Optional/Default Arguments

Using NULL as defaults in modify_xmap_df(xmap, from = NULL, to = NULL, weights = NULL) doesn't really work because the user arguments are expected to be expressions, not strings.

The logic I want is:

col_from <- ifelse(is.null(from),
                     x_attrs$col_from,
                     deparse(substitute(from)))

but if from is an expression (e.g. from = ISO3), is.null(from) forces the argument to be evaluated, and you get the error:

Error in ifelse(!is.null(from), x_attrs$col_from, deparse(substitute(from))) : 
  object 'ISO3' not found

For reference the function was:

modify_xmap_df <- function(xmap_df, from, to, weights){
  stopifnot(inherits(xmap_df, "xmap_df"))
  x_attrs <- attributes(xmap_df)

  col_from <- ifelse(is.null(from),
                     x_attrs$col_from,
                     deparse(substitute(from)))
  col_to <- ifelse(is.null(to), 
                     x_attrs$col_to,
                     deparse(substitute(to)))
  col_weights <- ifelse(is.null(weights), 
                     x_attrs$col_weights,
                     deparse(substitute(weights)))
}

The %||% infix also doesn't work since:

These rlang missing functions might help: https://rlang.r-lib.org/reference/missing_arg.html
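
Base R's missing() is another option here, since it checks whether an argument was supplied without evaluating it. A minimal sketch, assuming a hypothetical helper resolve_cols() and a toy data frame that carries the col_from, col_to and col_weights attributes (neither is part of xmap):

```r
# Hypothetical helper (not part of xmap): resolve column names from
# unquoted arguments, falling back to the attributes stored on the object.
resolve_cols <- function(xmap_df, from, to, weights) {
  x_attrs <- attributes(xmap_df)
  # missing() does not force the promise, so `from = ISO3` is never evaluated
  col_from <- if (missing(from)) x_attrs$col_from else deparse(substitute(from))
  col_to <- if (missing(to)) x_attrs$col_to else deparse(substitute(to))
  col_weights <- if (missing(weights)) x_attrs$col_weights else deparse(substitute(weights))
  c(from = col_from, to = col_to, weights = col_weights)
}

# Toy stand-in for an xmap_df: a data frame with the three column attributes
toy <- structure(
  data.frame(ISO2 = "AF", ISONumeric = "004", ISO3 = "AFG", link = 1),
  col_from = "ISO2", col_to = "ISONumeric", col_weights = "link"
)

resolved <- resolve_cols(toy, from = ISO3)
print(resolved)
```

Here only from changes to "ISO3"; to and weights keep the values stored in the attributes.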

Or sentinel value:

cynthiahqy commented 1 year ago

The user needs to provide at least ONE of from, to, weights, with the unspecified arguments assumed to stay the same. If none is supplied, the function should stop with an error like:

if (all(missing(from), missing(to), missing(weights))) {
  stop("Please supply at least one of `from`, `to` or `weights` to modify `xmap_df`")
}
cynthiahqy commented 1 year ago

I think I need some type of sentinel function and to specify the default as a function call:

modify_xmap_df <- function(xmap_df,
                           from = attr(xmap_df, "col_from"),
                           to = attr(xmap_df, "col_to"),
                           weights = attr(xmap_df, "col_weights"))

Or something like modify_xmap_df(iso_xmap, from = xmap::same(), ...)
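
One way the sentinel idea could look, as a rough sketch: same() below is a hypothetical stand-in for xmap::same(), pick_cols() is an illustrative helper, and column names are passed as strings rather than expressions.

```r
# Hypothetical sentinel: a classed empty object meaning "keep the current column"
same <- function() structure(list(), class = "xmap_same")
is_same <- function(x) inherits(x, "xmap_same")

# Illustrative helper (not part of xmap): swap in new column names,
# keeping the stored attribute wherever the sentinel is left in place.
pick_cols <- function(xmap_df, from = same(), to = same(), weights = same()) {
  if (is_same(from) && is_same(to) && is_same(weights)) {
    stop("Please supply at least one of `from`, `to` or `weights`")
  }
  c(
    from = if (is_same(from)) attr(xmap_df, "col_from", exact = TRUE) else from,
    to = if (is_same(to)) attr(xmap_df, "col_to", exact = TRUE) else to,
    weights = if (is_same(weights)) attr(xmap_df, "col_weights", exact = TRUE) else weights
  )
}

# Toy stand-in for an xmap_df
toy <- structure(
  data.frame(ISO2 = "AF", ISONumeric = "004", ISO3 = "AFG", link = 1),
  col_from = "ISO2", col_to = "ISONumeric", col_weights = "link"
)

picked <- pick_cols(toy, from = "ISO3")
print(picked)
```

Unlike a NULL default, the sentinel can be told apart from a user who explicitly passes NULL, at the cost of one more exported function.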

cynthiahqy commented 1 year ago

Helper interface verb ideas:

Not modifying the underlying data.frame (just "activate" another xmap):

## reset (accept expressions for column names; with optional args)
xmap_reset(xmap, from, to, weights)
xmap_reset(xmap, weights = <other-weights>)

Modifies the underlying data.frame:

If the modification interface accepted anonymous vectors:

## replace (with new vectors, checking vector length)
xmap_replace(xmap, v_from, v_to, v_weights)

## reweight (accept vector of new weights to replace old weights?)
xmap_reweight(xmap, v_weights)
xmap_reweight(iso_xmap, iso_xmap$other_col)

## redirect (accept strings for names of new to/from)
## @param v_weights vector of new weights, defaults to `rep(1, nrow(xmap))`
xmap_redirect(xmap, col_from, col_to, v_weights = NULL, weights_to = "r_weights")

## reverse
xmap_reverse(xmap)
# which wraps 
xmap_redirect(xmap, 
  col_from = x_attrs$col_to, 
  col_to = x_attrs$col_from,
  v_weights = rep(1, nrow(xmap)),
  weights_to = "r_weights"
)

Alternatively, require user-provided vectors to first be added to the xmap via mutate or other table manipulation functions. There would then be no replace function, and the other modifications could be recast as:

## reweight
my_xmap |>
  mutate(new_weights =  <vector-of-weights>) |>
  xmap_reset(weights = new_weights)

which always succeeds if the new_weights are valid for the from-to links.

## redirect* 
my_xmap |>
   # ATTEMPT: different from, same to & weights
  xmap_reset(from = <other_source>)

my_xmap |>
   # ATTEMPT: different from & to but SAME weights
  xmap_reset(from = <other_source>, to = <other_target>)

which just throws an ERROR if the existing weights are not valid for the new links

## reverse
my_xmap |>
  mutate(r_weights = rep(1, n())) |>
  xmap_reset(from = <col_to>, to = <col_from>, weights = r_weights)
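
To illustrate why reusing existing weights can fail after a redirect or reverse, here is a small base R sketch. It assumes the validity rule is that the weights leaving each source node sum to one; weights_sum_to_one() and the links table are made up for illustration and are not part of xmap.

```r
# Assumed validity rule: for every source, outgoing weights sum to one.
weights_sum_to_one <- function(df, col_from, col_weights, tol = 1e-8) {
  totals <- tapply(df[[col_weights]], df[[col_from]], sum)
  all(abs(totals - 1) < tol)
}

# Toy links: A is split evenly between x and y, B maps wholly to z
links <- data.frame(
  parent = c("A", "A", "B"),
  child = c("x", "y", "z"),
  split = c(0.5, 0.5, 1)
)

forward_ok <- weights_sum_to_one(links, "parent", "split")       # TRUE
reversed_old <- weights_sum_to_one(links, "child", "split")      # FALSE: x and y only total 0.5

# Reversing with fresh unit weights, as in the reverse example above
links$r_weights <- rep(1, nrow(links))
reversed_new <- weights_sum_to_one(links, "child", "r_weights")  # TRUE
```

Under that assumption, switching from and to while keeping the old weights is exactly the case that should raise an error, and adding a unit-weight column first is what makes the reversed crossmap valid.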
cynthiahqy commented 1 year ago

Should modification functions accept column names only, OR also vectors?

The tricky thing is that these functions try to do a few related things:

Creating a related crossmap is a multi-stage process with conditional steps:

Maybe the user should be advised to add new weights to the existing xmap as a column, instead of the function accepting anonymous vectors of values into v_* args.

What about functions that create weights for the user?

There needs to be a weights_to argument to provide a column name. But internally, the function adds a new column, then activates a new crossmap with that new column.
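
A rough sketch of that two-step behaviour, using a plain data frame with a col_weights attribute as a stand-in for an xmap; add_unit_weights() is a hypothetical name, not an existing function:

```r
# Hypothetical helper: create unit weights in a new column, then mark that
# column as the active weights column.
add_unit_weights <- function(df, weights_to = "r_weights") {
  # refuse to overwrite an existing column
  stopifnot(!(weights_to %in% names(df)))
  df[[weights_to]] <- rep(1, nrow(df))   # step 1: add the new column
  attr(df, "col_weights") <- weights_to  # step 2: "activate" it
  df
}

# Toy stand-in for an xmap_df
toy <- structure(
  data.frame(ISO2 = c("AF", "AL"), ISONumeric = c("004", "008"), link = c(1, 1)),
  col_from = "ISO2", col_to = "ISONumeric", col_weights = "link"
)

updated <- add_unit_weights(toy)
attr(updated, "col_weights")
```

The original link column is left in place, so the previous weights can still be reactivated later.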

cynthiahqy commented 1 year ago

Rob suggests only offering:

Alt. name convention: xmap_switch_*:

cynthiahqy commented 1 year ago

Sentinel value for arguments: from=".replace" or similar

cynthiahqy commented 1 year ago

Method: modify_xmap_df()

# Note: `%||%` is provided by rlang (and by base R from 4.4.0)
modify_xmap_df <- function(xmap_df, col_from = NULL, col_to = NULL, col_weights = NULL){
  stopifnot(inherits(xmap_df, "xmap_df"))

  # Fall back to the currently active columns for any argument left as NULL
  x_attrs <- attributes(xmap_df)
  col_from <- col_from %||% x_attrs$col_from
  col_to <- col_to %||% x_attrs$col_to
  col_weights <- col_weights %||% x_attrs$col_weights

  # Check the requested columns exist in the data
  col_strings <- c(col_from, col_to, col_weights)
  df_check_cols(xmap_df, col_strings)

  # Move from, to and weights to the front, then rebuild and validate the xmap
  df <- as.data.frame(xmap_df)
  col_order <- c(col_strings, setdiff(names(df), col_strings))
  df <- df[col_order]
  xmap <- new_xmap_df(df, col_from, col_to, col_weights)
  validate_xmap_df(xmap)

  return(xmap)
}

iso_xmap <- tibble::tribble(
      ~ISONumeric, ~ISO2, ~link,         ~country, ~ISO3,
            "004",  "AF",     1,    "Afghanistan", "AFG",
            "008",  "AL",     1,        "Albania", "ALB",
            "012",  "DZ",     1,        "Algeria", "DZA",
            "016",  "AS",     1, "American Samoa", "ASM",
            "020",  "AD",     1,        "Andorra", "AND"
      ) |>
  as_xmap_df(from = ISO2, to = ISONumeric, weights = link)
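
The column reordering step inside modify_xmap_df() can be tried on its own with a plain data frame. This is only a base R sketch of that one step on a two-row copy of the ISO data, not a call into xmap:

```r
# Two rows of the ISO example as a plain data frame
iso_df <- data.frame(
  ISONumeric = c("004", "008"),
  ISO2 = c("AF", "AL"),
  link = c(1, 1),
  country = c("Afghanistan", "Albania"),
  ISO3 = c("AFG", "ALB")
)

# Requested from, to and weights columns, e.g. switching the source to ISO3
col_strings <- c("ISO3", "ISONumeric", "link")

# Same logic as in modify_xmap_df(): chosen columns first, the rest after
col_order <- c(col_strings, setdiff(names(iso_df), col_strings))
reordered <- iso_df[col_order]
names(reordered)
# "ISO3" "ISONumeric" "link" "ISO2" "country"
```

The remaining columns keep their original relative order, so nothing is dropped when a different crossmap is activated.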

Generic: xmap_modify()

cynthiahqy commented 1 year ago

I'm not sure offering helpers for modifying crossmaps is necessary or even desirable.

cynthiahqy commented 1 year ago

Closing this issue since modification should be reserved for candidate links; and with #82 validate_as_xmap() and xmap_to_*() features this is relatively straightforward.