bstaton1 / postpack

R package for working with mcmc.lists
https://bstaton1.github.io/postpack/

Consider adding the sub_index() functionality #51

Open bstaton1 opened 1 year ago

bstaton1 commented 1 year ago

In my GR-sslcm work, I've developed a framework that has further streamlined the use of 'postpack' for me, particularly with querying specific nodes.

Suppose a user has a covariance matrix Sigma[1:5,1:5] in their posterior samples. Further suppose they wish to extract only the diagonal elements. Currently, the user must write (or construct via some other code) this vector to pass to the params argument to query only the desired nodes:

match_params(post, params = c("Sigma[1,1]", "Sigma[2,2]", "Sigma[3,3]", "Sigma[4,4]", "Sigma[5,5]"))
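For reference, the "construct via some other code" route can be sketched in base R with `sprintf()`, which is vectorized over its arguments. This only illustrates the current workaround and is not part of postpack; `diag_params` is a hypothetical name.

```r
# build the names of the diagonal elements of a 5 x 5 matrix node
# (diag_params is a hypothetical name used only for illustration)
i = 1:5
diag_params = sprintf("Sigma[%d,%d]", i, i)
print(diag_params)

# the result could then be supplied as: match_params(post, params = diag_params)
```

This works, but the user has to rebuild the format string for every node and every indexing pattern.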

With the sub_index() framework, this could go:

match_params(post, params = sub_index("Sigma[X,X]", X = 1:5))

However, the function I wrote to do this for GR-sslcm is not general; it only accepts replaceable placeholders for the dimensions in that model (e.g., pop, year, age, origin).

Here is a more generalized version. Consider improving it by replacing the index_list argument with a ... argument.

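sub_index = function(params, index_list) {
  # escape the square brackets so they are matched literally
  new_params = postpack:::ins_regex_bracket(params)
  # replace each named placeholder with the values supplied for it
  # (seq_along() avoids looping over c(1, 0) when index_list is empty)
  for (i in seq_along(index_list)) {
    new_params = stringr::str_replace_all(new_params, names(index_list)[i], replacement = as.character(index_list[[i]]))
  }
  # remove the escapes before returning
  new_params = postpack:::rm_regex_bracket(new_params)
  return(new_params)
}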

Here are some examples:

sub_index("Sigma[X,X]", list(X = 1:5))
sub_index("x[year,pop]", list(year = 5:10, pop = 2))

bstaton1 commented 1 year ago

I have updated this to accept the ... argument instead of the index_list argument:

sub_index = function(params, ...) {

  # bundle and count arguments passed to ...
  index_list = list(...)
  n_indices = length(index_list)

  # placeholder to be recursively edited
  new_params = postpack:::ins_regex_bracket(params)

  # if some indices were supplied, loop through them
  # replacing them with the numeric values supplied
  if (n_indices > 0) {
    for (i in 1:length(index_list)) {
      new_params = stringr::str_replace_all(
        string = new_params, 
        pattern = names(index_list)[i],
        replacement = as.character(index_list[[i]])
      )
    }
  } 

  # return output
  return(postpack:::rm_regex_bracket(new_params))
}

# works as expected
sub_index("x[first,second]", first = 5, second = 10:15)

# also works as expected; the unused 'third' index is ignored
sub_index("x[first,second]", first = 5, second = 10:15, third = 100)

# users may try to supply more than one index of length > 1;
# prevent it or handle it correctly somehow.
sub_index("x[first,second]", first = 4:5, second = 10:15)
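One way to "prevent it" is to validate the lengths of the supplied indices before any replacement happens. The following is a minimal sketch in base R; `check_index_lengths()` is a hypothetical helper, not an existing postpack function.

```r
# hypothetical helper: stop early if more than one index has length > 1
check_index_lengths = function(...) {
  index_list = list(...)
  # count how many of the supplied indices hold more than one value
  n_long = sum(lengths(index_list) > 1)
  if (n_long > 1) {
    stop("only one index may have more than one value; got ", n_long)
  }
  invisible(TRUE)
}

check_index_lengths(first = 5, second = 10:15)         # passes silently
try(check_index_lengths(first = 4:5, second = 10:15))  # stops with an error
```

A check like this could run at the top of sub_index() so the user sees a clear message rather than a recycling error from deeper in the call stack.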

I'm envisioning adding an indices = list() argument to postpack::match_params() (and any function that calls it) and including something like:

do.call(sub_index, c(index_list, list(params = params)))

bstaton1 commented 1 year ago

This function should be edited to search/replace values within the index only, since with the current approach, this:

sub_index("pop_mean[pop]", pop = 1)

would return:

"1_mean[1]"

when we need it to return:

"pop_mean[1]"

bstaton1 commented 7 months ago

This version works well. It uses ... and searches/replaces only within brackets:

sub_index = function(params, ...) {
  index_list = list(...)
  new_params = postpack:::ins_regex_bracket(params)
  # seq_along() avoids looping over c(1, 0) when no indices are supplied
  for (i in seq_along(index_list)) {
    # split each name into its base and its bracketed index
    new_bases = postpack:::drop_index(new_params)
    new_indices = stringr::str_extract(new_params, "\\[.+\\]$")
    # replace the placeholder within the index only, then reassemble
    new_indices = stringr::str_replace_all(string = new_indices, pattern = names(index_list)[i], replacement = as.character(index_list[[i]]))
    new_params = paste0(new_bases, new_indices)
  }
  new_params = postpack:::rm_regex_bracket(new_params)
  return(new_params)
}

Here is an example:

sub_index("type[start,to,type]", start = 1, to = 1:5, type = 1)

gives:

"type[1,1,1]" "type[1,2,1]" "type[1,3,1]" "type[1,4,1]" "type[1,5,1]"

However, it still won't iterate over the combinations when more than one argument passed to ... has length greater than one:

sub_index("type[start,to,type]", start = 1, to = 1:5, type = 1:2)

should give:

"type[1,1,1]" "type[1,2,1]" "type[1,3,1]" "type[1,4,1]" "type[1,5,1]" "type[1,1,2]" "type[1,2,2]" "type[1,3,2]" "type[1,4,2]" "type[1,5,2]"

but instead gives:

Error in `stringr::str_replace_all()`:
! Can't recycle `string` (size 5) to match `replacement` (size 2).
Run `rlang::last_trace()` to see where the error occurred.
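
One possible way to handle this case is to build every combination of the supplied indices with `expand.grid()` and then fill the bracket slots from that grid. The sketch below uses base R only and does not rely on postpack internals; `sub_index_grid()` is a hypothetical name, and it assumes a single param string whose index is a comma-separated list of placeholders or fixed values.

```r
# hypothetical sketch: expand all combinations of the supplied indices,
# replacing placeholders only inside the square brackets
sub_index_grid = function(param, ...) {
  index_list = list(...)

  # nothing to do if the name has no bracketed index
  if (!grepl("\\[.*\\]$", param)) return(param)

  # split "base[i,j]" into the base name and its comma-separated slots
  base = sub("\\[.*$", "", param)
  inside = sub("^.*\\[(.*)\\]$", "\\1", param)
  slots = trimws(strsplit(inside, ",", fixed = TRUE)[[1]])

  # keep only the placeholders that actually occur inside the brackets
  used = index_list[names(index_list) %in% slots]
  if (length(used) == 0) return(param)

  # every combination of their values (the first placeholder varies fastest)
  grid = expand.grid(used, stringsAsFactors = FALSE)

  # fill each slot from its grid column, or keep it as written
  filled = lapply(slots, function(s) {
    if (s %in% names(grid)) as.character(grid[[s]]) else rep(s, nrow(grid))
  })
  paste0(base, "[", do.call(paste, c(filled, sep = ",")), "]")
}

sub_index_grid("type[start,to,type]", start = 1, to = 1:5, type = 1:2)
sub_index_grid("Sigma[X,X]", X = 1:5)
```

Because the grid is built over the distinct placeholders rather than over the slots, a repeated placeholder such as "Sigma[X,X]" still yields only the diagonal elements, and a base name that contains a placeholder (as in "pop_mean[pop]") is left untouched.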