Assistance with writing a RMINC function

gdevenyi commented 6 years ago

Hi,

I'm trying to write a new function that will calculate effect sizes from vertexLm objects according to the methods of https://people.kth.se/~lang/Effect_size.pdf

Right now, my main stumbling block is I don't know what kind of object a vertexLm return is.

So far, I know I can use attr to extract some properties, and it looks like I can access data columns with [,"Name"] where "Name" comes from the dimname attributes.

Are there any other ways to get inside these objects, and is there a doc describing what this type of object is?

Thanks.

cfhammill commented 6 years ago

It's essentially just a matrix of coefficients, t-stats, and more. You can get it's matrix print method by calling

v[,] #instead of v

I think the custom printers for *lm objects may not have been my best design choices...

my intuition is that you just need to multiply your t-stats by sqrt(n) to convert them to effect sizes, but you can retroactively compute the SE for the estimate by doing beta-eff / tvalue-eff.

gdevenyi commented 6 years ago

The linked paper describes the methods to get effect sizes from GLM t-stats which is what I'll be implementing, so I'm fine on that front.

Thanks.

cfhammill commented 6 years ago

Great.

Typically the way I examine unfamiliar R objects is with str, it will tell you that vertexLm objects are just matrices. The [,] isn't documented anywhere though.

gdevenyi commented 6 years ago


#Try to implement methods from
#Nakagawa, S., Cuthill, I.C., 2007. Effect size, confidence interval and statistical significance: a practical guide for biologists. Biol. Rev. Camb. Philos. Soc. 82, 591–605. https://doi.org/10.1111/j.1469-185X.2007.00027.x
vertexEffectSize_hedges_g_unbiased <- function(buffer, columns=NULL)
{
  # g_biased = t ( n1 + n2 ) / sqrt(n1*n2)*sqrt(df)
  # g_unbiased = g_biased * ( 1 - 3 / (4*(n1 + n2 - 2) - 1) )
  # g_variance_unbiased = (n1 + n2) / (n1*n2) + d_unbiased**2 / (2 * (n1 + n2 - 2))
  # N = n1+n2
  # NN = n1*n2
  # Need
  # t value
  # Group sizes for contrasts (n1 and n2)
  # Identify contrast groups
  originalMincAttrs <- mincAttributes(buffer)
  stattype <- originalMincAttrs$`stat-type`
  # Remove coefficients from buffer
  if(!is.null(stattype)){
    for (nStat in 1:length(stattype)) {
      if(stattype[nStat] == 'beta' || stattype[nStat] == 'R-squared' || stattype[nStat] == "logLik" || stattype[nStat] == "F" ) {
        if(!exists('indicesToRemove')) {
          indicesToRemove = nStat
        }
        else {
          indicesToRemove = c(indicesToRemove,nStat)
        }
      }
    }
    if(exists('indicesToRemove')) {

      updatedAttrs <- originalMincAttrs
      updatedAttrs$`stat-type` <- updatedAttrs$`stat-type`[-indicesToRemove]
      updatedAttrs$dimnames[[2]] <- updatedAttrs$dimnames[[2]][-indicesToRemove]

      buffer <- buffer[,-indicesToRemove]
      buffer <- setMincAttributes(buffer, updatedAttrs)
    }
  }
  n.cols <- length(columns)
  n.row <-0
  if (is.matrix(buffer)) {
    n.row <- nrow(buffer)
  }
  else {
    n.row <- length(buffer)
  }
  output <- matrix(1, nrow=n.row, ncol=2*n.cols)
  colnames(output) <- c(paste0("hedgesg-",columns),paste0("hedgesg_var-", columns))
  for (column in columns) {
    n1 = sum(updatedAttrs$model[, column])
    N = length(updatedAttrs$model[, column])
    n2 = N - n1
    NN = n1*n2
    df = updatedAttrs$df[[1]][2]
    output[,paste0("hedgesg-",column)] <- buffer[,paste0("tvalue-",column)] * N / (sqrt(NN)*sqrt(df)) * ( 1 - 3/(4*(N - 2) - 1) )
    output[,paste0("hedgesg_var-", column)] <- (N) / (NN) + output[,paste0("hedgesg-",column)]**2 / (2 * (N - 2))
  }
  rownames(output) <- rownames(buffer)
  attr(output, "likeVolume") <- attr(buffer, "likeVolume")
  return(output)
}

Based on the insides of mincFDR.

gdevenyi commented 6 years ago

Ideally, I'd like to enumerate the columns of t-statistics which are calculated using contrasts and output all of those, however I can't find a simple way (yet) to determine which are done via contrasts.

cfhammill commented 6 years ago

What do you mean are calculated via contrasts? Do you mean contrasts other than the simple dummy coding contrasts we use?

cfhammill commented 6 years ago

PS function looks good

gdevenyi commented 6 years ago

Yes simple dummy coding. This method only applies for group comparisons.

gdevenyi commented 6 years ago

I want to be able to say, "tstats 1 3 and 5 are dummy coded group comparisons", so it is valid to produce a hedges_g effect size from these columns.

gdevenyi commented 6 years ago

And a side question... say I wanted to generalize this to mincLm and anatLm.... how might I go about that?

cfhammill commented 6 years ago

you could do this:

model_call <- attr(buffer, "call")
model_call[[1]] <- quote(model.matrix)
names(model_call)[names(model_call) == "formula"] <- ""
mod_mat <- eval(model_call)
conts <- attr(mod_mat, "contrasts")
if(is.null(conts)) stop("No categorical variables in model, cannot compute g statistics")

cat_vars <- names(conts)[vapply(conts, function(x) all(x == "contr.treatment"), logical(1))]

then grep or grepl for cat_vars in your t-stat names.

After my meeting I'll show you how it could be done better with @bcdarwin and my new lenses package https://github.com/cfhammill/lenses

gdevenyi commented 6 years ago

Awesome:


#Try to implement methods from
#Nakagawa, S., Cuthill, I.C., 2007. Effect size, confidence interval and statistical significance: a practical guide for biologists. Biol. Rev. Camb. Philos. Soc. 82, 591–605. https://doi.org/10.1111/j.1469-185X.2007.00027.x
vertexEffectSize <- function(buffer, columns=NULL)
{
  #Unbiased corrector function J from https://en.wikipedia.org/wiki/Effect_size#Hedges'_g
  #Watch out for exploding gamma function
  J <- function(a) {
    out <- tryCatch(
      {
    return (gamma(a/2) / ( sqrt(a/2)* gamma((a-1)/2)))
      },
    error=function(cond) {
      return(1)
    },
    warning=function(cond) {
      return(1)
    }
    )
    return(out)
  }
  # g_biased = t ( n1 + n2 ) / sqrt(n1*n2)*sqrt(df)
  # g_unbiased = g_biased * ( 1 - 3 / (4*(n1 + n2 - 2) - 1) )
  # g_variance_unbiased = (n1 + n2) / (n1*n2) + d_unbiased**2 / (2 * (n1 + n2 - 2))
  # N = n1+n2
  # NN = n1*n2
  # Need
  # t value
  # Group sizes for contrasts (n1 and n2)
  # Identify contrast groups

  originalMincAttrs <- mincAttributes(buffer)
  stattype <- originalMincAttrs$`stat-type`
  # Remove coefficients from buffer
  if(!is.null(stattype)){
    for (nStat in 1:length(stattype)) {
      if(stattype[nStat] == 'beta' || stattype[nStat] == 'R-squared' || stattype[nStat] == "logLik" || stattype[nStat] == "F" ) {
        if(!exists('indicesToRemove')) {
          indicesToRemove = nStat
        }
        else {
          indicesToRemove = c(indicesToRemove,nStat)
        }
      }
    }
    if(exists('indicesToRemove')) {

      updatedAttrs <- originalMincAttrs
      updatedAttrs$`stat-type` <- updatedAttrs$`stat-type`[-indicesToRemove]
      updatedAttrs$dimnames[[2]] <- updatedAttrs$dimnames[[2]][-indicesToRemove]

      buffer <- buffer[,-indicesToRemove]
      buffer <- setMincAttributes(buffer, updatedAttrs)
    }
  }

  if (is.null(columns)) {
    model_call <- attr(buffer, "call")
    model_call[[1]] <- quote(model.matrix)
    names(model_call)[names(model_call) == "formula"] <- ""
    mod_mat <- eval(model_call)
    conts <- attr(mod_mat, "contrasts")
    if(is.null(conts)) stop("No categorical variables in model, cannot compute g statistics")

    cat_vars <- names(conts)[vapply(conts, function(x) all(x == "contr.treatment"), logical(1))]
    columns = lapply(updatedAttrs$dimnames, function(x) grep(paste(cat_vars,collapse="|"), x, value= TRUE))[[2]]
    columns = gsub("tvalue-", "", columns, fixed=TRUE)
  }
  print(columns)

  n.cols <- length(columns)
  n.row <- 0
  if (is.matrix(buffer)) {
    n.row <- nrow(buffer)
  }
  else {
    n.row <- length(buffer)
  }
  output <- matrix(1, nrow=n.row, ncol=2*n.cols)
  colnames(output) <- c(paste0("hedgesg-",columns),paste0("hedgesg_var-", columns))
  for (column in columns) {
    n1 = sum(updatedAttrs$model[, column])
    N = length(updatedAttrs$model[, column])
    n2 = N - n1
    NN = n1*n2
    df = updatedAttrs$df[[1]][2]
    output[,paste0("hedgesg-",column)] <- buffer[,paste0("tvalue-",column)] * N / (sqrt(NN)*sqrt(df)) * J(N -2)
    output[,paste0("hedgesg_var-", column)] <- (N) / (NN) + output[,paste0("hedgesg-",column)]**2 / (2 * (N - 2))
  }
  rownames(output) <- rownames(buffer)
  attr(output, "likeVolume") <- attr(buffer, "likeVolume")
  return(output)
}

cfhammill commented 6 years ago

Does it work? If so want to open a PR? Also, you can set have minc/anatLm versions call the exact same code. Should work identically. You can do something like:

mincEffectSize <- function(buffer, columns=NULL){
  eff_call <- match.call()
  eff_call[[1]] <- quote(vertexEffectSize)
  eval(eff_call)
}

cfhammill commented 6 years ago

Here's a taste of our new library

library(lenses)
conts <-
  buffer %>%
  view(attr_l("call")) %>%
  set(first_l, quote(model.matrix)) %>%
  set(names_l %.% indexes(names(.) == "formula"), "") %>%
  eval %>%
  view(attr_l("contrasts"))

gdevenyi commented 6 years ago

Yup, it works. I think I want to clean up what attributes are passed into the new object, and I'll test out anat and minc objects as well.

Then I'll have to figure out how to make an R dev environment... :)

cfhammill commented 6 years ago

My opinionated preference is emacs + emacs speaks statistics + devtools + magit. But most people get great mileage out of RStudio. For you you will probably want to block write access to the central R packages, otherwise you might accidentally overwrite them with an overeager devtools updating packages. We use a special installer account so that this is never a risk, my user can't write to our R packages, so I can happily use my user package directory for development purposes. You can also just add an explicit library to the R_LIBS environment variable.

If you're interested in the emacs route, there're great developer hotkeys in ESS:

C-c C-w C-i: install package (uses devtools::install) C-c C-w C-d: document package (uses roxygen via devtools::document) C-c C-w C-c: check package (uses R CMD check via devtools::check C-c C-c: Send a chunk of code to the R session.

it runs the code in the package environment to cut down on recompiles. This trips me up sometimes, but rarely.

magit is just the greatest git client going. Unless I'm doing the most trivial thing I wont run git from the command line anymore.

If you want to set this up I can give you some more tips and snippets from my .emacs that remove some of the minor annoyances of a fresh ESS install.

gdevenyi commented 6 years ago

dev_mode from devtools works great.

cfhammill commented 6 years ago

sweet, I didn't know about that one!

gdevenyi commented 6 years ago

Moved to pr #227

Mouse-Imaging-Centre / RMINC

Assistance with writing a RMINC function #225