danielcfurr / edstan

Other
8 stars 3 forks source link

Problem with OR in case study Two-Parameter Logistic Item Response Model tackeled #18

Open Famondir opened 4 years ago

Famondir commented 4 years ago

Dear readers, I just worked through your case study "Two-Parameter Logistic Item Response Model" (https://mc-stan.org/users/documentation/case-studies/tutorial_twopl.html) and ran into an issue adapting your odds ratio function. I had some compares where some of the pairs counted to zero and table() gave back a frequency table smaller that 2x2. It ttok my some hours of work to come up with this (probably not optimized) code snipped (see below).

Maybe you know a better way to handle this issue but i think it might be handy for future readers of your case study to bypass this error faster.

Sincerelly Simon


  n <- nrow(X)
  p <- ncol(X)
  ind <- t(combn(p, 2))
  nind <- nrow(ind)
  OR <- numeric(nind)
  for (i in 1:nind) {
    # print(i)
    xtab <- my.freq.table(X[, ind[i, 1]], X[, ind[i, 2]]) 
    n00 <- xtab[1, 1]
    n01 <- xtab[1, 2]
    n10 <- xtab[2, 1]
    n11 <- xtab[2, 2]
    OR[i] <- (n00 * n11)/(n01 * n10)
  }
  pw.OR <- data.frame(ind, OR)
  names(pw.OR) <- c("item_i", "item_j", "OR")

  return(pw.OR)
}

my.freq.table <- function(x,y) {
  xtab <- table(x,y)

# handles frequency tables with lower rank
  if((nrow(xtab) != ncol(xtab)) | (nrow(xtab) == 1 & nrow(xtab) == 1)) {
    xtab <- table(x,y) %>% as.data.frame() %>% defactor()
    dat <- data.frame(c(0,0,0),
                      c(1,0,0),
                      c(0,1,0),
                      c(1,1,0)) %>% t() %>% data.frame() %>% setNames(c('x','y','Freq'))
    rownames(dat) <- NULL

    dat <- dat %>% mutate(Freq = ifelse(x == xtab$x & y == xtab$y, xtab$Freq, Freq))

    xtab <- xtabs(Freq ~ x+y, data = dat)
  } 

  return(xtab)
}

# I don't can handle factors well yet so I just pull the numbers out
defactor <- function(d) {
  dat <- data.frame(matrix(NA, nrow = nrow(d), ncol = ncol(d))) %>% setNames(colnames(d))

  for (i in 1:nrow(d)) {
    for (j in 1:ncol(d)) {
      if (is.factor(d[i,j])) {dat[i,j] <- as.numeric(levels(d[i,j]))[d[i,j]]}
      else {dat[i,j] <- d[i,j]}
    }
  }

  return(dat)
}