bstewart / stm

An R Package for the Structural Topic Model
Other
399 stars 98 forks source link

define two topics as one and estimate an effect for the resulting topic #200

Open panoptikum opened 5 years ago

panoptikum commented 5 years ago

Dear STM community,

I'm wondering whether it is possible to "merge" two topics into one as they correlate a lot or because I think that the topics align much. I would like to estimate an effect for the resulting topic, but if I am not mistaken when I specify two topics for estimateEffect, then the effect is computed separately? I have not fully understood the mechanisms behind STM, i.e. what all the data that is returned means, so maybe I can just add something up and feed this to the estimateEffect function?

Many thanks for any help.

panoptikum

panoptikum commented 5 years ago

I've found this on stackoverflow:

https://stackoverflow.com/questions/48657585/structural-topic-modeling-in-r-group-the-topics-deductively-and-estimate-effect

I guess that still is the last status?

How would look such a calculation that you've mentioned on stackoverflow, @bstewart ? I think my programming skills could be good enough.

panoptikum commented 5 years ago

Well, I've digged through the estimateEffect() and as far as I understand the code without uncertainty I can sum the thetasims of the topics I would like to group? Please correct me, if I'm mistaken. In that case, I would just modify the estimateEffect() or clone it and use this modified version for my needs.

Though this seems too easy to be true?

And what about the uncertain case? How would it differ?

bstewart commented 5 years ago

You are totally correct- without uncertainty you can just add them. With the uncertainty you can simulate draws from the posterior using thetaPosterior() and just add them within each draw!

panoptikum commented 5 years ago

Ok, sounds very doable. If you like, I can share my modified estimateEffect here and may open a PR?

bstewart commented 5 years ago

That sounds great. I've worked on some code to do this and ultimately the challenge has been in defining a syntax for people to specify what they want. Any suggestions would be much appreciated!

panoptikum commented 5 years ago

True, the syntax for specification will be the most challenging part. I'll think about it and make a proposal soon.

panoptikum commented 5 years ago

My proposal is a follows:

  1. add the parameter group to the function estimateEffect. The default is FALSE.
  2. If group=TRUE, user can specify one or several topic groups which summarise two or more topics and the function estimates the effect for the corresponding topic group. To be as close as possible to the existing syntax e.g. 'c(6)~...' or 'c(6, 12)~...' for topic 6 or topic 6 and 12. I propose 'c(6, 12) & c(8, 14)~ ...' to indicate to summarise the topics 6 and 12 / 8 and 14 as one topic.

I've already written code that includes this idea in estimateEffect and only uses base R. If you like my proposed syntax, I can open a PR.

jehoonchae commented 5 years ago

I also want to merge topics in stm model, but my programming skills and statistical knowledge are not enough to do it by myself. Do you mind if I ask your solution to merge topics? @panoptikum If it is hard to show your own code of estimateEffect which you corrected, can I know briefly about the procedure to get estimated effect of stm model with merged topics?

panoptikum commented 5 years ago

Hello, sure. You find below my modified functions and some internal functions of STM that you have to load:

estimateEffectDEV <- function (formula, stmobj, metadata = NULL, uncertainty = c("Global", 
                                                                                 "Local", "None"), group=FALSE, documents = NULL, nsims = 25, prior = NULL) 
{
  origcall <- match.call()
  thetatype <- match.arg(uncertainty)
  if (thetatype == "None") 
    nsims <- 1
  if (!is.null(documents)) {
    args <- asSTMCorpus(documents, data = metadata)
    documents <- args$documents
    metadata <- args$data
  }
  if (!inherits(formula, "formula")) 
    stop("formula must be a formula object.")
  if (!is.null(metadata) & !is.data.frame(metadata)) 
    metadata <- as.data.frame(metadata)
  termobj <- terms(formula, data = metadata)
  if (attr(termobj, "response") == 1) {
    response <- as.character(formula)[2]
    if (group==TRUE) {
      grouped_topics <- trimws(strsplit(response, "&")[[1]])
      K <- 1:length(grouped_topics)
    } else {
      K <- eval(parse(text = response))
      if (!(posint(K) && max(K) <= stmobj$settings$dim$K))
        stop("Topics specified as response in formula must be a set of positive integers equal to or less than the number of topics in the model.")
    }
    formula <- as.formula(as.character(formula)[c(1, 3)])
    termobj <- terms(formula, data = metadata)
  } else {
    K <- 1:stmobj$settings$dim$K
  }
  mf <- model.frame(termobj, data = metadata)
  xmat <- model.matrix(termobj, data = metadata)
  varlist <- all.vars(termobj)
  if (!is.null(metadata)) {
    data <- metadata[, varlist, drop = FALSE]
  }
  else {
    templist <- list()
    for (i in 1:length(varlist)) {
      templist[[i]] <- get(varlist[i])
    }
    data <- data.frame(templist)
    names(data) <- varlist
    rm(templist)
  }
  metadata <- data
  rm(data)
  if (!is.null(prior)) {
    if (!is.matrix(prior)) {
      prior <- diag(prior, nrow = ncol(xmat))
    }
    if (ncol(prior) != ncol(xmat)) 
      stop("number of columns in prior does not match columns in design matrix")
    prior.pseudo <- chol(prior)
    xmat <- rbind(xmat, prior.pseudo)
  }
  qx <- qr(xmat)
  if (qx$rank < ncol(xmat)) {
    prior <- diag(1e-05, nrow = ncol(xmat))
    prior.pseudo <- chol(prior)
    xmat <- rbind(xmat, prior.pseudo)
    qx <- qr(xmat)
    warning("Covariate matrix is singular.  See the details of ?estimateEffect() for some common causes.\n             Adding a small prior 1e-5 for numerical stability.")
  }
  storage <- vector(mode = "list", length = length(K))
  K_grouped <- c()
  for (i in 1:nsims) {
    if (thetatype == "None") 
      thetasims <- stmobj$theta
    else {
      thetasims <- thetaPosterior(stmobj, nsims = 1, type = thetatype, 
                                  documents = documents)
      thetasims <- do.call(rbind, thetasims)
    }
    for (k in K) {
      if (group==TRUE) {
        topics_to_group <- eval(parse(text = grouped_topics[k]))
        grouped_thetasims <- apply(thetasims[, topics_to_group], 1, sum)
        lm.mod <- qr.lm(grouped_thetasims, qx)
      } else {
        lm.mod <- qr.lm(thetasims[, k], qx)
      }
      storage[[which(k == K)]][[i]] <- summary.qr.lm(lm.mod)
      if (group==TRUE & i==1)
        K_grouped[k] <- paste0(c(k, ": ", paste0(topics_to_group, collapse = "+")), collapse = "")
    }
  }
  toreturn <- list(parameters = storage, topics = K_grouped, call = origcall, 
                   uncertainty = thetatype, formula = formula, data = metadata, 
                   modelframe = mf, varlist = varlist)
  class(toreturn) <- "estimateEffect"
  return(toreturn)
}

qr.lm <- function(y, qx) {
  if(length(y)!=nrow(qx$qr)) {
    #probably don't match because of a prior
    if(length(y)!=(nrow(qx$qr)-ncol(qx$qr))) stop("number of covariate observations does not match number of docs")
    #if it passes this check its the prior. thus
    y <- c(y,rep(0, ncol(qx$qr)))
  }
  beta <- solve.qr(qx, y)
  residuals <- qr.resid(qx,y)
  fitted.values <- qr.fitted(qx,y)
  df.residual <- length(fitted.values) - qx$rank
  out <- list(coefficients=beta, residuals=residuals, 
              fitted.values=fitted.values, 
              df.residual=df.residual, rank=qx$rank, qr=qx)
  out 
}
#this function rewrites the summary.lm() function
# to calculate from our reduced regression
summary.qr.lm <- function (object) {
  z <- object
  p <- z$rank
  rdf <- z$df.residual

  Qr <- object$qr
  n <- nrow(Qr$qr)
  p1 <- 1L:p
  r <- z$residuals
  f <- z$fitted.values

  mss <- ifelse(attr(z$terms, "intercept"), sum((f - mean(f))^2), sum(f^2)) 
  rss <- sum(r^2)

  resvar <- rss/rdf
  R <- chol2inv(Qr$qr[p1, p1, drop = FALSE])
  se <- sqrt(diag(R) * resvar)
  est <- z$coefficients[Qr$pivot[p1]]
  sigma <- sqrt(resvar)
  list(est=est, vcov=(sigma^2 * R))
}

print.summary.estimateEffect <- function(x, digits = max(3L, getOption("digits") - 3L), 
                                         signif.stars = getOption("show.signif.stars"), ...) {
  cat("\nCall:\n", paste(deparse(x$call), sep = "\n", collapse = "\n"), 
      "\n\n", sep = "")

  for(i in 1:length(x$tables)) {
    cat(sprintf("\nTopic %s:\n", x$topics[i]))
    cat("\nCoefficients:\n")
    coefs <- x$tables[[i]]
    stats::printCoefmat(coefs, digits = digits, signif.stars = signif.stars, 
                        na.print = "NA", ...)
    cat("\n")
  }
  invisible(x)
}

is.wholenumber <- function(x, tol = .Machine$double.eps^0.5) {
  abs(x - round(x)) < tol
} 

posint <- function(x) {
  all(is.wholenumber(x)) & all(x>0)
}

nonnegint <- function(x) {
  all(is.wholenumber(x)) & all(x>=0)
}

extract.estimateEffect <- function (x, covariate, model = NULL, topics = x$topics, method = "pointestimate", 
                                    cov.value1 = NULL, cov.value2 = NULL, moderator = NULL, moderator.value = NULL, 
                                    npoints = 100, nsims = 100, ci.level = 0.95, custom.labels = NULL, 
                                    labeltype = "numbers", n = 7, frexw = 0.5) 
{
  cthis <- stm:::produce_cmatrix(prep = x, covariate = covariate, 
                                 method = method, cov.value1 = cov.value1, cov.value2 = cov.value2, 
                                 moderator = moderator, moderator.value = moderator.value)
  simbetas <- stm:::simBetas(parameters = x$parameters, nsims = nsims)
  uvals <- cthis$cdata[[covariate]]
  offset <- (1 - ci.level)/2
  labels <- stm:::createLabels(labeltype = labeltype, covariate = covariate, 
                               method = method, cdata = cthis$cdata, cov.value1 = cov.value1, 
                               cov.value2 = cov.value2, model = model, n = n, topics = x$topics, 
                               custom.labels = custom.labels, frexw = frexw)
  out <- lapply(topics, function(i) {
    sims <- cthis$cmatrix %*% t(simbetas[[which(x$topics == 
                                                  i)]])
    if (method == "difference") {
      diff <- sims[1, ] - sims[2, ]
      out_inner <- data.frame(method = method, topic = i, 
                              covariate = covariate, covariate.value = paste0(cov.value1, 
                                                                              "-", cov.value2), estimate = mean(diff), std.error = sd(diff), 
                              ci.level = ci.level, ci.lower = quantile(diff, 
                                                                       offset), ci.upper = quantile(diff, 1 - offset), 
                              label = labels[which(x$topics == i)])
    }
    else {
      #browser()
      out_inner <- data.frame(method = method, topic = i, 
                              covariate = covariate, covariate.value = uvals, 
                              estimate = rowMeans(sims), std.error = sqrt(apply(sims, 
                                                                                1, stats::var)), ci.level = ci.level, ci.lower = apply(sims, 
                                                                                                                                       1, quantile, probs = offset), ci.upper = apply(sims, 
                                                                                                                                                                                      1, quantile, probs = (1 - offset)), label = labels[which(x$topics == 
                                                                                                                                                                                                                                                 i)])
      out_inner["tval"] <- out_inner["estimate"]/out_inner["std.error"]
      rdf <- nrow(x$data) - length(out_inner["estimate"])
      out_inner["p"] <- 2 * stats::pt(abs(out_inner[["tval"]]), rdf, lower.tail = FALSE)
    }
    if (!is.null(moderator)) {
      out_inner$moderator <- moderator
      out_inner$moderator.value <- moderator.value
    }
    rownames(out_inner) <- NULL
    return(out_inner)
  })
  out <- do.call("rbind", out)
  return(out)
}
jehoonchae commented 5 years ago

Thank you, so much! :) @panoptikum

dphh commented 4 years ago

Thanks for sharing your code, @panopticon. I also hope to estimate the effects of topic groups, so this is very helpful. Using your modified functions, I was able to estimate the effects of a topic group. However, using the modified summary function does not work for me, so I cannot look at the effects for topic groups. Is this a bug or am I not using your modified functions as intended? In case you or @jehoonchae are still following this thread, I would be grateful for any hint.

Reprex:

library(stm)
model <- stm(documents = poliblog5k.docs, vocab = poliblog5k.voc, K = 3)

#After creating modified functions with panopticon’s code from above:
test  <- estimateEffectDEV(c(1,2) ~ rating + day, stmobj = model, metadata = poliblog5k.meta, groups = TRUE)

print.summary.estimateEffect(test) # This throws an error
summary(test) # This throws an error as well

The errors that I get are:

Error in stats::printCoefmat(coefs, digits = digits, signif.stars = signif.stars, : 'x' must be coefficient matrix/data frame

Error in base::colMeans(x, na.rm = na.rm, dims = dims, ...) : 'x' must be an array of at least two dimensions
jehoonchae commented 4 years ago

Since I received several questions related to using the @panoptikum's customized function estimateEffectDEV() and drawing a plot on time, I wrote a brief explication on the issue. Please check the files named stm-merge.* in the repo. I hope it could be the answer to your question.

dphh commented 4 years ago

Thanks @jehoonchae, your work is helpful. However, my main interest is to obtain a regression table just as summary.estimateEffect produces it. As I wrote in my previous post, I think that @panopticon's modified version of print.summary.estimateEffect() should do this job, but it throws an error.

jehoonchae commented 4 years ago

Sorry, I misunderstood your question.

Please use the following code. I added marginally modified summary.estimateEffect() function.

estimateEffectDEV <- function (formula, stmobj, metadata = NULL, uncertainty = c("Global", 
                                                                                 "Local", "None"), group=FALSE, documents = NULL, nsims = 25, prior = NULL) 
{
  origcall <- match.call()
  thetatype <- match.arg(uncertainty)
  if (thetatype == "None") 
    nsims <- 1
  if (!is.null(documents)) {
    args <- asSTMCorpus(documents, data = metadata)
    documents <- args$documents
    metadata <- args$data
  }
  if (!inherits(formula, "formula")) 
    stop("formula must be a formula object.")
  if (!is.null(metadata) & !is.data.frame(metadata)) 
    metadata <- as.data.frame(metadata)
  termobj <- terms(formula, data = metadata)
  if (attr(termobj, "response") == 1) {
    response <- as.character(formula)[2]
    if (group==TRUE) {
      grouped_topics <- trimws(strsplit(response, "&")[[1]])
      K <- 1:length(grouped_topics)
    } else {
      K <- eval(parse(text = response))
      if (!(posint(K) && max(K) <= stmobj$settings$dim$K))
        stop("Topics specified as response in formula must be a set of positive integers equal to or less than the number of topics in the model.")
    }
    formula <- as.formula(as.character(formula)[c(1, 3)])
    termobj <- terms(formula, data = metadata)
  } else {
    K <- 1:stmobj$settings$dim$K
  }
  mf <- model.frame(termobj, data = metadata)
  xmat <- model.matrix(termobj, data = metadata)
  varlist <- all.vars(termobj)
  if (!is.null(metadata)) {
    data <- metadata[, varlist, drop = FALSE]
  }
  else {
    templist <- list()
    for (i in 1:length(varlist)) {
      templist[[i]] <- get(varlist[i])
    }
    data <- data.frame(templist)
    names(data) <- varlist
    rm(templist)
  }
  metadata <- data
  rm(data)
  if (!is.null(prior)) {
    if (!is.matrix(prior)) {
      prior <- diag(prior, nrow = ncol(xmat))
    }
    if (ncol(prior) != ncol(xmat)) 
      stop("number of columns in prior does not match columns in design matrix")
    prior.pseudo <- chol(prior)
    xmat <- rbind(xmat, prior.pseudo)
  }
  qx <- qr(xmat)
  if (qx$rank < ncol(xmat)) {
    prior <- diag(1e-05, nrow = ncol(xmat))
    prior.pseudo <- chol(prior)
    xmat <- rbind(xmat, prior.pseudo)
    qx <- qr(xmat)
    warning("Covariate matrix is singular.  See the details of ?estimateEffect() for some common causes.\n             Adding a small prior 1e-5 for numerical stability.")
  }
  storage <- vector(mode = "list", length = length(K))
  K_grouped <- c()
  for (i in 1:nsims) {
    if (thetatype == "None") 
      thetasims <- stmobj$theta
    else {
      thetasims <- thetaPosterior(stmobj, nsims = 1, type = thetatype, 
                                  documents = documents)
      thetasims <- do.call(rbind, thetasims)
    }
    for (k in K) {
      if (group==TRUE) {
        topics_to_group <- eval(parse(text = grouped_topics[k]))
        grouped_thetasims <- apply(thetasims[, topics_to_group], 1, sum)
        lm.mod <- qr.lm(grouped_thetasims, qx)
      } else {
        lm.mod <- qr.lm(thetasims[, k], qx)
      }
      storage[[which(k == K)]][[i]] <- summary.qr.lm(lm.mod)
      if (group==TRUE & i==1)
        K_grouped[k] <- paste0(c(k, ": ", paste0(topics_to_group, collapse = "+")), collapse = "")
    }
  }
  toreturn <- list(parameters = storage, topics = K_grouped, call = origcall, 
                   uncertainty = thetatype, formula = formula, data = metadata, 
                   modelframe = mf, varlist = varlist)
  class(toreturn) <- "estimateEffect"
  return(toreturn)
}

qr.lm <- function(y, qx) {
  if(length(y)!=nrow(qx$qr)) {
    #probably don't match because of a prior
    if(length(y)!=(nrow(qx$qr)-ncol(qx$qr))) stop("number of covariate observations does not match number of docs")
    #if it passes this check its the prior. thus
    y <- c(y,rep(0, ncol(qx$qr)))
  }
  beta <- solve.qr(qx, y)
  residuals <- qr.resid(qx,y)
  fitted.values <- qr.fitted(qx,y)
  df.residual <- length(fitted.values) - qx$rank
  out <- list(coefficients=beta, residuals=residuals, 
              fitted.values=fitted.values, 
              df.residual=df.residual, rank=qx$rank, qr=qx)
  out 
}
#this function rewrites the summary.lm() function
# to calculate from our reduced regression
summary.qr.lm <- function (object) {
  z <- object
  p <- z$rank
  rdf <- z$df.residual

  Qr <- object$qr
  n <- nrow(Qr$qr)
  p1 <- 1L:p
  r <- z$residuals
  f <- z$fitted.values

  mss <- ifelse(attr(z$terms, "intercept"), sum((f - mean(f))^2), sum(f^2)) 
  rss <- sum(r^2)

  resvar <- rss/rdf
  R <- chol2inv(Qr$qr[p1, p1, drop = FALSE])
  se <- sqrt(diag(R) * resvar)
  est <- z$coefficients[Qr$pivot[p1]]
  sigma <- sqrt(resvar)
  list(est=est, vcov=(sigma^2 * R))
}

rmvnorm<-function(n,mu,Sigma,chol.Sigma=chol(Sigma)) {
  E<-matrix(rnorm(n*length(mu)),n,length(mu))
  t(  t(E%*%chol.Sigma) +c(mu))
}

summary.estimateEffect <- function(object, topics=NULL, nsim=500, ...) {
  if(is.null(topics)) topics <- object$topics
  if(any(!(topics %in% object$topics))) {
    stop("Some topics specified with the topics argument are not available in this estimateEffect object.")
  }
  tables <- vector(mode="list", length=length(topics))
  for(i in seq_along(topics)) {
    topic = as.numeric(strsplit(topics[i], ":")[[1]][1])
    sims <- lapply(object$parameters[[topic]], function(x) rmvnorm(nsim, x$est, x$vcov))
    sims <- do.call(rbind,sims)
    est<- colMeans(sims)
    se <- sqrt(apply(sims,2, stats::var))
    tval <- est/se
    rdf <- nrow(object$data) - length(est)
    p <- 2 * stats::pt(abs(tval), rdf, lower.tail = FALSE)

    coefficients <- cbind(est, se, tval, p)
    rownames(coefficients) <- attr(object$parameters[[1]][[1]]$est, "names")
    colnames(coefficients) <- c("Estimate", "Std. Error", "t value", "Pr(>|t|)")
    tables[[i]] <- coefficients
  }
  out <- list(call=object$call, topics=topics, tables=tables)
  class(out) <- "summary.estimateEffect"
  return(out)
}

print.summary.estimateEffect <- function(x, digits = max(3L, getOption("digits") - 3L), 
                                         signif.stars = getOption("show.signif.stars"), ...) {
  cat("\nCall:\n", paste(deparse(x$call), sep = "\n", collapse = "\n"), 
      "\n\n", sep = "")

  for(i in 1:length(x$tables)) {
    cat(sprintf("\nTopic %s:\n", x$topics[i]))
    cat("\nCoefficients:\n")
    coefs <- x$tables[[i]]
    stats::printCoefmat(coefs, digits = digits, signif.stars = signif.stars, 
                        na.print = "NA", ...)
    cat("\n")
  }
  invisible(x)
}

is.wholenumber <- function(x, tol = .Machine$double.eps^0.5) {
  abs(x - round(x)) < tol
} 

posint <- function(x) {
  all(is.wholenumber(x)) & all(x>0)
}

nonnegint <- function(x) {
  all(is.wholenumber(x)) & all(x>=0)
}

extract.estimateEffect <- function (x, covariate, model = NULL, topics = x$topics, method = "pointestimate", 
                                    cov.value1 = NULL, cov.value2 = NULL, moderator = NULL, moderator.value = NULL, 
                                    npoints = 100, nsims = 100, ci.level = 0.95, custom.labels = NULL, 
                                    labeltype = "numbers", n = 7, frexw = 0.5) 
{
  cthis <- stm:::produce_cmatrix(prep = x, covariate = covariate, 
                                 method = method, cov.value1 = cov.value1, cov.value2 = cov.value2, 
                                 moderator = moderator, moderator.value = moderator.value)
  simbetas <- stm:::simBetas(parameters = x$parameters, nsims = nsims)
  uvals <- cthis$cdata[[covariate]]
  offset <- (1 - ci.level)/2
  labels <- stm:::createLabels(labeltype = labeltype, covariate = covariate, 
                               method = method, cdata = cthis$cdata, cov.value1 = cov.value1, 
                               cov.value2 = cov.value2, model = model, n = n, topics = x$topics, 
                               custom.labels = custom.labels, frexw = frexw)
  out <- lapply(topics, function(i) {
    sims <- cthis$cmatrix %*% t(simbetas[[which(x$topics == 
                                                  i)]])
    if (method == "difference") {
      diff <- sims[1, ] - sims[2, ]
      out_inner <- data.frame(method = method, topic = i, 
                              covariate = covariate, covariate.value = paste0(cov.value1, 
                                                                              "-", cov.value2), estimate = mean(diff), std.error = sd(diff), 
                              ci.level = ci.level, ci.lower = quantile(diff, 
                                                                       offset), ci.upper = quantile(diff, 1 - offset), 
                              label = labels[which(x$topics == i)])
    }
    else {
      #browser()
      out_inner <- data.frame(method = method, topic = i, 
                              covariate = covariate, covariate.value = uvals, 
                              estimate = rowMeans(sims), std.error = sqrt(apply(sims, 
                                                                                1, stats::var)), ci.level = ci.level, ci.lower = apply(sims, 
                                                                                                                                       1, quantile, probs = offset), ci.upper = apply(sims, 
                                                                                                                                                                                      1, quantile, probs = (1 - offset)), label = labels[which(x$topics == 
                                                                                                                                                                                                                                                 i)])
      out_inner["tval"] <- out_inner["estimate"]/out_inner["std.error"]
      rdf <- nrow(x$data) - length(out_inner["estimate"])
      out_inner["p"] <- 2 * stats::pt(abs(out_inner[["tval"]]), rdf, lower.tail = FALSE)
    }
    if (!is.null(moderator)) {
      out_inner$moderator <- moderator
      out_inner$moderator.value <- moderator.value
    }
    rownames(out_inner) <- NULL
    return(out_inner)
  })
  out <- do.call("rbind", out)
  return(out)
}

Then try what you did again with summary() like the following.

library(stm)
source('estimateEffectDEV.R')
model <- stm(documents = poliblog5k.docs, vocab = poliblog5k.voc, K = 3)

test  <- estimateEffectDEV(c(1,2) ~ rating + s(day), 
                           stmobj = model, 
                           metadata = poliblog5k.meta, 
                           group = TRUE)

summary(test)
Call:
estimateEffectDEV(formula = c(1, 2) ~ rating + s(day), stmobj = model, 
    metadata = poliblog5k.meta, group = TRUE)

Topic 1: 1+2:

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)    0.584850   0.041278  14.169   <2e-16 ***
ratingLiberal  0.088502   0.009519   9.298   <2e-16 ***
s(day)1        0.076430   0.081379   0.939   0.3477    
s(day)2       -0.094751   0.047893  -1.978   0.0479 *  
s(day)3       -0.069118   0.059332  -1.165   0.2441    
s(day)4       -0.046397   0.048474  -0.957   0.3385    
s(day)5       -0.056602   0.053356  -1.061   0.2888    
s(day)6        0.046230   0.049852   0.927   0.3538    
s(day)7        0.145387   0.052234   2.783   0.0054 ** 
s(day)8        0.072769   0.060424   1.204   0.2285    
s(day)9        0.028451   0.062221   0.457   0.6475    
s(day)10      -0.092038   0.059254  -1.553   0.1204    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
panoptikum commented 4 years ago

Glad to read that you've figured it out. Sorry for any errors, but I do not have the time to cross validate at the moment.

robertberryuk commented 1 year ago

Thanks so much to @panoptikum and @jehoonchae for this code. It's really helped me out. And to @bstewart for the stm package too!

M-Gartiser commented 1 year ago

Thank you very much @panoptikum, @jehoonchae & of course @bstewart. You answered the question that kept me busy for weeks now. I just had a short question regarding the uncertainty issue raised above. Are the different uncertainty measures in estimateEffect applicable in your amendment, @panoptikum & @jehoonchae?

My data was ran through without problems using the 'Global' option and your example - @jehoonchae - would also use this, if I'm correct? But I'm unsure if I should proceed with this or if it is better for me to run the regressions without this. Thank you so much!

santoshbs commented 8 months ago

Thanks so much, @panoptikum , @jehoonchae , and @bstewart for the modified dev version of estimateEffect(). This is super helpful.

I was wondering how to modify and make use of the estimateeffect() code when the dependent variable is a proportion of a combination of multiple topics (e.g., (topic1 + topic3)/(topic2 + topic4 + topic1 + topic3)). Essentially, I was keen to understand how a metadata relates to relative topic emphasis.

Request your help.