MUCollective / multiverse

R package for creating explorable multiverse analysis
https://mucollective.github.io/multiverse/
GNU General Public License v3.0

R Chunks In-Between Multiverse Chunks Produce Varying Behavior #118

Closed emstruong closed 3 months ago

emstruong commented 11 months ago

Hello, thank you for the important and great package. Just bringing some of the discussion here because it's easier to read code on GitHub's UI.

If you use the following R Markdown file, the "R Chunk in Question" behaves differently in the knitted HTML than in a live RStudio session. Specifically, the live session correctly reports an LR Chisq of 1322.7, whereas the knitted version seems to use the last branch of the multiverse and reports an LR Chisq of 1311.

This is my version of RStudio; I am running R 4.3.1:

RStudio 2023.09.1+494 "Desert Sunflower" Release (cd7011dce393115d3a7c3db799dda4b1c7e88711, 2023-10-16) for Ubuntu Jammy Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) rstudio/2023.09.1+494 Chrome/116.0.5845.190 Electron/26.2.4 Safari/537.36

R Markdown Code RChunkb4MV.txt


---
title: "R Chunk Before MV Chunk"
output: html_document
date: "2023-11-14"
---

library(multiverse)
library(vcd)
library(vcdExtra)
library(magrittr)
library(purrr)
library(dplyr)
library(data.table)
library(MASS)
data("DaytonSurvey", package = "vcdExtra")
M <- multiverse::multiverse()
SolvingZerosBranch <- branch(SolvingZeros,
       "Add 0.1" ~ function(.) {
         as.data.table(.) %>% 
           {.[Freq == 0, Freq := Freq + 0.1]}
       },
       "Replace 10 -10" ~ function(.){
         as.data.table(.) %>% 
           {.[Freq==0, Freq := Freq + (10^-10)]}
       },
       "Nothing" ~ identity)

CorrectedData <- do.call(SolvingZerosBranch, list(DaytonSurvey))

CorrectedTab <-
  xtabs(formula = Freq ~ alcohol + marijuana + cigarette + sex + race,
        data = CorrectedData)

# Indicator start values: 1 for non-empty cells, 0 for empty ones
CorrectedTab.nonzero <- ifelse(CorrectedTab > 0, 1, 0)

log.model <-
  MASS::loglm(
    ~ sex * race + cigarette + alcohol + marijuana,
    data = CorrectedTab,
    start = CorrectedTab.nonzero,
    fitted = TRUE,
    keep.frequencies = TRUE
  )
vcdExtra::LRstats(log.model)
execute_multiverse(M)
M.Summary <- expand(M) %>% 
  dplyr::mutate(Model = purrr::map(.results, "log.model"))
vcdExtra::LRstats(M.Summary$Model[[1]])
vcdExtra::LRstats(M.Summary$Model[[2]])
vcdExtra::LRstats(M.Summary$Model[[3]])

abhsarma commented 3 months ago

Apologies for getting back to this so late.

When using R Markdown interactively, one thing we tried to support in the library was constant feedback, just like a user would receive if they were implementing a non-multiverse analysis. This becomes tricky because of the large number of analyses in a multiverse, so we pick a "default" analysis, nominally the one that results from picking the first option for each decision.
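
To make "the first option for each decision" concrete, here is a minimal base-R sketch. It is not how the package enumerates universes internally; `expand.grid()` is only a stand-in, and the `Model` decision is hypothetical (only `SolvingZeros` comes from the file above).

```r
# SolvingZeros and its options come from the issue; Model is a made-up second decision
decisions <- list(
  SolvingZeros = c("Add 0.1", "Replace 10 -10", "Nothing"),
  Model        = c("main effects", "sex * race")
)

# One row per universe: every combination of one option per decision
universes <- expand.grid(decisions, stringsAsFactors = FALSE)

# The "default" analysis picks the first option of every decision (first row here)
default_universe <- universes[1, ]

# The last universe picks the last option of every decision (last row here)
last_universe <- universes[nrow(universes), ]

print(default_universe)
print(last_universe)
```

Under these assumptions, the interactive session would show results for `default_universe`, while anything that only keeps the most recent result would correspond to `last_universe`.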

Unfortunately, when compiling with knitr we do not have as much control, so for every code block all the analyses get executed. Thus, when you print while knitting, we can only support one of the following behaviors:

  1. print output from every single universe (we don't want this)
  2. print the last output
  3. (not implemented yet) do not allow a user to print variables declared in a multiverse code block from a plain R chunk; throw an error instead.
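
Option 2 can be pictured with a small base-R simulation. This is only a sketch of the effect described above, not the package's implementation: the numeric values and the names `shared`, `per_universe`, and `correction` are made up for illustration.

```r
# Stand-ins for the three SolvingZeros options; the numbers are illustrative only
zero_fixes <- c("Add 0.1" = 0.1, "Replace 10 -10" = 1e-10, "Nothing" = 0)

shared <- new.env()     # plays the role of the environment a plain R chunk reads from
per_universe <- list()  # one stored result per universe

for (name in names(zero_fixes)) {
  # Every universe runs in turn and overwrites the same variable
  assign("correction", zero_fixes[[name]], envir = shared)
  # Keeping a separate copy per universe avoids the overwrite
  per_universe[[name]] <- zero_fixes[[name]]
}

# A later plain chunk only sees whatever the last universe left behind
get("correction", envir = shared)

# The default (first) universe is still reachable through the per-universe results
per_universe[["Add 0.1"]]
```

The file in the original report already takes the per-universe route at the end, pulling each model out of `expand(M)` via `.results`, which does not depend on which universe ran last.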

As you can probably tell, there's no right or wrong approach here, just a question of which behavior will surprise users the least. I opened a separate issue for this specifically, so I am going to close this one now.