Allow analysis scripts to read data.json

WardBrian commented 3 weeks ago

Closes #238.

- Adds data to the latestRun object
- Adds a files argument to the pyodide and webr mechanisms to allow us to populate arbitrary files
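As a rough illustration of the idea behind a files argument, here is a minimal Python sketch that writes a mapping of file names to text contents into a working directory before a script runs. The helper name `populate_files`, the dict shape, and the sample values are all hypothetical and are not taken from this PR; the actual change presumably writes into the in-browser filesystem of pyodide or webR rather than a temporary directory.

```python
import json
import tempfile
from pathlib import Path


def populate_files(root: Path, files: dict[str, str]) -> list[Path]:
    """Write each (relative path -> text content) entry under root."""
    written = []
    for name, content in files.items():
        target = root / name
        # Create intermediate directories so nested paths also work.
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content, encoding="utf-8")
        written.append(target)
    return written


# Hypothetical stand-in for the data attached to the latest run.
run_data = {"n_days": 3, "ts": [1, 2, 3], "cases": [3, 8, 26]}

# A temporary directory plays the role of the interpreter's filesystem.
workdir = Path(tempfile.mkdtemp())
populate_files(workdir, {"data.json": json.dumps(run_data)})

# An analysis script could then read the file back like any other.
loaded = json.loads((workdir / "data.json").read_text(encoding="utf-8"))
```

Passing a general mapping rather than a single data.json payload is what would make it possible to populate arbitrary files later.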

This change lets the SIR analysis.R example go from

# posterior predictive check using the pred_cases generated quantity

install.packages(c("outbreaks", "bayesplot"))
library(outbreaks)
library(posterior)
library(ggplot2)

# same as data generation
cases <- influenza_england_1978_school$in_bed
n_days <- length(cases)
ts <- 1:n_days

# Extract posterior predictive checks
pred_cases <- as.matrix(as_draws_df(as_draws_rvars(draws)$pred_cases))[, -(15:17)]

bayesplot::ppc_ribbon(y = cases, yrep = pred_cases,
                      x = ts, y_draw = "point") +
  theme_bw() +
  ylab("cases") + xlab("days")

to

# posterior predictive check using the pred_cases generated quantity

install.packages("bayesplot")
library(posterior)
library(ggplot2)

# load from data
d <- jsonlite::read_json('./data.json')
cases <- unlist(d$cases)
n_days <- d$n_days
ts <- unlist(d$ts)

# Extract posterior predictive checks
pred_cases <- as.matrix(as_draws_df(as_draws_rvars(draws)$pred_cases))[, -(15:17)]

bayesplot::ppc_ribbon(y = cases, yrep = pred_cases,
                      x = ts, y_draw = "point") +
  theme_bw() +
  ylab("cases") + xlab("days")

This is even more useful when the data generation involves some randomization. In that case, re-running the same generation code in the analysis script would not recover the same data, whereas reading data.json does.
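The randomization point can be sketched from the Python (pyodide) side as follows. This is only an illustration under stated assumptions: `generate_data` is a hypothetical stand-in for a data-generation script, the values are made up, and a temporary directory stands in for wherever data.json is placed.

```python
import json
import random
import tempfile
from pathlib import Path


def generate_data(n_days: int) -> dict:
    """Hypothetical data-generation step that uses unseeded randomness."""
    return {
        "n_days": n_days,
        "ts": list(range(1, n_days + 1)),
        "cases": [random.randint(0, 300) for _ in range(n_days)],
    }


# The data the sampler actually saw, saved alongside the run.
data_path = Path(tempfile.mkdtemp()) / "data.json"
original = generate_data(5)
data_path.write_text(json.dumps(original), encoding="utf-8")

# Reading the file recovers exactly what was used for sampling.
from_file = json.loads(data_path.read_text(encoding="utf-8"))

# Re-running the generator draws fresh random values instead, so
# "cases" will generally differ from the original draw.
regenerated = generate_data(5)
```

Only `from_file` is guaranteed to match the data behind the draws, which is why a posterior predictive check should read the file rather than call the generator again.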

WardBrian commented 3 weeks ago

I wonder if creating an additional copy of an in-memory JSON data file could further tax the scarce memory resource in the case of models with very large data, but the increased utility is probably worth the risk in this case. Especially as the copy doesn't have to exist until after the sampler's been run.

If this does become a problem, I think we could work around it by using FS.createLazyFile, but this would require extra machinery to 'host' the file at a URL, so I didn't tackle it here. If you know a good way to do that, we could definitely do it sooner rather than later.

jsoules commented 3 weeks ago

To be clear--yeah, I don't have an answer here, or even evidence that it's going to be a problem; I suspect any such situation is either massively over-provisioned with data or is going to run into problems while still in the sampler phase, so realistically I'm not worried about it.

We can solve it if it's ever an issue.

flatironinstitute / stan-playground

Allow analysis scripts to read data.json #239