kaitejohnson commented 2 months ago

This is a WIP. Tasks include:

[X] refactor stan model to estimate n_subpop infection dynamics, with the first subpop being the reference subpop
[X] add a new parameter $m$, which represents the shift of the reference population from the central dynamic
[X] This also includes getting rid of language that uses sites and subpops interchangeably. Observations occur in lab-site pairs
[X] refactor get_stan_data() to appropriately index the observations
[X] refactor wwinference() to specify that when include_ww = 0, ww_data should be empty
[X] add tests for the 3 use cases described in #149 that ensure the stan data is coded properly
[X] added logic to get_draws() so that if include_ww == 0, predicted_ww and subpop_rt are not returned when the user requests all outputs, and an error is raised if they ask for either of those outputs explicitly.
[X] update test data because of new model formulation
[X] revise model definition
[X] internal bookkeeping refactor (see below, creation of standardized mappings that we use to generate stan data and pass as an output for post processing)
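
For illustration, the get_draws() behavior in the task list could look roughly like the sketch below. This is not the package's code: `resolve_draw_outputs()` is a hypothetical helper, and the output names other than predicted_ww and subpop_rt are assumptions.

```r
# Hypothetical helper (not the package's get_draws()): decides which outputs
# to return given what the user asked for and the include_ww flag.
resolve_draw_outputs <- function(what = "all", include_ww = 1) {
  # Assumed set of outputs; only predicted_ww and subpop_rt come from the PR.
  all_outputs <- c("predicted_counts", "predicted_ww", "global_rt", "subpop_rt")
  ww_only <- c("predicted_ww", "subpop_rt")

  if (include_ww == 0) {
    if ("all" %in% what) {
      # Quietly drop the outputs that need wastewater data
      return(setdiff(all_outputs, ww_only))
    }
    bad <- intersect(what, ww_only)
    if (length(bad) > 0) {
      stop(
        "Cannot return ", paste(bad, collapse = ", "),
        " because the model was fit with include_ww = 0"
      )
    }
    return(what)
  }

  if ("all" %in% what) all_outputs else what
}

resolve_draw_outputs("all", include_ww = 0)
```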

Some scope creep:

[X] parameter diagnostics and summary diagnostics seemed to have gotten swapped in their naming. Edited accordingly so summary_diagnostics() spits out the number of divergent transitions in each chain, the number of transitions exceeding the max tree depth, and the EBFMI of each chain. parameter_diagnostics() spits out the table of metadata on each model parameter (e.g. mean, median, stdev, ess, rhat)

As always anytime we change the model:

[X] update test data
[X] make sure tests pass

kaitejohnson commented 2 months ago

Before I review further, I think you want to keep things like i_first_obs and initial_exp_growth_rate as having an implicit central value that is not necessarily the same as the reference population's value. So you'll want offsets for the reference population for those too.

You'll also want to only infer and apply those offsets if the number of subpopulations is > 1, otherwise they should be fixed at 0.

Good catch, I added the offset as the final step and forgot about the other site level parameters. Will edit.
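
A minimal sketch of the rule being discussed, in plain R rather than Stan. The function and argument names are hypothetical, and treating the other subpopulations as simple deviations from the central value is an assumption made only for illustration.

```r
# Hypothetical illustration (not the Stan model): a parameter has an implicit
# central value, and the reference subpopulation sits at an offset from it.
# The offset is only applied when there is more than one subpopulation.
subpop_values <- function(central, ref_offset, other_devs = numeric()) {
  n_subpop <- 1 + length(other_devs)
  if (n_subpop == 1) {
    # A single subpopulation is the whole population: fix the offset at 0
    ref_offset <- 0
  }
  # First element is the reference subpop, the rest are the other subpops
  c(central + ref_offset, central + other_devs)
}

subpop_values(central = 0.1, ref_offset = 0.05)
subpop_values(central = 0.1, ref_offset = 0.05, other_devs = c(-0.02, 0.03))
```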

kaitejohnson commented 2 months ago

Updated, though I wasn't sure exactly what good priors on these offsets would be. In particular, for m_first_obs_stdev I only know it should be very small, to ensure that the sampled value for I0/N doesn't go above 1 or below 0...

kaitejohnson commented 2 months ago

Tests are passing locally on my Mac. Will try regenerating from WSL2 (was frozen..)

dylanhmorris commented 2 months ago

168 is now ready for review @kaitejohnson

gvegayon commented 2 months ago

Current R CMD check error here:

Running the tests in ‘tests/testthat.R’ failed.
Last 13 lines of output:
              sd_log_sigma_ww_site_prior_sd = 0.693, t_peak_mean = 5L, 
              t_peak_sd = 1L, viral_peak_mean = 5.1, viral_peak_sd = 0.5, 
              ww_site_mod_sd_sd = 0.25)), fit_opts = list(iter_warmup = 25, 
          iter_sampling = 25, n_chains = 1, seed = 55, adapt_delta = 0.95, 
          max_treedepth = 12), generate_initial_values = TRUE)`: argument "ww_data" is missing, with no default
  Backtrace:
      ▆
   1. ├─withr::with_seed(...) at test_no_ww_model.R:9:3
   2. │ └─withr::with_preserve_seed(...)
   3. ├─base::do.call(wwinference::wwinference, model_test_data_no_ww) at test_no_ww_model.R:10:5
   4. └─wwinference (local) `<fn>`(...)

  [ FAIL 1 | WARN 3023 | SKIP 0 | PASS 242 ]
  Error: Test failures
  Execution halted
* checking for non-standard things in the check directory ... OK
* checking for detritus in the temp directory ... OK
* DONE

Status: 1 ERROR, 1 NOTE

kaitejohnson commented 2 months ago

Yeah I am going to expand this test a bit to include (1) passing ww but setting include_ww = 0, (2) passing no ww and setting include_ww = 0, and (3) passing no ww and setting include_ww = 1. The model should be able to handle all three cases, but it can currently only handle case 1!
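
A rough sketch of how the three cases could be normalized before building the stan data. `effective_ww_data()` is a hypothetical helper, the example columns are made up, and falling back to empty wastewater data in the third case is an assumption about the intended behavior, not something settled in this thread.

```r
# Hypothetical helper: returns the wastewater data the model should actually
# use. Per the task list, ww_data should be empty when include_ww = 0.
effective_ww_data <- function(ww_data = NULL, include_ww = 1) {
  if (include_ww == 0 || is.null(ww_data)) {
    # Assumption: a missing ww_data is treated the same as an empty one
    return(data.frame())
  }
  ww_data
}

# Made-up example data, for illustration only
ww <- data.frame(site = c(2, 3), lab = c(1, 3), log_conc = c(4.2, 3.9))

cases <- list(
  ww_passed_include_0 = effective_ww_data(ww, include_ww = 0),
  no_ww_include_0 = effective_ww_data(NULL, include_ww = 0),
  no_ww_include_1 = effective_ww_data(NULL, include_ww = 1)
)

# Number of wastewater rows that reach the model in each case
sapply(cases, nrow)
```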

kaitejohnson commented 2 months ago

Relatedly, I would love to suppress the ESS warnings (either adding more iterations or ideally just telling cmdstan to suppress them)... will look into it

dylanhmorris commented 2 months ago

Worth discussing in #145

Right now suppressing the warnings won't be possible due to #174

kaitejohnson commented 2 months ago

looks like ww tests are failing again, will try regenerating test data...

dylanhmorris commented 1 month ago

@kaitejohnson still a few failing tests. I made one easy fix myself. Could you fix the others and re-request review?

kaitejohnson commented 1 month ago

Pretty significant refactor of internal workings so that we are passing around the following mappings:

date-time mapping:

date_time_spine
# A tibble: 126 × 2
   date           t
   <date>     <int>
 1 2023-09-01     1
 2 2023-09-02     2
 3 2023-09-03     3
 4 2023-09-04     4
 5 2023-09-05     5
 6 2023-09-06     6
 7 2023-09-07     7
 8 2023-09-08     8
 9 2023-09-09     9
10 2023-09-10    10
# ℹ 116 more rows
# ℹ Use `print(n = ...)` to see more rows
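
A minimal base R sketch of how a spine like this can be built. `make_date_time_spine()` is a hypothetical name, and it returns a plain data.frame rather than a tibble to stay dependency-free.

```r
# Hypothetical helper: one row per calendar day, with an integer time index t
make_date_time_spine <- function(start_date, n_days) {
  dates <- seq(as.Date(start_date), by = "day", length.out = n_days)
  data.frame(date = dates, t = seq_along(dates))
}

# Same start date and length as the spine printed above
spine <- make_date_time_spine("2023-09-01", 126)
head(spine, 3)
```

Joining observations to this spine by date gives each one an integer t without any date arithmetic downstream.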

lab-site subpop spine:

# A tibble: 5 × 9
  lab_site_index site_index  site   lab site_pop lab_site_name   subpop_pop subpop_index subpop_name
           <int>      <int> <dbl> <dbl>    <dbl> <glue>               <dbl>        <int> <chr>      
1              1          1     2     1   400000 Site: 2, Lab: 1     400000            2 Site: 2    
2              2          1     2     2   400000 Site: 2, Lab: 2     400000            2 Site: 2    
3              3          2     3     3   200000 Site: 3, Lab: 3     200000            3 Site: 3    
4              4          3     4     3   100000 Site: 4, Lab: 3     100000            4 Site: 4    
5              5          4     5     3    50000 Site: 5, Lab: 3      50000            5 Site: 5

site subpop spine:

site_subpop_spine
# A tibble: 5 × 5
  site_index  site subpop_pop subpop_index subpop_name            
       <int> <dbl>      <dbl>        <int> <chr>                  
1         NA    NA    2250000            1 remainder of population
2          1     2     400000            2 Site: 2                
3          2     3     200000            3 Site: 3                
4          3     4     100000            4 Site: 4                
5          4     5      50000            5 Site: 5

The logic for creating the auxiliary subpopulation, or for setting the first and only subpopulation to the entire population when fitting to hospital admissions only, now happens in exactly one place: the get_site_subpop_spine() function, which gets called inside wwinference().

These get returned as raw_input_data in the wwinference_fit object and are used directly for post processing.
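
The spine logic described above can be sketched in base R as follows. This is an illustration under assumptions, not the package function: `make_site_subpop_spine()` is a hypothetical name, the total population of 3,000,000 is inferred from the printed table (2,250,000 plus the four site populations), and the name used for the hospital-admissions-only subpopulation is made up.

```r
# Hypothetical sketch of building the site-subpop spine from lab-site rows
make_site_subpop_spine <- function(lab_site, total_pop) {
  if (is.null(lab_site) || nrow(lab_site) == 0) {
    # Hospital admissions only: one subpopulation covering everyone
    return(data.frame(
      site_index = NA, site = NA, subpop_pop = total_pop,
      subpop_index = 1L, subpop_name = "total population" # assumed label
    ))
  }
  # One row per site, even when several labs sample the same site
  sites <- unique(lab_site[, c("site", "site_pop")])
  sites <- sites[order(sites$site), ]
  # Auxiliary subpopulation: everyone not covered by a wastewater site
  remainder <- total_pop - sum(sites$site_pop)
  data.frame(
    site_index = c(NA, seq_len(nrow(sites))),
    site = c(NA, sites$site),
    subpop_pop = c(remainder, sites$site_pop),
    subpop_index = seq_len(nrow(sites) + 1),
    subpop_name = c("remainder of population", paste0("Site: ", sites$site))
  )
}

# Lab-site pairs copied from the lab-site subpop spine above
lab_site <- data.frame(
  site = c(2, 2, 3, 4, 5),
  lab = c(1, 2, 3, 3, 3),
  site_pop = c(400000, 400000, 200000, 100000, 50000)
)
site_subpop_spine <- make_site_subpop_spine(lab_site, total_pop = 3e6)
site_subpop_spine
```

The sketch does not guard against site populations that sum to more than the total population, which a real implementation would need to handle.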

Major changes from previous implementation:

created functions to get all the "spines" described above
update get_stan_data() to take these as inputs
create a single get_ww_indices_and_values() function where these are combined with the ww data (if it's present) to get the needed vectors for stan; for the hosp only model it returns just what's necessary, with the rest as numeric()
update passing to stan to use the output from this function
update get_draws() to use these spines
update test_get_stan_data() to handle the new inputs

kaitejohnson commented 1 month ago

I believe this is still failing at the test_ww_model test, despite regenerating test data. This has been a consistent issue, and tbh I am not really sure this test is providing us much value. Also very frustrating that the test data can't be created on a mac (running the tests locally froze my Ubuntu).

Perhaps this is a separate issue, but just flagging.

kaitejohnson commented 1 month ago

Pending CI passes, this is ready for review.

CDCgov / ww-inference-model

Restructure hierarchical estimation based on reference subpopulation #158
