CDCgov / ww-inference-model

An in-development R package and a Bayesian hierarchical model jointly fitting multiple "local" wastewater data streams and "global" case count data to produce nowcasts and forecasts of both observations
https://cdcgov.github.io/ww-inference-model/
Apache License 2.0

Restructure hierarchical estimation based on reference subpopulation #158

Closed kaitejohnson closed 1 month ago

kaitejohnson commented 2 months ago

This is a WIP. Tasks include:

Some scope creep:

As always anytime we change the model:

kaitejohnson commented 2 months ago

Before I review further, I think you want to keep things like i_first_obs and initial_exp_growth_rate as having an implicit central value that is not necessarily the same as the reference population's value. So you'll want offsets for the reference population for those too.

You'll also want to infer and apply those offsets only when the number of subpopulations is > 1; otherwise they should be fixed at 0.

Good catch, I added the offset as the final step and forgot about the other site level parameters. Will edit.
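
A minimal R sketch of the gating described in the review comment, assuming a central value plus per-subpopulation offsets scaled by a standard deviation. The helper name `apply_subpop_offsets()` and its arguments are hypothetical and not part of the package; the actual model is written in Stan and may be parameterized differently.

```r
# Hypothetical helper (not part of wwinference): combine a central value
# with per-subpopulation offsets, including one for the reference subpopulation.
apply_subpop_offsets <- function(central, raw_offsets, offset_sd, n_subpops) {
  stopifnot(n_subpops >= 1)
  if (n_subpops == 1) {
    # With a single subpopulation there is nothing to pool across,
    # so the offset is fixed at 0 rather than inferred.
    return(central)
  }
  stopifnot(length(raw_offsets) == n_subpops)
  # Every subpopulation deviates from the implicit central value.
  central + offset_sd * raw_offsets
}

# One subpopulation: the central value is returned unchanged.
single <- apply_subpop_offsets(
  central = 0.1, raw_offsets = numeric(0), offset_sd = 0.05, n_subpops = 1
)

# Three subpopulations: each gets its own offset from the central value.
multi <- apply_subpop_offsets(
  central = 0.1, raw_offsets = c(-1, 0, 2), offset_sd = 0.05, n_subpops = 3
)
```

The same pattern would apply to any site-level parameter that keeps a central value distinct from the reference subpopulation's value, such as i_first_obs or initial_exp_growth_rate.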

kaitejohnson commented 2 months ago

Updated, though I wasn't sure exactly what good priors on these offsets would be, in particular for m_first_obs_stdev, other than very small, to ensure that the sampled value for I0/N doesn't go above 1 or below 0...

kaitejohnson commented 2 months ago

Tests are passing locally on my Mac. Will try regenerating from WSL2 (was frozen..)

dylanhmorris commented 2 months ago

#168 is now ready for review @kaitejohnson

gvegayon commented 2 months ago

Current R CMD check error here:

Running the tests in ‘tests/testthat.R’ failed.
Last 13 lines of output:
              sd_log_sigma_ww_site_prior_sd = 0.693, t_peak_mean = 5L, 
              t_peak_sd = 1L, viral_peak_mean = 5.1, viral_peak_sd = 0.5, 
              ww_site_mod_sd_sd = 0.25)), fit_opts = list(iter_warmup = 25, 
          iter_sampling = 25, n_chains = 1, seed = 55, adapt_delta = 0.95, 
          max_treedepth = 12), generate_initial_values = TRUE)`: argument "ww_data" is missing, with no default
  Backtrace:
      ▆
   1. ├─withr::with_seed(...) at test_no_ww_model.R:9:3
   2. │ └─withr::with_preserve_seed(...)
   3. ├─base::do.call(wwinference::wwinference, model_test_data_no_ww) at test_no_ww_model.R:10:5
   4. └─wwinference (local) `<fn>`(...)

  [ FAIL 1 | WARN 3023 | SKIP 0 | PASS 242 ]
  Error: Test failures
  Execution halted
* checking for non-standard things in the check directory ... OK
* checking for detritus in the temp directory ... OK
* DONE

Status: 1 ERROR, 1 NOTE

kaitejohnson commented 2 months ago

Yeah, I am going to expand this test a bit to cover three cases:

1. passing ww but setting include_ww = 0
2. passing no ww and setting include_ww = 0
3. passing no ww and setting include_ww = 1

The model should be able to handle all three cases, but it can currently only handle case 1!
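
One way the three scenarios could be enumerated is sketched below. `stub_fit()` is a hypothetical stand-in, not the real wwinference(); it only mimics the behavior reported in the check output above (no default for `ww_data`), and the toy data frame and its column names are made up for illustration.

```r
# Stand-in for the fitting call; hypothetical, NOT the real wwinference().
# Like the failing test above, it has no default for `ww_data`.
stub_fit <- function(ww_data, include_ww) {
  force(ww_data) # errors if `ww_data` was not supplied
  "fit"
}

ww <- data.frame(date = as.Date("2023-09-01"), log_conc = 1.5) # toy data

# The three scenarios, expressed as argument lists for do.call().
cases <- list(
  ww_passed_not_included = list(ww_data = ww, include_ww = 0),
  no_ww_not_included     = list(include_ww = 0),
  no_ww_included         = list(include_ww = 1)
)

# Record which scenarios run without error.
runs_ok <- vapply(
  cases,
  function(args) {
    !inherits(try(do.call(stub_fit, args), silent = TRUE), "try-error")
  },
  logical(1)
)
print(runs_ok)
```

With the stub, only the first scenario runs; in a real test each entry would become its own testthat expectation against the actual fitting function.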

kaitejohnson commented 2 months ago

Relatedly, I would love to suppress the ESS warnings (either adding more iterations or ideally just telling cmdstan to suppress them)... will look into it

dylanhmorris commented 2 months ago

Worth discussing in #145

Right now suppressing the warnings won't be possible due to #174

kaitejohnson commented 2 months ago

looks like ww tests are failing again, will try regenerating test data...

dylanhmorris commented 1 month ago

@kaitejohnson still a few failing tests. I made one easy fix myself. Could you fix the others and re-request review?

kaitejohnson commented 1 month ago

Pretty significant refactor of internal workings so that we are passing around the following mappings:

date-time mapping:

date_time_spine
# A tibble: 126 × 2
   date           t
   <date>     <int>
 1 2023-09-01     1
 2 2023-09-02     2
 3 2023-09-03     3
 4 2023-09-04     4
 5 2023-09-05     5
 6 2023-09-06     6
 7 2023-09-07     7
 8 2023-09-08     8
 9 2023-09-09     9
10 2023-09-10    10
# ℹ 116 more rows
# ℹ Use `print(n = ...)` to see more rows

lab-site subpop spine:

# A tibble: 5 × 9
  lab_site_index site_index  site   lab site_pop lab_site_name   subpop_pop subpop_index subpop_name
           <int>      <int> <dbl> <dbl>    <dbl> <glue>               <dbl>        <int> <chr>      
1              1          1     2     1   400000 Site: 2, Lab: 1     400000            2 Site: 2    
2              2          1     2     2   400000 Site: 2, Lab: 2     400000            2 Site: 2    
3              3          2     3     3   200000 Site: 3, Lab: 3     200000            3 Site: 3    
4              4          3     4     3   100000 Site: 4, Lab: 3     100000            4 Site: 4    
5              5          4     5     3    50000 Site: 5, Lab: 3      50000            5 Site: 5    

site subpop spine:

site_subpop_spine
# A tibble: 5 × 5
  site_index  site subpop_pop subpop_index subpop_name            
       <int> <dbl>      <dbl>        <int> <chr>                  
1         NA    NA    2250000            1 remainder of population
2          1     2     400000            2 Site: 2                
3          2     3     200000            3 Site: 3                
4          3     4     100000            4 Site: 4                
5          4     5      50000            5 Site: 5    

The logic for creating the auxiliary subpopulation, or for setting the first and only subpopulation to the entire population when only hospital admissions are provided, now happens in one place: the get_site_subpop_spine() function, which is called inside wwinference().
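
A rough R sketch of that logic, assuming the auxiliary subpopulation is the total population minus the summed site populations. `build_site_subpop_spine()` is a hypothetical name and this is not the package's implementation; the total of 3,000,000 is inferred by summing the subpop_pop column in the table above, and the label used for the single-subpopulation case is a placeholder.

```r
# Hypothetical sketch (not the package's implementation) of building the
# site-subpopulation spine once, covering both cases described above.
# Assumes the sites cover less than the total population.
build_site_subpop_spine <- function(total_pop, sites = NULL) {
  if (is.null(sites) || nrow(sites) == 0) {
    # Hospital admissions only: one subpopulation covering everyone.
    # The name below is a placeholder, not taken from the package.
    return(data.frame(
      site_index = NA_integer_, site = NA_real_, subpop_pop = total_pop,
      subpop_index = 1L, subpop_name = "total population"
    ))
  }
  # Auxiliary subpopulation: everyone not covered by a sampled site.
  remainder <- data.frame(
    site_index = NA_integer_, site = NA_real_,
    subpop_pop = total_pop - sum(sites$site_pop),
    subpop_index = 1L, subpop_name = "remainder of population"
  )
  site_rows <- data.frame(
    site_index = seq_len(nrow(sites)),
    site = sites$site,
    subpop_pop = sites$site_pop,
    subpop_index = seq_len(nrow(sites)) + 1L,
    subpop_name = paste0("Site: ", sites$site)
  )
  rbind(remainder, site_rows)
}

sites <- data.frame(
  site = c(2, 3, 4, 5),
  site_pop = c(400000, 200000, 100000, 50000)
)
spine <- build_site_subpop_spine(total_pop = 3e6, sites = sites)
print(spine)
```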

These get returned as raw_input_data in the wwinference_fit object and are used directly for post processing.
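
As an illustration of how a spine can be used in post-processing, the base R sketch below rebuilds a date-time mapping like the one printed above and joins it onto model output indexed by t. The `draws` data frame and its `pred_value` column are made up for the example and do not reflect the package's actual output format.

```r
# Date-time spine like the one shown above (base R stand-in for the tibble).
dates <- seq(as.Date("2023-09-01"), by = "day", length.out = 126)
date_time_spine <- data.frame(date = dates, t = seq_along(dates))

# Toy model output indexed by t (values are made up for illustration).
draws <- data.frame(t = c(1L, 2L, 10L), pred_value = c(3.2, 3.4, 4.1))

# Attach calendar dates by joining on the time index.
draws_with_dates <- merge(draws, date_time_spine, by = "t", all.x = TRUE)
print(draws_with_dates)
```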

Major changes from previous implementation:

kaitejohnson commented 1 month ago

I believe this is still failing at the test_ww_model test, despite regenerating test data. This has been a consistent issue, and tbh I am not really sure this test is providing us much value. Also very frustrating that the test data can't be created on a Mac (running the tests locally froze my Ubuntu).

Perhaps this is a separate issue, but just flagging.

kaitejohnson commented 1 month ago

Pending CI passes, this is ready for review.