afmagee42 closed this 1 day ago
@thanasibakis one question for you, since you've been doing a good job of reducing config bloat: do we actually want the `num_post_pred_samps` option in the config? In hindsight, this may have been more useful for debugging (sampling 2000 times from the predictive distribution takes time) than in practice, and we could assume we want one draw per posterior sample.
My big question is: what's the difference between the count-based scoring here and a count-based scoring that takes a weighted average of the proportion-based scores for each unit of analysis (geography & date), with the weights being the counts?
Is it mathematically equivalent (modulo some constant)? Even if this weighting isn't exactly mathematically equivalent, is it good enough? At the least, it's easier to reason about than the approach where we draw counts from the model.
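For concreteness, here's the comparison as I understand it, in my own notation (not the code's): $g$ indexes geography, $t$ date, $N_{g,t}$ is the observed total sequence count, $y_{g,t}$ the observed count vector, $\phi_{g,t}^{(s)}$ the $s$-th posterior draw of the proportions, and $S(\cdot,\cdot)$ a sample-based scoring rule (e.g., the energy score):

$$S_{\text{count}} = \sum_{g,t} S\!\left(\{\hat{Y}_{g,t}^{(s)}\}_{s=1}^{m},\; y_{g,t}\right), \qquad \hat{Y}_{g,t}^{(s)} \sim \operatorname{Multinomial}\!\left(N_{g,t},\, \phi_{g,t}^{(s)}\right),$$

$$S_{\text{weighted}} = \sum_{g,t} N_{g,t}\, S_{\text{prop}}(g,t).$$

Intuitively, $S_{\text{weighted}}$ scores the posterior proportions directly and reweights by counts, while $S_{\text{count}}$ layers multinomial sampling noise on top of the posterior uncertainty in $\phi$, so I wouldn't expect exact equivalence; the question is whether they rank models similarly.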
(My understanding is that the current approach is to draw counts from a multinomial, with multinomial category proportions drawn from the posterior of $\phi$. If I've gotten that wrong, I might be misunderstanding other things.)
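In case it helps to check that reading, here is a minimal NumPy sketch of what I think the count generation amounts to (function and argument names here are mine, not the PR's):

```python
import numpy as np

rng = np.random.default_rng(0)

def draw_predictive_counts(phi_samples, total_count, num_post_pred_samps=1):
    """Posterior predictive count draws for a single division-day.

    phi_samples: (num_posterior_samples, num_lineages) posterior draws of
        lineage proportions (each row sums to 1).
    total_count: observed total number of sequences for this division-day.
    num_post_pred_samps: multinomial draws per posterior sample of phi.
    """
    return np.stack([
        rng.multinomial(total_count, phi, size=num_post_pred_samps)
        for phi in phi_samples
    ])  # shape: (num_posterior_samples, num_post_pred_samps, num_lineages)
```

With `num_post_pred_samps=1`, this collapses to one count draw per posterior sample, which is the simplification suggested above.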
I think there are some ways this idea could be generalized with a different architecture, but I'll save those for #51 and #54. For example, if we did want to keep scoring truly on counts, I would ask the model to generate the counts directly, rather than asking it for proportions and then generating counts from those proportions under some assumed out-of-model distribution for how the counts are produced.
I can also suggest some line-level improvements that I think would help clarity, but I don't think those are useful in light of the above.
This PR adds the ability to evaluate forecasts based on the predictive distribution of sequence counts, alongside the existing infrastructure for evaluation based on the posterior distribution of frequencies/proportions.
This is accomplished in three parts.
1. New functions (`linmod.eval.generate_eval_counts()`, `linmod.models.predict_counts()`, and `linmod.utils.expand_phi()`) have been added to enable the generation of posterior predictive count distributions. `proportions_` has been dropped from function names as appropriate, and `retrospective-forecasting/main.py` has been changed accordingly.
2. To minimize code duplication moving forward, I also refactored how the per-division-day scoring gets aggregated into an overall score. In particular, I removed the middle-man functions `proportions_mean_norm()` and `proportions_energy_score()` in favor of `linmod.eval.score()`, which takes in a per-division-day scoring function and does the summation across division-days. Alternative aggregation schemes can now be implemented once in this function, instead of once per scoring function (a sketch of this pattern follows the list).
3. The refactoring in (2) passes the `pytest` tests that were set up (after ensuring the tests use the correct functions).
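For illustration, here is roughly the aggregation pattern in (2) — a sketch only, assuming pandas-style data frames with `division`/`day` columns; the real `linmod.eval.score()` signature and data layout may differ:

```python
import pandas as pd

def score(per_division_day_score, forecasts: pd.DataFrame, observed: pd.DataFrame) -> float:
    """Sum a per-division-day scoring function across all division-days.

    Alternative aggregation schemes (e.g., count-weighted averages) can be
    implemented here once, rather than in every scoring function.
    """
    total = 0.0
    for (division, day), obs in observed.groupby(["division", "day"]):
        fcst = forecasts[
            (forecasts["division"] == division) & (forecasts["day"] == day)
        ]
        total += per_division_day_score(fcst, obs)
    return total
```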