Open TimothyWillard opened 1 month ago
Related to GH-300, but would be nice to also consolidate the custom distribution handling in the llik
method of gempyor.statistics.Statistics
into this solution as well. This would allow for (more) standardized naming + parameterizations of distributions used throughout flepiMoP
's config files. The relevant lines of code in question are:
dist_map = {
"pois": scipy.stats.poisson.logpmf,
"norm": lambda x, loc, scale: scipy.stats.norm.logpdf(
x, loc=loc, scale=self.params.get("scale", scale)
), # wrong:
"norm_cov": lambda x, loc, scale: scipy.stats.norm.logpdf(
x, loc=loc, scale=scale * loc.where(loc > 5, 5)
), # TODO: check, that it's really the loc
"nbinom": lambda x, n, p: scipy.stats.nbinom.logpmf(
x, n=self.params.get("n"), p=model_data
),
"rmse": lambda x, y: -np.log(np.nansum(np.sqrt((x - y) ** 2))),
"absolute_error": lambda x, y: -np.log(np.nansum(np.abs(x - y))),
}
if self.dist not in dist_map:
raise ValueError(f"Invalid distribution specified: {self.dist}")
if self.dist in ["pois", "nbinom"]:
model_data = model_data.astype(int)
gt_data = gt_data.astype(int)
Not really sure what to do about the MSE/MAE options, those aren't likelihoods they're losses so they don't correspond to an underlying probability distribution.
Is your feature request related to a problem? Please describe.
The
gempyor.utils.as_random_distribution/random_distribution_sampler
functions use a mix ofnp.random
rngs andscipy.stats
rngs. This makes testing these and functions that rely on these functions more convoluted then it should be as demonstrated bygempyor.testing. sample_fits_distribution
.See https://github.com/HopkinsIDD/flepiMoP/pull/277#discussion_r1707145163, https://github.com/HopkinsIDD/flepiMoP/pull/277#discussion_r1707178115.
Is your feature request related to a new application, scenario round, pathogen? Please describe.
The application is mostly for internal use in
gempyor
, it is unlikely that these functions will be used much externally but would assist in efforts to unit test and stabilizegempyor
.Describe the solution you'd like
Consolidate the logic in
gempyor.utils.as_random_distribution/random_distribution_sampler
andgempyor.testing. sample_fits_distribution
together and to use distributions fromscipy.stats
instead of fromnp.random
.