ScottishCovidResponse / SCRCIssueTracking

Central issue tracking repository for all repos in the consortium

Support for additional distributions in the Standard API #671

Open vinopm opened 4 years ago

vinopm commented 4 years ago

Discussed with @mrow84 that we would need to add support for 'exponential' and 'uniform' distributions in all Standard API implementations.

We also need to update the Standard API spec with more information about which distributions are supported.

I believe the complete list of distributions we would support is:

github-actions[bot] commented 4 years ago

Heads up @mrow84 @bobturneruk - the "data pipeline api" label was applied to this issue.

vinopm commented 4 years ago

@mrow84 In the Contact Tracing Model, there is an input distribution where we pass in a list of bins and a list of their respective weights. The model creates an enumerated integer distribution out of this. Can we add in support for this type of distribution as well?

Something like this?

[population-ages]
type = "distribution"
distribution = "enumerated"
bins = [0, 15, 25, 55, 65, 90]
weights = [0.1759, 0.1171, 0.4029, 0.1222, 0.1819]

mrow84 commented 4 years ago

Yep, I think I would call this a categorical distribution.

vinopm commented 4 years ago

Ok, so we will have support for this format:

[population-ages]
type = "distribution"
distribution = "categorical"
bins = [0, 15, 25, 55, 65, 90]
weights = [0.1759, 0.1171, 0.4029, 0.1222, 0.1819]

mrow84 commented 4 years ago

The only thing I wonder is if we might want the categories to be strings rather than numbers, and then parse them when it is deemed appropriate.

vinopm commented 4 years ago

@mrow84 the categories refer to ranges:

i.e. Age 0-15 -> 0.1759 probability
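
For illustration, one way to read this format is that consecutive entries in `bins` are the edges of each range, so six edges give five ranges and `weights[i]` belongs to the range from `bins[i]` to `bins[i + 1]`. A minimal Python sketch under that assumption follows. The function names, the half-open treatment of each range, and drawing the age uniformly within the chosen range are illustrative choices, not something the Standard API spec or the Contact Tracing Model is confirmed to do.

```python
import random
from bisect import bisect_right
from itertools import accumulate

# Values copied from the proposed [population-ages] block.
BINS = [0, 15, 25, 55, 65, 90]
WEIGHTS = [0.1759, 0.1171, 0.4029, 0.1222, 0.1819]


def check_categorical(bins, weights, tol=1e-6):
    """Raise ValueError unless bins/weights describe a valid distribution."""
    if len(bins) != len(weights) + 1:
        raise ValueError("expected one more bin edge than weights")
    if any(lo >= hi for lo, hi in zip(bins, bins[1:])):
        raise ValueError("bin edges must be strictly increasing")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    if abs(sum(weights) - 1.0) > tol:
        raise ValueError("weights must sum to 1")


def sample_age(bins, weights, rng):
    """Pick a range by weight, then an integer uniformly in [lo, hi)."""
    check_categorical(bins, weights)
    cumulative = list(accumulate(weights))
    u = rng.random() * cumulative[-1]
    # Clamp the index in case floating-point rounding puts u at the top edge.
    i = min(bisect_right(cumulative, u), len(weights) - 1)
    lo, hi = bins[i], bins[i + 1]
    return rng.randrange(lo, hi)  # assumption: integer edges, hi excluded


rng = random.Random(42)  # fixed seed so runs are repeatable
ages = [sample_age(BINS, WEIGHTS, rng) for _ in range(10_000)]
share_under_15 = sum(age < 15 for age in ages) / len(ages)
print(f"share of samples aged 0-14: {share_under_15:.3f}")
```

With enough samples the printed share should sit close to the first weight, which is a quick sanity check that the edges and weights line up as intended.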

mrow84 commented 4 years ago

In some sense that makes me feel like a string may be even more appropriate, in that you could encode the range more explicitly - they do something like that in simple network sim. I am happy for you to leave the range stuff, but if the string conversion isn't too difficult I do think it would be a positive, because it is useful to be able to form discrete distributions over more arbitrary categories.
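
As a rough sketch of what string categories might look like, the Python below keeps every category as a string and only parses it into a numeric range when the label matches a `lo-hi` pattern. The `lo-hi` label syntax and the helper name `parse_range_label` are assumptions made for illustration. The thread does not fix a format, and simple network sim may encode ranges differently.

```python
import re

# Hypothetical label syntax: two non-negative integers joined by a hyphen.
_RANGE_LABEL = re.compile(r"^\s*(\d+)\s*-\s*(\d+)\s*$")


def parse_range_label(label):
    """Return (lo, hi) for labels like "0-15", or None for other strings."""
    match = _RANGE_LABEL.match(label)
    if match is None:
        return None
    lo, hi = int(match.group(1)), int(match.group(2))
    if lo >= hi:
        raise ValueError(f"empty or reversed range: {label!r}")
    return lo, hi


# The same weights as the bins/weights form, keyed by string categories.
age_weights = {
    "0-15": 0.1759,
    "15-25": 0.1171,
    "25-55": 0.4029,
    "55-65": 0.1222,
    "65-90": 0.1819,
}
ranges = {label: parse_range_label(label) for label in age_weights}

# A label that is not a range is left alone rather than rejected.
print(parse_range_label("home"))
```

Keeping the labels as strings means the same structure can describe both age ranges and arbitrary categories, with numeric parsing applied only where a model needs it.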

vinopm commented 4 years ago

@mrow84 I agree with you. It will require quite a bit of work to support the string range format, so I'll add that as a TODO, but let's keep this format for now as a first step. Would that be OK?

mrow84 commented 4 years ago

I have been going through the distributions trying to come up with standardised parameterisations. This is what I have now, with links to Wikipedia for parameterisation references; it includes both the distributions required by the Java models and the EERA model (@kzscisoft / @peter-t-fox). I realise that in some circumstances this may require changes to file contents, so please let me know if this is too much of a drag, but I think we may already have some differences anyway, so someone is going to have to change something!

gamma

mrow84 commented 4 years ago

Also adding

binomial