Open wolverdude opened 4 years ago
@lagerros I created this based on our conversation Monday. The specification is a bit vague (could just be my memory being fuzzy), but I believe this is what you wanted. Is there a person on the modeling team I can contact for nailing down the acceptance criteria?
Check out the "Multiple parameter spec" tab. https://docs.google.com/spreadsheets/d/1IxPMadPxjnphWSKG_6PxmsrCLoXe3cHGp1Ok9kcddPk/edit#gid=1831691945
There's a new column with a toggle where you can select the parameter you want to change, for each row.
I don't know how definition.py works, so I don't know the extent to which refactoring is needed to enable this.
Some nuances:
1) Parameters for "Seasonality" and "Airline traffic" can only have one value throughout the entire simulation and all locations. So they can only be set once for each "Class" in the spreadsheet, and there's no need to provide a location or start date & end date. (Whereas beta, epsilon, mu and imu can vary between locations and times within a single simulation.)
2) Here's a flag for a potential upcoming feature, though it is as yet uncertain. Currently there's a distinction between:
"Classes" (in other places called "Groups" or "Scenarios"), corresponding to the different sidebar buttons you can click on the site
"Traces", corresponding to the different lines drawn within each scenario
At the moment, "Traces" are specified in the colab:
However, this means that all trace settings are taken as a Cartesian product with all "Classes" in the spreadsheet. In future, we might want to allow defining different traces for each Class, from within the spreadsheet. (For example, you might want to say that in the "Strong" scenario we visualise uncertainty over incubation period; whereas in the "Weak" scenario we visualise uncertainty over Seasonality.)
Enabling this functionality requires thinking through what the best implementation would be, and I haven't done that yet. Suggestions welcome.
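To make the distinction concrete, here is a minimal Python sketch of the two approaches. The class names, trace names, and the traces_by_class mapping are hypothetical and chosen only for illustration; they are not taken from the colab or from definition.py.

```python
from itertools import product

# Hypothetical inputs: classes come from the spreadsheet, traces from the colab.
classes = ["Weak", "Strong"]
traces = ["Slower", "Expected", "Faster"]

# Current behaviour: every trace is combined with every class.
all_pairs = list(product(classes, traces))

# Possible future behaviour: each class declares its own traces.
traces_by_class = {
    "Weak": ["Low seasonality", "High seasonality"],
    "Strong": ["Short incubation", "Long incubation"],
}
per_class_pairs = [
    (class_name, trace)
    for class_name, class_traces in traces_by_class.items()
    for trace in class_traces
]
```

With the Cartesian product, adding one trace adds a line to every class; with a per-class mapping, each class only carries the traces that are meaningful for it.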
Is the above clear enough as a spec?
If there are any questions, I believe asking Elizabeth ( @AcesoUnderGlass ) could help with a lot of them.
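The first nuance in the spec above (global parameters are set once per Class, with no location or dates) could be checked with something like the following. This is only a sketch under assumptions: the ParameterRow fields and the validate_rows helper are hypothetical names, not part of the existing code.

```python
from dataclasses import dataclass
from typing import List, Optional

# Parameters that, per the spec, take a single value for the whole simulation.
GLOBAL_PARAMETERS = {"seasonality", "airline traffic"}


@dataclass
class ParameterRow:
    """One spreadsheet row (hypothetical field names)."""
    class_name: str
    parameter: str
    value: float
    location: Optional[str] = None
    start_date: Optional[str] = None
    end_date: Optional[str] = None


def validate_rows(rows: List[ParameterRow]) -> List[str]:
    """Return human-readable problems; an empty list means the rows pass."""
    problems = []
    seen_globals = set()
    for row in rows:
        name = row.parameter.lower()
        if name not in GLOBAL_PARAMETERS:
            continue  # beta, epsilon, mu and imu may vary by location and time
        key = (row.class_name, name)
        if key in seen_globals:
            problems.append(
                f"{row.parameter} is set more than once for class {row.class_name}"
            )
        seen_globals.add(key)
        if row.location or row.start_date or row.end_date:
            problems.append(
                f"{row.parameter} in class {row.class_name} "
                "must not have a location or dates"
            )
    return problems
```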
Okay, so for this spec, we do not want to change the BETA_MULTIPLIERS, SEASONALITIES cartesian product, but this may happen in the future.
What we do want for this spec is to implement all the options in that sheet except for the Trace column. The thing that remains unclear to me is how to resolve overlapping values for the same parameter. Is it just first/last wins? Is there a hierarchy based on region? Also, it seems likely that we'll want different combination rules for different kinds of parameters (e.g. monsoon should act like seasonality and modify the betas for everything else within its window).
Other questions:
Type column? How should it be used when compiling definitions? Should we be able to support new types in the future?
I think it makes sense to implement this spec before #443, since you'll need some kind of interface to configure monsoons, and once #443 is implemented, we can just make it another kind of settable parameter.
Is this still "blocked" or can be worked on?
Sorry, I forgot to update the card state. I should have a PR up tomorrow.
@lagerros I've updated the example spreadsheet with examples of what you can now configure. Please review and let me know if this is what you had in mind and what tweaks, if any, you would like me to make.
@wolverdude I took a look -- this is really neat!
Especially nice is that having all those parameters there would mean different simulations could just be stored in different spreadsheets; and there would be no need to edit the colab. (Which is an annoying feature at the moment)
One thing I'm confused about is the "Background conditions" -- are they used in a Cartesian product to generate traces? Or do they generate groups? (Cf. the distinction in my message above)
Countermeasure package classes correspond to different scenarios/groups, one group per class. Background condition classes correspond to different Traces, one per class. Or if you want, it can be the other way around. Easy to change.
I basically copied the logic from the colab, so the behavior shouldn’t be much different. The one thing I did differently was lumping beta multipliers in with background conditions. I set it up this way based on what I was seeing on the current Balochistan site. If that’s undesirable, I can create a third Type for another Cartesian product.
One question I have for you (and the thing that kinda derailed me earlier):
Does this spreadsheet config supersede the scenarios and groups keys in the config.yaml? If not, then what should the info in config.yaml be used for?
@lagerros any thoughts on these? ^
Does this spreadsheet config supersede the scenarios and groups keys in the config.yaml? If not, then what should the info in config.yaml be used for?
I don't know what you mean by "supersede".
But basically, the group keys in config.yaml should be entirely determined by what's in the spreadsheet (as should the names of traces in the legend).
Making that easier, for example by downloading a config file from the colab or something similar, is on the data engineering roadmap and could be very helpful.
Yeah, "supersede" means replace. The thing is, the functionality in the spreadsheet could replace both the scenarios and groups keys in the config, except for the names and descriptions.
I think what I'll do is have the new colab export the config or something.
Okay, I think I've got it. How about we just use "Group" and "Trace" as the values for the Type column, which should make it super clear how things are getting displayed. All parameters will be configured in the spreadsheet, but display names, etc. need to be placed in the config file for export. It would look like this:
scenarios:
  config_sheet: "https://docs.google.com/spreadsheets/d/abc/edit#gid=123"
groups:
  - name: Weak Mitigation
    description: Mostly open borders; full opening of public places; no social distancing; little compliance with hygiene advice
  - name: Moderate Mitigation
    description: Some border closure; closure of schools; ban on public gatherings; partial social distancing
  - name: Strong Mitigation
    description: Strong external and internal border closure; full closure of public places (including places of worship); social distancing outside and within homes
  - name: Recommended Mitigation
    description: Moderate measures + compulsory masks; contact tracing; social distancing at places of worship and within homes
traces:
  - name: "Slowest"
    description: Very slow spread (50%)
  - name: "Slower"
    description: Slower spread (75%)
  - name: "Expected"
    description: Expected spread (100%)
  - name: "Faster"
    description: Faster spread (150%)
  - name: "Fastest"
    description: Very fast spread (175%)
The name values correspond to the values in the Class column, and description is what's displayed to the user, though for groups, name will also be the title of the tab for users to click on. If we want to decouple that, it would be pretty easy to add an optional display_name to the config.
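As a rough illustration of the optional display_name idea, the tab title could fall back to name when no display_name is given. The tab_title helper is a hypothetical name, and the group entries are assumed to be plain dicts as they would come out of a YAML loader.

```python
def tab_title(group: dict) -> str:
    """Use the optional display_name when present, otherwise fall back to name."""
    return group.get("display_name") or group["name"]


# "name" must still match the Class column in the spreadsheet;
# "display_name" only changes what the user sees on the tab.
groups = [
    {"name": "Weak Mitigation"},
    {"name": "Strong Mitigation", "display_name": "Strong"},
]
titles = [tab_title(group) for group in groups]
```

Keeping name as the join key and display_name as purely cosmetic means existing config files without display_name would keep working unchanged.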
Note that this assumes groups and traces are independent -- and I don't think that's assumed in the spreadsheet. (Feature, not bug)
This feature is not used at the moment; but might become useful in future.
But this might be fine as an MVP, and we can jump off that bridge when we get there (unless you can already see an easy fix for this now).
@AcesoUnderGlass @lagerros This issue was completed by epimodel#57; however, there are some design decisions and areas of the spec I never fully clarified. These are things you may consider changing.
There is some code that I wrote and checked in that ended up not getting used and I neglected to remove it.
colabutils.py - This was added to enable fetching gleam_parameters directly from a Google sheet, but I scrapped that functionality in order to conform to luigi expectations. You could convert the GleamParameters task into a non-manual task that pulls the sheet data and saves it as a CSV. Otherwise, there are a couple of dependencies (gsheet, oauth2client) that are only used by this file and should be removed with it.
ParseInput - This is the feature that would fetch the mean of any Foretold distribution whose UUID is listed in the Value column. I was unable to add ergo as a dependency due to incompatibilities, so I removed this from the luigi task, but I didn't remove any of the underlying code. This could be easily fixed by using the Foretold API directly instead of relying on ergo, which we're already doing in the UpdateForetold task.
I used group and trace for the cartesian product instead of Countermeasure package and Background condition, which were in the original spec. I did this because it was clearer about what was happening. The only advantage I could see of doing it the other way was that you could easily switch how the results were displayed, but you could accomplish this anyway just by swapping group and trace values in the parameters spreadsheet/CSV.
DefinitionBuilder.
group multiplier and a trace multiplier. This can be easily fixed in DefinitionBuilder.
SimulationSet to DefinitionBuilder and then modifying it accordingly. Some possibilities:
compartment_multipliers and compartments_max_fraction from config.yaml to the parameters sheet.
Beta1 and Beta2 fields from estimates. This has been removed, but it could be re-added as a parameter option in the sheet.
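For the suggestion above about pulling the sheet data and saving it as a CSV, one possible starting point is to turn the sheet's edit link into a CSV export link. This is a sketch, not the luigi task itself: sheet_csv_url is a hypothetical helper, and it assumes the commonly used Google Sheets export?format=csv URL pattern, which only works without credentials when the sheet is readable by anyone with the link.

```python
import re
from urllib.parse import parse_qs, urlparse


def sheet_csv_url(edit_url: str) -> str:
    """Convert a Google Sheets edit URL into a CSV export URL for the same tab."""
    match = re.search(r"/spreadsheets/d/([^/]+)", edit_url)
    if match is None:
        raise ValueError(f"Not a Google Sheets URL: {edit_url}")
    sheet_id = match.group(1)
    # The tab id lives in the URL fragment, e.g. "#gid=123"; default to the first tab.
    fragment = urlparse(edit_url).fragment
    gid = parse_qs(fragment).get("gid", ["0"])[0]
    return (
        f"https://docs.google.com/spreadsheets/d/{sheet_id}"
        f"/export?format=csv&gid={gid}"
    )


csv_url = sheet_csv_url("https://docs.google.com/spreadsheets/d/abc/edit#gid=123")
```

The resulting URL could then be downloaded with any HTTP client and written to the task's output path, which would also remove the need for the gsheet and oauth2client dependencies in the public-sheet case.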
From the Data Engineering Roadmap
Currently, GLEAMviz parameters are being set in this colab. But the only parameters that are currently being set are β and seasonality. Lists of possible values are set in configuration variables, and then a matrix of different scenarios is created.
It's somewhat trivial to set other parameters, but the script is set up in a way that makes customization cumbersome. It would be helpful to refactor this function (and potentially definition.py) to make fewer assumptions about the different GLEAM traces desired while making it easy to configure scenarios with different sets of parameters.
The ideal would be for all parameters to be set from the specified scenarios spreadsheet (example for Pakistan, Gdoc spec).