More customizable GLEAMviz definitions

wolverdude commented 4 years ago

Currently, GLEAMviz parameters are being set in this colab. But the only parameters that are currently being set are β and seasonality. Lists of possible values are set in configuration variables, and then a matrix of different scenarios is created.
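A rough sketch of how such a scenario matrix can be built, assuming it is a plain Cartesian product of the two lists. The values and the `build_scenarios` helper are hypothetical; only the names `BETA_MULTIPLIERS` and `SEASONALITIES` come from this thread.

```python
from itertools import product

# Hypothetical values; the real lists live in the colab's configuration variables.
BETA_MULTIPLIERS = [0.5, 1.0, 1.5]
SEASONALITIES = [0.85, 1.0]


def build_scenarios(beta_multipliers, seasonalities):
    """Return one dict per (beta multiplier, seasonality) combination."""
    return [
        {"beta_multiplier": beta, "seasonality": season}
        for beta, season in product(beta_multipliers, seasonalities)
    ]


scenarios = build_scenarios(BETA_MULTIPLIERS, SEASONALITIES)
```

Every extra parameter handled this way multiplies the number of scenarios, which is part of why a more flexible configuration is attractive.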

It's somewhat trivial to set other parameters, but the script is set up in a way that makes customization cumbersome. It would be helpful to refactor this function (and potentially definition.py) to make fewer assumptions about the different GLEAM traces desired while making it easy to configure scenarios with different sets of parameters.

The ideal would be for all parameters to be set from the specified scenarios spreadsheet (example for Pakistan, Gdoc spec).

wolverdude commented 4 years ago

@lagerros I created this based on our conversation Monday. The specification is a bit vague (could just be my memory being fuzzy), but I believe this is what you wanted. Is there a person on the modeling team I can contact for nailing down the acceptance criteria?

lagerros commented 4 years ago

Check out the "Multiple parameter spec" tab. https://docs.google.com/spreadsheets/d/1IxPMadPxjnphWSKG_6PxmsrCLoXe3cHGp1Ok9kcddPk/edit#gid=1831691945

There's a new column with a toggle where you can select the parameter you want to change, for each row.

I don't know how definition.py works, so I don't know the extent to which refactoring is needed to enable this.

Some nuances:

1) Parameters for "Seasonality" and "Airline traffic" can only have one value throughout the entire simulation and all locations. So they can only be set once for each "Class" in the spreadsheet, and there's no need to provide a location or start date & end date. (Whereas beta, epsilon, mu and imu can vary between locations and times within a single simulation.)
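One way to enforce this rule when reading the sheet is a small validation pass. The following is only a sketch: the row keys (`class`, `parameter`, `region`, `start_date`, `end_date`) and the `validate_rows` helper are assumptions for illustration, not the actual spreadsheet schema or project code.

```python
# Parameters that hold a single value for the whole simulation.
GLOBAL_PARAMETERS = {"seasonality", "airline traffic"}
# Parameters that may vary by location and time window.
LOCAL_PARAMETERS = {"beta", "epsilon", "mu", "imu"}


def validate_rows(rows):
    """Return a list of error strings for rows that break the rule above.

    Each row is a dict with hypothetical keys: class, parameter, region,
    start_date, end_date.
    """
    errors = []
    seen_globals = set()
    for index, row in enumerate(rows):
        parameter = row["parameter"].strip().lower()
        if parameter in GLOBAL_PARAMETERS:
            key = (row["class"], parameter)
            if key in seen_globals:
                errors.append(
                    f"row {index}: {parameter} set twice for class {row['class']}"
                )
            seen_globals.add(key)
            if any(row.get(field) for field in ("region", "start_date", "end_date")):
                errors.append(f"row {index}: {parameter} takes no region or dates")
        elif parameter not in LOCAL_PARAMETERS:
            errors.append(f"row {index}: unknown parameter {parameter!r}")
    return errors
```

Returning a list of messages rather than raising on the first problem lets whoever edits the sheet fix everything in one pass.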

2) Here's a flag for a potential upcoming feature, though it is as of yet uncertain. Currently there's a distinction between:

"Classes" (in other places called "Groups" or "Scenarios") corresponding to the different sidebar buttons you can click on the site
"Traces" corresponding to the different lines drawn within each scenario

At the moment, "Traces" are specified in the colab.

However, this means that all trace settings are taken as a Cartesian product with all "Classes" in the spreadsheet. In future, we might want to allow defining different traces for each Class, from within the spreadsheet. (For example, you might want to say that in the "Strong" scenario we visualise uncertainty over incubation period; whereas in the "Weak" scenario we visualise uncertainty over Seasonality.)

Enabling this functionality requires thinking through what the best implementation would be, and I haven't done that yet. Suggestions welcome.

Is the above clear enough as a spec?

If there are any questions, I believe asking Elizabeth ( @AcesoUnderGlass ) could help with a lot of them.

wolverdude commented 4 years ago

Okay, so for this spec, we do not want to change the BETA_MULTIPLIERS, SEASONALITIES cartesian product, but this may happen in the future.

What we do want for this spec is to implement all the options in that sheet except for the Trace column. The thing that remains unclear to me is how to resolve overlapping values for the same parameter. Is it just first/last wins? Is there a hierarchy based on region? Also, it seems likely that we'll want different combination rules for different kinds of parameters (e.g. monsoon should act like seasonality and modify the betas for everything else within its window).

Other questions:

Are start/end dates inclusive or exclusive?
What is the importance of the Type column? How should it be used when compiling definitions? Should we be able to support new types in the future?
Should I validate that there are only 4 classes?

I think it makes sense to implement this spec before #443, since you'll need some kind of interface to configure monsoons, and once #443 is implemented, we can just make it another kind of settable parameter.

lagerros commented 4 years ago

Overlap -- based on which one is listed highest in the spreadsheet/listed first in Gleam
Date inclusion -- don't know and it doesn't matter much
Type -- explained here, used for Cartesian products of scenarios (which is different from Cartesian product of traces)
There can be more or less than 4 classes!
Monsoons apply only in some regions
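The "listed highest wins" rule for overlaps could look roughly like this. It is a sketch under assumptions: the row keys and the `resolve` helper are hypothetical, and date bounds are treated as inclusive purely for illustration, since the thread leaves that open.

```python
from datetime import date


def resolve(rows, parameter, region, day):
    """Return the value of the first listed row matching the query, or None.

    Rows are assumed to be in spreadsheet order, so the earliest match wins.
    Date bounds are treated as inclusive, which is an assumption.
    """
    for row in rows:
        if (
            row["parameter"] == parameter
            and row["region"] == region
            and row["start"] <= day <= row["end"]
        ):
            return row["value"]
    return None


# Two hypothetical overlapping rows for the same parameter and region.
rows = [
    {"parameter": "beta", "region": "Pakistan",
     "start": date(2020, 4, 1), "end": date(2020, 4, 30), "value": 0.4},
    {"parameter": "beta", "region": "Pakistan",
     "start": date(2020, 4, 15), "end": date(2020, 5, 31), "value": 0.7},
]
```

On a day covered by both rows, the first row's value is returned; once the first window ends, the second row takes over.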

hnykda commented 4 years ago

Is this still "blocked", or can it be worked on?

wolverdude commented 4 years ago

Sorry, I forgot to update the card state. I should have a PR up tomorrow.

wolverdude commented 4 years ago

@lagerros I've updated the example spreadsheet with examples of what you can now configure. Please review and let me know if this is what you had in mind and what tweaks, if any, you would like me to make.

lagerros commented 4 years ago

@wolverdude I took a look -- this is really neat!

Especially nice is that having all those parameters there would mean different simulations could just be stored in different spreadsheets; and there would be no need to edit the colab. (Which is an annoying feature at the moment)

One thing I'm confused about is the "Background conditions" -- are they used in a Cartesian product to generate traces? Or do they generate groups? (C.f. the distinction in my message above)

wolverdude commented 4 years ago

Countermeasure package classes correspond to different scenarios/groups, one group per class. Background condition classes correspond to different Traces, one per class. Or if you want, it can be the other way around. Easy to change.

I basically copied the logic from the colab, so the behavior shouldn’t be much different. The one thing I did differently was lumping beta multipliers in with background conditions. I set it up this way based on what I was seeing on the current Balochistan site. If that’s undesirable, I can create a third Type for another Cartesian product.

wolverdude commented 4 years ago

One question I have for you (and the thing that kinda derailed me earlier):

Does this spreadsheet config supersede the scenarios and groups keys in the config.yaml? If not, then what should the info in config.yaml be used for?

wolverdude commented 4 years ago

@lagerros any thoughts on these? ^

lagerros commented 4 years ago

"Does this spreadsheet config supersede the scenarios and groups keys in the config.yaml? If not, then what should the info in config.yaml be used for?"

I don't know what you mean by "supersede".

But basically, the group keys in config.yaml should be entirely determined by what's in the spreadsheet (as should the names of traces in the legend).

Making that easier, for example by downloading a config file from the colab, is on the data engineering roadmap and could be very helpful.

wolverdude commented 4 years ago

Yeah, "supersede" means replace. The thing is, the functionality in the spreadsheet could replace both the scenarios and groups keys in the config, except for the names and descriptions.

I think what I'll do is have the new colab export the config or something.

wolverdude commented 4 years ago

Okay, I think I've got it. How about we just use "Group" and "Trace" as the values for the Type column, which should make it super clear how things are getting displayed. All parameters will be configured in the spreadsheet, but display names, etc. need to be placed in the config file for export. It would look like this:

scenarios:
  config_sheet: "https://docs.google.com/spreadsheets/d/abc/edit#gid=123"
  groups:
    - name: Weak Mitigation
      description: Mostly open borders; full opening of public places; no social distancing; little compliance with hygiene advice
    - name: Moderate Mitigation
      description: Some border closure; closure of schools; ban on public gatherings; partial social distancing
    - name: Strong Mitigation
      description: Strong external and internal border closure; full closure of public places (including places of worship); social distancing outside and within homes
    - name: Recommended Mitigation
      description: Moderate measures + compulsory masks; contact tracing; social distancing at places of worship and within homes
  traces:
    - name: "Slowest"
      description: Very slow spread (50%)
    - name: "Slower"
      description: Slower spread (75%)
    - name: "Expected"
      description: Expected spread (100%)
    - name: "Faster"
      description: Faster spread (150%)
    - name: "Fastest"
      description: Very fast spread (175%)

name values correspond to the values in the Class column, and description is what's displayed to the user, though for groups, name will also be the title of the tab for users to click on. If we want to decouple that, it would be pretty easy to add an optional display_name to the config.
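Since the config names have to match the Class column exactly, a consistency check along these lines might be useful at export time. This is a sketch only: `check_names` is a hypothetical helper, `config` is assumed to be the already-parsed `scenarios` mapping, and the sheet rows are assumed to be dicts with `type` and `class` keys.

```python
def check_names(config, sheet_rows):
    """Return class names that appear in only one of the config and the sheet.

    `config` is the parsed `scenarios` mapping; `sheet_rows` are dicts with
    hypothetical `type` ("Group" or "Trace") and `class` keys.
    """
    problems = {}
    for type_name, config_key in (("Group", "groups"), ("Trace", "traces")):
        in_config = {entry["name"] for entry in config.get(config_key, [])}
        in_sheet = {row["class"] for row in sheet_rows if row["type"] == type_name}
        problems[config_key] = {
            "missing_from_sheet": sorted(in_config - in_sheet),
            "missing_from_config": sorted(in_sheet - in_config),
        }
    return problems
```

An empty list under every key means each group and trace in the sheet has a display description, and nothing in the config is orphaned.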

lagerros commented 4 years ago

Note that this assumes groups and traces are independent -- and I don't think that's assumed in the spreadsheet. (Feature, not bug)

This feature is not used at the moment; but might become useful in future.

But this might be fine as an MVP, and we can jump off that bridge when we get there (unless you can already see an easy fix for this now).

wolverdude commented 4 years ago

@AcesoUnderGlass @lagerros This issue was completed by epimodel#57; however, there are some design decisions and areas of the spec I never fully clarified. These are things you may consider changing.

Unused Code

There is some code that I wrote and checked in that ended up not getting used and I neglected to remove it.

- colabutils.py - This was added to enable fetching gleam_parameters directly from a Google sheet, but I scrapped that functionality in order to conform to luigi expectations. You could convert the GleamParameters task into a non-manual task that pulls the sheet data and saves it as a CSV. Otherwise, there's a couple of dependencies (gsheet, oauth2client) that are only used by this file which should be removed with it.
- Foretold substitutions in ParseInput - This is the feature that would fetch the mean of any foretold distribution whose UUID is listed in the Value column. I was unable to add ergo as a dependency due to incompatibilities, so I removed this from the luigi task, but I didn't remove any of the underlying code. This could be easily fixed by using the Foretold API directly instead of relying on ergo, which we're already doing in the UpdateForetold task.

Gleam Parameters

- The code uses types group and trace for the cartesian product instead of Countermeasure package and Background condition which were in the original spec. I did this because it was clearer about what was happening. The only advantage I could see of doing it the other way was that you could easily switch how the results were displayed, but you could accomplish this anyway just by swapping group and trace values in the parameters spreadsheet/CSV.
- Gleam exceptions are currently being grouped by region and start/end date, but they probably shouldn't because this may not preserve order, which can be important. This should be fairly simple to change in DefinitionBuilder.
- If there are two multipliers of the same type, an error is raised. You may instead want them to compound each other so you could have a group multiplier and a trace multiplier. This can be easily fixed in DefinitionBuilder.
- The groundwork has now been laid for implementing a monsoon parameter.
- There is currently no interaction between estimates and parameters. This could be changed by moving the estimate logic from SimulationSet to DefinitionBuilder and then modifying it accordingly. Some possibilities:
  - Move compartment_multipliers and compartments_max_fraction from config.yaml to the parameters sheet.
  - The old code had options to use the Beta1 and Beta2 fields from estimates. This has been removed, but it could be re-added as a parameter option in the sheet.
  - Enable estimate multipliers that change the number of people assumed infectious.
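To illustrate the choice between raising on duplicate multipliers and compounding them, here is a minimal sketch. It is not the DefinitionBuilder code: `combine_multipliers` and the row keys are hypothetical stand-ins for whatever the parameters sheet provides.

```python
def combine_multipliers(rows, compound=True):
    """Combine multiplier rows per parameter, keeping first-seen order.

    With compound=True, multipliers for the same parameter are multiplied
    together (for example a group multiplier and a trace multiplier).
    With compound=False, a duplicate raises ValueError, mirroring the
    current behavior described above.
    """
    combined = {}
    for row in rows:
        name = row["parameter"]
        if name in combined:
            if not compound:
                raise ValueError(f"duplicate multiplier for {name}")
            combined[name] *= row["value"]
        else:
            combined[name] = row["value"]
    return combined
```

For example, a group row with a beta multiplier of 0.5 and a trace row with 1.5 would compound to 0.75. Because the rows are walked in order and never regrouped, the same loop shape also avoids the ordering problem noted for exceptions.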

epidemics / covid

More customizable GLEAMviz definitions #444
