Uncertain parameters - Githubissues

gotom22 commented 6 years ago

Map out the sensitivity/uncertainty parameters that we would want to understand (incl. in sensitivity analysis).

JDWoodcock commented 6 years ago

@robj411

gotom22 commented 6 years ago

strikes me as closely related to scenarios #27

JDWoodcock commented 6 years ago

i think this is best kept separate from scenarios

gotom22 commented 6 years ago

ok, but maybe a similar structural approach: identifying the "parameters" for which we would want uncertainty information, and what kind of information that is...

markotainio commented 6 years ago

The complete list of uncertain input parameters is likely going to be large. At the moment we can just assume that we will have uncertain input parameters all over the model, and build dataframes so that this uncertainty can be propagated through the model.

Also note that we will likely have model uncertainties, which can be different from parameter uncertainty. For example, which injury model to use for given data.
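
The propagation idea in this comment could be sketched roughly as follows. This is an illustrative Python sketch, not code from the model: the distributions, the `toy_model` function, and the 10% traffic reduction are all assumptions made up for the example. Each sample is one row of parameter values, and the model is run once per row.

```python
import random


def draw_parameters(rng, n_samples):
    """Return one dict of parameter values per sample (one 'row' per draw)."""
    rows = []
    for _ in range(n_samples):
        rows.append({
            # Hypothetical distributions, chosen only for illustration.
            "background_pm25": rng.lognormvariate(3.0, 0.2),
            "traffic_pm25_share": rng.betavariate(4, 6),
        })
    return rows


def toy_model(p, traffic_change=-0.1):
    """Placeholder model: PM2.5 after scaling the traffic part by a scenario change."""
    non_traffic = p["background_pm25"] * (1 - p["traffic_pm25_share"])
    traffic = p["background_pm25"] * p["traffic_pm25_share"] * (1 + traffic_change)
    return non_traffic + traffic


rng = random.Random(24)
samples = draw_parameters(rng, 1024)

# Run the model once per row, then summarise the spread of the outcome.
outcomes = sorted(toy_model(p) for p in samples)
median = outcomes[len(outcomes) // 2]
lower = outcomes[int(0.025 * len(outcomes))]
upper = outcomes[int(0.975 * len(outcomes))]
print(f"scenario PM2.5: {median:.1f} ({lower:.1f} to {upper:.1f})")
```

Keeping every draw as a full row means any downstream step sees a consistent set of parameter values, which is what lets the uncertainty carry through to the final outcome.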

robj411 commented 5 years ago

So far, we have accommodated uncertainty for the following parameters:

walk-to-bus time
cycling MMETs
walking MMETs
background PM2.5
motorcycle distance relative to car
non-travel PA
non-communicable disease background burden
traffic PM2.5 share
injury reporting rate
day-to-week travel scalar
all-cause mortality (PA)
IHD (PA)
cancer (PA)
lung cancer (PA)
stroke (PA)
diabetes (PA)
IHD (AP)
lung cancer (AP)
COPD (AP)
stroke (AP)

See https://www.overleaf.com/read/mrjtkhffzfzr for details.

markotainio commented 5 years ago

Very good start!

Compared with the published Sao Paulo paper, the following uncertainties are not yet included (and it's not certain whether they are relevant in the current version):

Injury YLD uncertainty. Is this still relevant?
Risk of an injury having lifelong consequences. Is this still relevant?
Fraction of PM2.5 emissions from different traffic-related emission sources (buses, cars etc.)
Safety in numbers uncertainty. Is SiN included in the calculations?

robj411 commented 5 years ago

Injury YLD & duration uncertainties: does this relate to the extrapolation of injury fatalities to injury YLL? This is a factor we can model as a variable parameter.

We could incorporate emissions uncertainty in the emission factors, but I don't think the handling of these has been settled yet.

For injury linearity, two parameters have been defined, in the updated list below.

walk-to-bus time
cycling MMETs
walking MMETs
background PM2.5
motorcycle distance relative to car
non-travel PA
non-communicable disease background burden
traffic PM2.5 share
injury reporting rate
day-to-week travel scalar
injury linearity
fraction of injury linearity apportioned to the casualty mode
all-cause mortality (PA)
IHD (PA)
cancer (PA)
lung cancer (PA)
stroke (PA)
diabetes (PA)
IHD (AP)
lung cancer (AP)
COPD (AP)
stroke (AP)

See https://www.overleaf.com/read/mrjtkhffzfzr for details.

markotainio commented 5 years ago

Injury YLD uncertainty has a few different elements. One is the extrapolation from the number of deaths to YLL (as you point out), but this relates more to the fatality side. Also, data for this can be extracted from GBD.

The injury (non-fatal) part is more complicated. First, we need the number of injuries. This could be the total number, or divided between mild and serious. From this we then estimate YLD per injury, taking into account that some injuries have lifelong consequences. All of these extrapolations are uncertain. In the end, however, it all depends on how the non-fatal burden of injuries will be estimated in the model.

robj411 commented 5 years ago

OK thanks. I will incorporate this in a new issue as it's not currently included anywhere in the code.

robj411 commented 5 years ago

There are two new sources of uncertainty that are a little less straightforward to parametrise. One has to do with emissions, and the other with the non-travel PA data.

The emissions will be represented by a Dirichlet distribution, and the non-travel PA will have two parameters: one scalar for the non-zero values, as before, and a new parameter that varies the proportion of non-zero values by demographic group, each represented by a Beta distribution.

For each case, I propose we supply a confidence value, between 0 and 1, where 1 represents full confidence and we use the raw data as provided. We interpret a value between 0 and 1 to parametrise a distribution. See pdfs for examples of how these could look.

I'd like to know if this seems like a reasonable approach; if not, what our alternatives are; if so, how we'd like the mapping from confidence to distribution to look.
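
As one possible shape for such a mapping, here is a minimal Python sketch. It is an assumption for discussion only, not the mapping used in the attached pdfs: `concentration`, its `scale` argument, and the example inventory values are all hypothetical. The confidence value sets a total pseudo-count, so that higher confidence keeps draws closer to the raw data, and a confidence of exactly 1 returns the raw data unchanged.

```python
import random


def concentration(confidence, scale=10.0):
    """Hypothetical mapping from a confidence in (0, 1) to a total pseudo-count."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    return scale * confidence / (1.0 - confidence)


def sample_emission_shares(raw_shares, confidence, rng):
    """Draw emission shares from a Dirichlet centred on the raw inventory.

    All raw shares must be positive. The Dirichlet draw is built from
    independent Gamma draws that are then normalised to sum to 1.
    """
    if confidence == 1.0:
        return dict(raw_shares)  # full confidence: use the raw data as provided
    k = concentration(confidence)
    total = sum(raw_shares.values())
    draws = {mode: rng.gammavariate(k * share / total, 1.0)
             for mode, share in raw_shares.items()}
    norm = sum(draws.values())
    return {mode: g / norm for mode, g in draws.items()}


def sample_prob_nonzero(raw_prob, confidence, rng):
    """Draw the proportion of non-zero PA values for one demographic group (Beta)."""
    if confidence == 1.0:
        return raw_prob
    k = concentration(confidence)
    return rng.betavariate(raw_prob * k, (1.0 - raw_prob) * k)


# Made-up inventory, for illustration only.
raw_inventory = {"car": 0.5, "bus": 0.2, "motorcycle": 0.2, "truck": 0.1}
rng = random.Random(0)
print(sample_emission_shares(raw_inventory, 0.9, rng))
print(sample_prob_nonzero(0.7, 0.8, rng))
```

With this choice the mean of each distribution stays at the raw value and only the spread changes with confidence; other mappings (for example, a non-linear `scale`) would fit the same interface.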

emission_dist.pdf prob_zero_PA.pdf

JDWoodcock commented 5 years ago

this seems reasonable to me

robj411 commented 5 years ago

For VOI analysis, we have the option to group parameters. That is, we assume that if we were to learn one parameter, we would learn another also, so it makes sense to work out the value in learning both together, rather than one at a time.

For example, I assume we learn the whole emission inventory together, and we learn the four AP DR parameters together for a given disease (i.e., we learn the curve of the disease, which is defined by four parameters).
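
To illustrate why grouping matters, here is a small Python sketch. It does not reproduce the VOI calculation used for the results; it uses a simpler stand-in, the share of outcome variance explained by a parameter or a group of parameters, estimated by averaging the outcome within quantile bins. The function names and the toy outcome `y = a + b + noise` are assumptions made up for the example.

```python
import random
from collections import defaultdict
from statistics import mean, pvariance


def quantile_bins(values, n_bins):
    """Assign each value a bin index 0..n_bins-1 by rank (equal-count bins)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


def variance_explained(params, outcome, n_bins=8):
    """Share of outcome variance explained by a group of parameters.

    params is a list of equal-length sample lists, one per parameter in the
    group. The conditional mean is approximated by the mean within each joint
    bin, so large groups need many samples.
    """
    total = pvariance(outcome)
    if total == 0:
        return 0.0  # the outcome does not change, so there is nothing to explain
    keys = list(zip(*(quantile_bins(p, n_bins) for p in params)))
    cells = defaultdict(list)
    for key, y in zip(keys, outcome):
        cells[key].append(y)
    cell_mean = {key: mean(ys) for key, ys in cells.items()}
    residual = mean((y - cell_mean[key]) ** 2 for key, y in zip(keys, outcome))
    return (total - residual) / total


# Toy data: the outcome depends on a and b, but not on c.
rng = random.Random(7)
n = 1024
a = [rng.random() for _ in range(n)]
b = [rng.random() for _ in range(n)]
c = [rng.random() for _ in range(n)]
y = [ai + bi + rng.gauss(0, 0.05) for ai, bi in zip(a, b)]

share_a = variance_explained([a], y)
share_ab = variance_explained([a, b], y)
share_c = variance_explained([c], y)
print(f"a alone: {share_a:.2f}, a and b together: {share_ab:.2f}, c alone: {share_c:.2f}")
```

In this toy case each parameter alone explains about half of the variance, while the pair explains nearly all of it, which is the kind of difference that grouping is meant to capture.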

I list below the options, first those that belong to the whole model, and then those that will be specific to each setting.

Model parameters

cycling MMETs
walking MMETs
sum of injury distance exponents
fraction of injury linearity apportioned to the casualty mode
all-cause mortality (PA)
IHD (PA)
endometrial cancer (PA)
breast cancer (PA)
colon cancer (PA)
lung cancer (PA)
all cancer (PA)
stroke (PA)
diabetes (PA)
LRI (AP) (already a group of 4)
IHD (AP) (already a group of 4)
lung cancer (AP) (already a group of 4)
COPD (AP) (already a group of 4)
stroke (AP) (already a group of 4)

Setting-specific parameters

walk-to-bus time
background PM2.5
motorcycle distance relative to car
truck distance relative to car
bus occupancy
non-travel PA non-zeros
non-travel PA scalar
non-communicable disease background burden
traffic PM2.5 share
injury reporting rate
day-to-week travel scalar (we're currently saying this is 7 and not variable)
emission inventory (already a group the size of the number of contributing factors)
motorcycle distance scalar
car/taxi distance scalar
bicycle distance scalar
walking distance scalar
public transport (pt) distance scalar

Groups proposed so far

All the PA DR curves
non-travel PA non-zeros and non-travel PA scalar
PM parameters (background and transport fraction) which are already grouped by their definition

Update

The results for VOI are stored in results/multi_city/.

Simulations, presently using 1024 samples, take ~40 min on 16 cores.

There is sometimes a spurious correlation; this happens particularly for the multivariate emission_inventory with city--scenario combinations that show no change, e.g. Bangalore motorcycle. Perhaps we should omit these calculations entirely.
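
One way to omit such calculations would be to screen out combinations whose outcome does not vary across samples before running the VOI step. A minimal Python sketch, with a hypothetical `has_variation` helper, a hypothetical tolerance, and made-up outcome values:

```python
from statistics import pvariance


def has_variation(outcome_samples, tolerance=1e-12):
    """True if the outcome varies across samples by more than the tolerance."""
    return pvariance(outcome_samples) > tolerance


# Made-up outcomes keyed by (city, scenario); the first shows no change at all.
outcomes = {
    ("bangalore", "motorcycle"): [0.0] * 8,
    ("bangalore", "cycle"): [1.2, 0.9, 1.4, 1.1, 0.8, 1.3, 1.0, 1.2],
}

# Only combinations with a varying outcome go forward to the VOI calculation.
to_analyse = [key for key, values in outcomes.items() if has_variation(values)]
print(to_analyse)
```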

Still to do:

[ ] Add in groups
[ ] Inflate uncertainty for parameters that might not be relevant to our settings, e.g. PA dose--response curves, active travel mMETs

ITHIM / ITHIM-R

Uncertain parameters #24