HopkinsIDD / cholera-mapping-pipeline

Formerly part of cholera-taxonomy. The map creation scripts, packages, and file structure

NGA 2016-20 #432

Open eclee25 opened 1 year ago

QLLZ commented 1 year ago

Data pull: HASH: 6772ea9da4f43ddc3da11c494a573af2ca3dd3c8 config

QLLZ commented 1 year ago

Model run: HASH: 6772ea9da4f43ddc3da11c494a573af2ca3dd3c8

QLLZ commented 1 year ago

Error in model run: the stan_output file was created, but the generated_quantity file was not created successfully. I had rerun this before and the same error occurred.

log file

javierps commented 1 year ago

@QLLZ, could this be due to memory limitations? What was the memory limit on this job?

QLLZ commented 1 year ago

Reran with 32 GB of memory. HASH: 6772ea9da4f43ddc3da11c494a573af2ca3dd3c8

QLLZ commented 1 year ago

country data report

eclee25 commented 1 year ago

Convergence, fits, maps, GAM input comparison, and Rhats look good.

Opinion: Approve

javierps commented 1 year ago

Moving this to rejected because of the overestimation in 2018. This may be because some cells are not covered by full observations; this needs to be checked.

eclee25 commented 1 year ago

Rerun with the same config but on the updated dev branch, where a new feature that clips the output genquant shapefiles to the adm0 output shapefile was merged.

QLLZ commented 1 year ago

Updated config

QLLZ commented 1 year ago

Data pull: HASH 02faaed1a21116cebc29533c68ec0d99fe78d08f

QLLZ commented 1 year ago

Model run: HASH 02faaed1a21116cebc29533c68ec0d99fe78d08f

QLLZ commented 1 year ago

The country data report needs 64 GB of memory to run.

QLLZ commented 1 year ago

Country data report

javierps commented 1 year ago

The overestimation in 2018 is still present. This may be because the adm0 clipping failed. National-level counts seem OK.

Suggestion: temporarily accept.

eclee25 commented 1 year ago

Could the overestimation come from the high-valued cell on the eastern border? Perhaps we should try a higher sfrac border threshold together with the fixed adm0 clip.

Temp Accept

QLLZ commented 1 year ago

Config

Data pull (32 GB):

HASH: 211913e0eb7d570ed62c48bc1ae9ecd6671a6cbc

QLLZ commented 1 year ago

Model run:

HASH: 211913e0eb7d570ed62c48bc1ae9ecd6671a6cbc

QLLZ commented 1 year ago

Failed model run log file

QLLZ commented 1 year ago

Rerun on dev_u_combs_fix

HASH: e56580fa5ddab00293e31ff90139351f80ae3f6c

QLLZ commented 1 year ago

country data report

QLLZ commented 1 year ago

There are many observations with conflicting case counts.

2017: OC 332 versus OC 20810/20902.

2018: maybe a higher censoring threshold would help.

2019: OC 21138 seems suspicious (same location period, different case counts; I guess we didn't set primary to FALSE). This needs to be double-checked.

2020: OC 332 is much smaller than the other OCs.

Convergence looks good. Population estimates look good.

eclee25 commented 1 year ago

Model diagnostics look OK, but I agree with Qulu that we need to audit some of the OCs because of apparent conflicts and unusual observations. The following should be audited:

2017 issues

2018 issues

2019 issues

2020 issues

Opinion: Rerun if observations are changed in the audit. We could also consider increasing the censoring threshold.

QLLZ commented 1 year ago

2017: except for OC 21138, these observations come from the Weekly Bulletin on Outbreaks and Other Emergencies report (weekly updates). The data for OC 20810, OC 20902, OC 20747, OC 20729, OC 20737, OC 20713, OC 20721, OC 20766, and OC 21138 are all consistent with the source documents.

2018: OC 20985: updated the observations. It seems these cases were reported in three states in Nigeria, not nationwide.

2019: OC 21138: some observations were actually non-primary (stratified by sex/age group); I updated those. For the other observations, the source contains some conflicting data between figures, but the extracted data are consistent with the source.

eclee25 commented 1 year ago

Rerun with standard settings because of the data audit.

QLLZ commented 1 year ago

Data pull & model run: HASH: 5c0213fae8693e4b4b84d88f0fa406548414b48d

Updated country data report

QLLZ commented 1 year ago

OC 20985 was not updated successfully. I need to update it and rerun.

eclee25 commented 1 year ago

Rerun with standard model settings; no other convergence or estimation issues need to be resolved with the model.

QLLZ commented 1 year ago

Model run & data pull:

HASH: 5c0213fae8693e4b4b84d88f0fa406548414b48d (dev)

eclee25 commented 1 year ago

All model diagnostics look good after the data audit. Approve.

javierps commented 1 year ago

Sep 2023 Production run: all OK.

Underestimation of cases in 2017 and 2018 due to conflicting national-level observations. Consider a higher tfrac threshold?

Suggestion: accept.

eclee25 commented 1 year ago

If we believe that the observations with larger case counts are more accurate, I think we would need to make a different choice in the data filtering process. The estimation here looks okay given the variability in reporting across OCs.

Decision: Rerun with standard settings but with a censoring threshold of 1, because of conflicting annual observations that have shorter tfracs but were treated as full (a tfrac threshold of 1 was mistakenly used previously).
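The effect of the censoring threshold in this decision can be illustrated with a minimal sketch. It assumes, without confirmation from the pipeline code, that an observation is treated as a full count only when its tfrac (taken here as the fraction of the time period it covers) is at least the censoring threshold, and is otherwise treated as censored. The `Observation` class, the `is_censored` and `classify` functions, and all values are hypothetical and for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    # Hypothetical record; the real pipeline's data structures may differ.
    label: str      # illustrative identifier, not a real OC ID
    cases: int      # reported cases (made-up value)
    tfrac: float    # assumed: fraction of the time period covered, in [0, 1]


def is_censored(obs: Observation, censoring_thresh: float) -> bool:
    """Assumed rule: observations with tfrac below the threshold are treated
    as censored (partial) counts; the rest are treated as full counts."""
    return obs.tfrac < censoring_thresh


def classify(observations, censoring_thresh):
    """Map each observation label to 'full' or 'censored'."""
    return {
        o.label: "censored" if is_censored(o, censoring_thresh) else "full"
        for o in observations
    }


# Two made-up annual observations for the same place and year.
obs = [
    Observation("annual_full", cases=1000, tfrac=1.0),
    Observation("annual_short", cases=300, tfrac=0.96),
]

for thresh in (0.95, 1.0):
    print(thresh, classify(obs, thresh))
# 0.95 {'annual_full': 'full', 'annual_short': 'full'}
# 1.0 {'annual_full': 'full', 'annual_short': 'censored'}
```

Under this assumed rule, a threshold below 1 lets the shorter annual observation count as a full observation that conflicts with the complete one, while a threshold of 1 only uses it as a censored count.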

eclee25 commented 1 year ago

Censoring threshold = 1 run is Approved.