bertozzivill / map-itn-cube

Geospatial insecticide-treated net models for the Malaria Atlas Project
GNU General Public License v3.0

map-itn-cube

NOTE: This branch is under active development. To view the branch used for the 2021 research paper, switch to branch publication-2021.

The code in this repo generates estimates of insecticide-treated net (ITN) ownership, access, and use in sub-Saharan Africa.

Stock and Flow (stock_and_flow)

Mechanistic model fit in rjags to estimate country-specific time series of ITN distribution and retention. Todo: script-specific descriptions.

01a_prep_hh_survey_data.r

Clean household-level survey data; save for ITN cube; aggregate to national level for stock and flow.

| name | type | description | location |
| --- | --- | --- | --- |
| main_dir | input | Location of manually-extracted survey data (mostly MICS). See README in folder for more info. | ~/stock_and_flow/input_data/00_survey_nmcp_manufacturer/household_surveys |
| dhs_dir | input | Location of DHS data extracted by MAP team. | ~/../Shared Drives/dhs-outputs/Standard_MAP_DHS_Outputs/DHS_ITN_Data/Output/[DATE]/standard_tables |
| code_dir | input | Location of repo. | map-itn-cube/stock_and_flow |
| out_dir | output | Output directory. | ~/stock_and_flow/input_data/01_input_data_prep/[DATE] |
| for_cube | output | Household-level data file to use in the ITN cube step, later. Contains geolocated data ONLY. | out_dir/itn_hh_survey_data.csv |
| hh_size_props | output | Household size distribution (1-10+ people) for use in the crop-to-access conversion. | out_dir/hhsize_from_surveys.csv |
| summary_table | output | Descriptor to track survey summary stats. | out_dir/summary_tables/summary_table_raw.csv |
| all_data | output | Household-level data file INCLUDING non-geolocated points. | out_dir/itn_hh_data_all.csv |
| survey_summary | output | Aggregated survey data. This is the main file that feeds into the next step! | out_dir/itn_aggregated_survey_data.csv |
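
As a rough illustration of the household-to-survey aggregation in this step, the sketch below collapses toy household rows into one row per survey. The column names (`surveyid`, `n_itn`, `hh_sample_wt`) and the use of a sample-weighted mean are assumptions made for the example and may not match what 01a_prep_hh_survey_data.r actually does.

```r
# Toy household-level data; all column names are hypothetical.
hh <- data.frame(
  surveyid     = c("A", "A", "A", "B", "B"),
  hh_size      = c(4, 6, 2, 5, 3),
  n_itn        = c(2, 0, 1, 3, 0),
  hh_sample_wt = c(1, 2, 1, 1, 1)
)

# Collapse to one row per survey: weighted mean nets per household and
# weighted share of households that own no net.
aggregate_survey <- function(d) {
  data.frame(
    surveyid     = d$surveyid[1],
    n_households = nrow(d),
    mean_nets_hh = weighted.mean(d$n_itn, d$hh_sample_wt),
    prop_no_nets = weighted.mean(d$n_itn == 0, d$hh_sample_wt)
  )
}

survey_summary <- do.call(rbind, lapply(split(hh, hh$surveyid), aggregate_survey))
print(survey_summary)
```

The two summary columns correspond to the "mean nets per household" and "proportion of households with no net" metrics mentioned for step 01c.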

01b_prep_reportonly_survey_data.r

Append those surveys whose results are only available in aggregated form to the stock and flow dataset.

| name | type | description | location |
| --- | --- | --- | --- |
| main_dir | input | Location to which data in the previous step was saved. | ~/stock_and_flow/input_data/01_input_data_prep/[DATE] |
| input_dir | input | Location of survey data that contains only survey-level information extracted from reports. | ~/stock_and_flow/input_data/00_survey_nmcp_manufacturer |
| survey_count | output | Small dataset tracking the number of surveys per country, relevant for sensitivity analysis. | main_dir/survey_count.csv |
| for_tsv | output | Dataset used to organize surveys and countries for sensitivity analysis; not important for main model run. | main_dir/batch_sensitivity.tsv |
| summary_table | output | Descriptor to track survey summary stats (appended to version from prior step). | main_dir/summary_tables/summary_table_intermediate.csv |
| survey_data | output | Aggregated survey data, including report-only surveys. This is the main file that feeds into the next step! | main_dir/itn_aggregated_survey_data_plus_reportonly.csv |

01c_prep_indicator_priors.r

Run a regression to find coefficients for the "proportion of households with no net" and "mean nets per household" metrics.

| name | type | description | location |
| --- | --- | --- | --- |
| main_dir | input | Location to which data in the previous step was saved. | ~/stock_and_flow/input_data/01_input_data_prep/[DATE] |
| HH | input | Household-level data ("for_cube" in step 01a). POSSIBLE BUG: should this use "all_data" from step 01a instead? | main_dir/itn_hh_survey_data.csv |
| all_outputs | output | Samples from the posterior distributions of the Stan model, used as priors in step 03. | main_dir/indicator_priors.csv |

01d_calculate_use.r

Run a regression to find coefficients for the access-to-use conversion used in the World Malaria Report. Use is calculated overall, among children under 5, and among pregnant people.

| name | type | description | location |
| --- | --- | --- | --- |
| main_dir | input | Location to which data in the previous step was saved. | ~/stock_and_flow/input_data/01_input_data_prep/[DATE] |
| code_dir | input | Location of repo. | map-itn-cube/stock_and_flow |
| survey_data | input | Aggregated survey data ("survey_summary" in step 01a). | main_dir/itn_aggregated_survey_data.csv |
| all_data | input | Household-level survey data ("all_data" in step 01a). | main_dir/itn_hh_data_all.csv |
| all_traces | output | Samples from the posterior distributions of the Stan model, used to calculate use from access for 3 demographic groups in step 03. | main_dir/access_use_relationship.csv |

02_prep_delivery_dist_data.r

Clean and format ITN delivery and ITN distribution data.

| name | type | description | location |
| --- | --- | --- | --- |
| main_dir | input | Location of net delivery and distribution data from WHO. | ~/stock_and_flow/input_data/00_survey_nmcp_manufacturer/nmcp_manufacturer_fromwho/data[YEAR] |
| out_dir | output | Output directory. | Variable |
| these_distributions | output | Dataset of all LLIN distributions--descriptive. | out_dir/preppedllins[DATE].csv |
| new_nmcp | output | these_distributions, reformatted to resemble the original NMCP data. Used in step 03. | out_dir/itn_distributions.csv |
| new_manu | output | Dataset of manufacturer deliveries. Used in step 03. | out_dir/manufacturer_deliveries.csv |

03_stock_and_flow.r; jags_functions.r

Run and save results for the stock and flow model; run separately for each country.

| name | type | description | location |
| --- | --- | --- | --- |
| main_dir | input | Location of prepped survey data. | ~/stock_and_flow/input_data/01_input_data_prep/[DATE] |
| nmcp_manu_dir | input | Location of cleaned net delivery and distribution data from WHO. | Variable, but e.g. stock_and_flow/input_data/00_survey_nmcp_manufacturer/nmcp_manufacturer_from_who/data_2020/20200929/ready_for_stockflow |
| out_dir | input | Output directory. | ~/stock_and_flow/results/[DATE_UNIQUELABEL] |
| code_dir | input | Location of repo. | map-itn-cube |
| this_country | input | ISO3 code of country for which to run the model. | |
| sensitivity_survey_count; sensitivity type | input | Needed only for sensitivity analysis-- parameters for what survey data to fit to. | |
| start_year | input | integer year, usually 2000 | |
| end_year | input | integer year, the latest year for which there is data. | |
| projection_year | input | integer year. Any time later than this year will receive different assumptions about the variability of net distribution data. Only used for projection scenarios, not typical runs. | |
| full_model_string | output | text file of JAGS model code. | out_dir/[ISO3]_model.txt |
| time_df | output | small .csv to track how long models take to run. | out_dir/[ISO3]_time.csv |
| final_metrics | output | Posterior draws of ITN access to pass on to the next steps. | out_dir/[ISO3]_accessdraws.csv |
| R environment | output | Save all items in the R environment to use in later steps. | out_dir/[ISO3]_all_output.RData |
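
For intuition about what "stock and flow" means here, the toy sketch below tracks each distribution of nets as a cohort that shrinks over time, with the stock being the sum of surviving cohorts. It is deterministic and is not the JAGS model in this repo: the exponential loss curve, the `simulate_stock` function, and its arguments are all assumptions chosen only for illustration.

```r
# Toy stock-and-flow bookkeeping; NOT the JAGS model fitted in this repo.
# Assumption: exponential net loss parameterised by a half-life in years.
simulate_stock <- function(distributed, half_life, step = 0.25) {
  n <- length(distributed)
  rate <- log(2) / half_life
  stock <- numeric(n)
  for (t in seq_len(n)) {
    # Age in years of every cohort distributed up to time step t.
    age <- (t - seq_len(t)) * step
    stock[t] <- sum(distributed[seq_len(t)] * exp(-rate * age))
  }
  stock
}

# One distribution of 1000 nets, then nothing for two years (quarterly steps).
nets <- c(1000, rep(0, 8))
stock <- simulate_stock(nets, half_life = 2)
print(round(stock))
```

With a two-year half-life, half of the original nets remain after eight quarterly steps. The real model estimates retention per country from survey, NMCP, and manufacturer data rather than fixing it in advance.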

04_compare_outputs.r

View results of stock and flow compared to earlier model versions.

| name | type | description | location |
| --- | --- | --- | --- |
| base_dir | input | Location where stock and flow results are saved. Not called "main dir" to avoid being overwritten when stock and flow results are loaded | ~/stock_and_flow/results |
| func_dir | input | Location of repo. | map-itn-cube/stock_and_flow |
| out_dir | input | Output directory. | ~/stock_and_flow/results/[DATE_UNIQUELABEL] |
| plot_dir | input | Location to save comparison plots | ~/stock_and_flow/results/[LABEL] |
| model_dirs | input | vector of unique model result labels to compare. Must be at least length two, I don't recommend more than four. | e.g. c("20200418_BMGF_ITN_C1.00_R1.00_V2", "20200418_BMGF_ITN_C1.00_R1.00_V2") |
| nets_in_houses_all, survey_data_all, nmcp_data_all, stock_all, half_life_comparison | output | various output datasets useful for later plotting. | plot_dir/for_plotting.RData |
| timing_all | output | aggregation of model runtime by country. | plot_dir/timing_all.csv |
| compare_outputs pdf | output | Time-series comparisons of different models. | plot_dir/compareoutputs[label1][label 2]...pdf |
| compare_half_lives pdf | output | Comparisons of different model ITN retention half-lives. | plot_dir/halflives[label1][label 2]...pdf |

05_aggregate_for_cube.r

Collect national-level outputs to pass along to the itn_cube code.

| name | type | description | location |
| --- | --- | --- | --- |
| reference_dir | input | Location where stock and flow results are saved. Not called "main dir" to avoid being overwritten when stock and flow results are loaded. | ~/stock_and_flow/results/[UNIQUE LABEL] |
| list_out_dir | input | Output directory, same as reference_dir. | ~/stock_and_flow/results/[UNIQUE LABEL] |
| metrics_for_cube | output | Draw-level access metrics (NPC, probability of not having a net, and nets per household) for cube. | out_dir/for_cube/stock_and_flow_by_draw.csv |
| means_for_cube | output | Mean access metrics (NPC, probability of not having a net, and nets per household) for cube. | out_dir/for_cube/stock_and_flow_probs_means.csv |
| national_access | output | Mean national access and NPC for cube. | out_dir/for_cube/stock_and_flow_access_npc.csv |
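
The relationship between the draw-level and mean outputs of this step can be sketched as a simple collapse over draws. The data frame and its columns (`iso3`, `year`, `draw`, `nat_access`) are hypothetical stand-ins, not the actual layout of stock_and_flow_by_draw.csv.

```r
# Toy draw-level output; column names are hypothetical.
draws <- data.frame(
  iso3       = rep(c("AAA", "BBB"), each = 4),
  year       = 2020,
  draw       = rep(1:4, times = 2),
  nat_access = c(0.50, 0.54, 0.46, 0.50, 0.20, 0.30, 0.10, 0.20)
)

# Mean across draws for each country-year.
means <- aggregate(nat_access ~ iso3 + year, data = draws, FUN = mean)
print(means)
```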

06_aggregate_for_wmr.r

Calculate national and continental ITN indicators used in the World Malaria Report.

| name | type | description | location |
| --- | --- | --- | --- |
| reference_dir | input | Location where stock and flow results are saved. Not called "main dir" to avoid being overwritten when stock and flow results are loaded. | ~/stock_and_flow/results/[UNIQUE LABEL] |
| list_out_dir | input | Output directory, same as reference_dir. | ~/stock_and_flow/results/[UNIQUE LABEL] |
| code_dir | input | Location of repo. | map-itn-cube |
| wmr_input_dir | input | Location of outputs from step 01d. | ~/stock_and_flow/input_data/01_input_data_prep/[DATE] |
| indicator_summary | output | Quarterly values of all indicators calculated in script. | list_out_dir/for_cube/indicators_all.csv |
| wmr_subset | output | National, annual values of indicators. This output file is given directly to WHO. | list_out_dir/for_cube/indicators_for_wmr.csv |

07_half_life_convergence.r

Generate plots to assess JAGS model fit based on the convergence of the half-life parameter; also save uncertainty intervals for half-life. Diagnostic purposes only.
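
A minimal sketch of how an uncertainty interval might be summarised from posterior samples of a half-life parameter is shown below. The samples are simulated from a log-normal distribution purely as a stand-in; the variable names and the choice of a 95% quantile interval are assumptions, not a description of what the script computes.

```r
set.seed(1)

# Stand-in for posterior samples of a half-life parameter (years); simulated here.
half_life_samples <- rlnorm(2000, meanlog = log(2), sdlog = 0.1)

# Median and 95% quantile-based interval.
half_life_ui <- quantile(half_life_samples, probs = c(0.025, 0.5, 0.975))
print(half_life_ui)
```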

08_analyze_sensitivity.r

For sensitivity analysis model runs, aggregate and plot results. Not usually used.

ITN "cube" (itn_cube)

Geospatial regression model fit in R-INLA that utilizes the national mean from the stock and flow outputs as a baseline for disaggregating spatially.

000_make_dsub.r

Construct the long bash command used to collate and save the desired covariates (see covariate_key.csv) from the COVARIATE bucket.

| name | type | description | location |
| --- | --- | --- | --- |
| various | input | Various directories and google cloud specifications | |
| full_dsub_str | output | dsub command in the form of a string to paste into a google cloud console | |
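
Building a long shell command as a string in R might look roughly like the sketch below. Every value (project, image, bucket paths) is a placeholder, and the particular set of dsub flags is an assumption for illustration; the flags and values that 000_make_dsub.r actually emits may differ.

```r
# All values below are placeholders, not the project's real settings.
dsub_args <- c(
  "--provider"         = "google-cls-v2",
  "--project"          = "my-gcp-project",
  "--image"            = "gcr.io/my-gcp-project/my-r-image:latest",
  "--logging"          = "gs://my-bucket/logs",
  "--input"            = "CODE=gs://my-bucket/code/000_extract_covariates.r",
  "--output-recursive" = "OUT_DIR=gs://my-bucket/covariates",
  "--command"          = "'Rscript ${CODE}'"
)

# Join flag/value pairs one per line, with trailing backslashes so the result
# can be pasted into a shell as a single command.
full_dsub_str <- paste(
  c("dsub", paste(names(dsub_args), dsub_args)),
  collapse = " \\\n  "
)
cat(full_dsub_str, "\n")
```

Keeping the flags in a named vector makes it easy to add or swap inputs without editing a long hand-written string.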

000_extract_covariates.r

Collate and save covariates from the COVARIATE bucket using the dsub command constructed above.

No table since this should only be run via 000_make_dsub.

00_generate_cube_master.r

This is the script that gets submitted for a full run of the cube. Loads all input data and runs steps 1-3. Step 4 needs to be run separately to parallelize correctly.

| name | type | description | location |
| --- | --- | --- | --- |
| input_dir | input | Location of miscellaneous input data (iso-to-gaul names, etc). | ~/itn_cube/input_data |
| cov_dir | input | Location of cleaned covariates from step 000. | ~/itn_cube/results/covariates/[COV_DATE] |
| func_dir | input | Location of repo. | map-itn-cube/itn_cube/ |
| main_dir | input | Location to save all results. | ~/itn_cube/results/[UNIQUE_LABEL] |
| survey_indir | input | Location of household-level survey data. | stock_and_flow/input_data/01_input_data_prep/[DATE] |
| indicators_indir | input | Location of stock and flow outputs to use. | ~/stock_and_flow/results/[STOCKFLOW_LABEL]/for_cube |

01_prep_data.r; supported by 01_data_functions.r

Load household-level survey data cleaned in the stock and flow code, calculate cluster-level access, aggregate to the 5km-by-5km pixel level.

| name | type | description | location |
| --- | --- | --- | --- |
| main_indir | input | Location of miscellaneous input data (iso-to-gaul names, etc). | ~/itn_cube/input_data |
| func_dir | input | Location of repo. | map-itn-cube/itn_cube/ |
| survey_indir | input | Location of household-level survey data. | stock_and_flow/input_data/01_input_data_prep/[DATE] |
| indicators_indir | input | Location of stock and flow outputs to use. | ~/stock_and_flow/results/[STOCKFLOW_LABEL]/for_cube |
| main_outdir | input | Location to save all results. | ~/itn_cube/results/[UNIQUE_LABEL] |
| survey_summary | output | Descriptive; summary stats of all surveys. | main_outdir/01_survey_summary.csv |
| final_data | output | Prepped household-level data for regression. | main_outdir/01_survey_data.csv |
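
Aggregating cluster-level data to grid pixels can be pictured with the sketch below, which assigns each cluster to a grid cell and pools counts within cells. The column names, the 1/24-degree grid (roughly 5 km at the equator), and the pooling of counts are all assumptions for illustration; 01_prep_data.r may define the grid and the aggregation differently.

```r
# Toy cluster-level data; column names and the grid definition are hypothetical.
clusters <- data.frame(
  lat           = c(0.010, 0.020, 0.300),
  lon           = c(30.010, 30.030, 30.300),
  n_with_access = c(20, 10, 15),
  n_people      = c(40, 20, 30)
)

res <- 1 / 24  # 2.5 arc-minutes, roughly 5 km at the equator (assumed grid)

# Index each cluster by the grid cell it falls in, then pool counts per cell.
clusters$cell_row <- floor(clusters$lat / res)
clusters$cell_col <- floor(clusters$lon / res)
pixels <- aggregate(
  cbind(n_with_access, n_people) ~ cell_row + cell_col,
  data = clusters, FUN = sum
)
pixels$access <- pixels$n_with_access / pixels$n_people
print(pixels)
```

Here the first two clusters fall in the same cell and are merged, so three clusters become two pixels.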

02_prep_covariates.r

Subset full covariate set down to only those needed for model fitting; merge onto survey data.

| name | type | description | location |
| --- | --- | --- | --- |
| main_indir | input | Location of miscellaneous input data (iso-to-gaul names, etc). | ~/itn_cube/input_data |
| cov_dir | input | Location of cleaned covariates from step 000. | ~/itn_cube/results/covariates/[COV_DATE] |
| main_outdir | input | Location to save all results. | ~/itn_cube/results/[UNIQUE_LABEL] |
| data | output | Prepped household-level data with covariates appended for regression. | main_outdir/02_data_covariates.csv |

03_regress.r; supported by 03_inla_functions.r

Run regression (including appropriate data transformations) and save outputs.

| name | type | description | location |
| --- | --- | --- | --- |
| input_dir | input | Location of miscellaneous input data (iso-to-gaul names, etc). | ~/itn_cube/input_data |
| func_dir | input | Location of repo. | map-itn-cube/itn_cube/ |
| main_indir | input | Location of miscellaneous input data (iso-to-gaul names, etc). | ~/itn_cube/input_data |
| main_outdir | input | Location to save all results. | ~/itn_cube/results/[UNIQUE_LABEL] |
| start_year | input | Integer year to begin regression (usually 2000). | |
| end_year | input | Integer year to end regression. | |
| save_uncertainty | input | Set to F when debugging to avoid very large output files. | |
| nsamp | input | Integer number of posterior draws to sample. | |
| data | output | Final dataset that goes into regression | main_outdir/03_data_for_model.csv |
| inla_outputs | output | Large .RData file containing the regression objects for all three model runs. | main_outdir/03_inla_outputs.Rdata |
| inla_outputs_for_prediction | output | Small .RData file to use when you want to predict only mean outcomes. | main_outdir/03_inla_outputs_for_prediction.Rdata |
| inla_posterior_samples | output | Medium .RData file that saves posterior samples for prediction in step 04. | main_outdir/03_inla_posterior_samples.Rdata |

04_predict_rasters.r; 04_prediction_functions.r; 04_batch_submit_predictions.r

Recently modified to run separately for each year. Predict outputs on the monthly level; save national and continental aggregation of monthly time series; aggregate rasters to the annual level and save; calculate exceedance and relative uncertainty.

| name | type | description | location |
| --- | --- | --- | --- |
| this_year | input | Integer year to predict values. | |
| input_dir | input | Location of miscellaneous input data (iso-to-gaul names, etc). | ~/itn_cube/input_data |
| func_dir | input | Location of repo. | map-itn-cube/itn_cube/ |
| main_indir | input | Location of miscellaneous input data (iso-to-gaul names, etc). | ~/itn_cube/input_data |
| main_outdir | input | Location to save all results. | ~/itn_cube/results/[UNIQUE_LABEL] |
| indicators_indir | input | Location of stock and flow outputs to use. | ~/stock_and_flow/results/[STOCKFLOW_LABEL]/for_cube |
| static_cov_dir | input | Location of cleaned static covariates from step 000. | ~/itn_cube/results/covariates/[COV_DATE]/static_covariates.csv |
| annual_cov_dir | input | Location of cleaned annual covariates from step 000. | ~/itn_cube/results/covariates/[COV_DATE]/annual_covariates.csv |
| dynamic_cov_dir | input | Location of cleaned monthly covariates from step 000. | ~/itn_cube/results/covariates/[COV_DATE]/dynamiccovariates/dynamic[YEAR].csv |
| testing | input | Set to T to reduce dataset size dramatically if testing/debugging. | |
| nat_level | output | Dataset of monthly national-level time series for all outputs | main_outdir/04_predictions/aggregated/aggregatedpredictions[YEAR].csv |
| annual_summary_stats | output | This dataset of annual pixel-level results gets transformed into annual rasters and saved. | main_outdir/04_predictions/rasters |
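
Exceedance and relative uncertainty are both summaries of pixel-level posterior draws. The sketch below shows one possible way to compute them, using simulated draws as a stand-in for model output. The definitions used here (share of draws above a threshold, and 95% interval width divided by the mean) and the 0.5 threshold are assumptions; 04_predict_rasters.r may define these quantities differently.

```r
set.seed(42)

# Stand-in posterior draws for 3 pixels x 500 draws; simulated, not model output.
# Shape parameters recycle down the rows, so each row has its own distribution.
draws <- matrix(
  rbeta(3 * 500, shape1 = c(2, 5, 8), shape2 = c(8, 5, 2)),
  nrow = 3
)

pixel_mean <- rowMeans(draws)

# Exceedance: share of draws above a threshold (0.5 chosen arbitrarily here).
exceedance <- rowMeans(draws > 0.5)

# Relative uncertainty: width of the 95% interval divided by the mean.
bounds <- apply(draws, 1, quantile, probs = c(0.025, 0.975))
relative_uncertainty <- (bounds[2, ] - bounds[1, ]) / pixel_mean

print(data.frame(pixel_mean, exceedance, relative_uncertainty))
```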

05_relative_gain.r

Calculate relative effect of increasing access vs increasing use.

Not used in a typical model run; for publication only.

view_changes.r

Compare versions of cube outputs to each other.

| name | type | description | location |
| --- | --- | --- | --- |
| new_dir | input | Location to which new results are saved. | ~/itn_cube/results/[NEW_LABEL]/ |
| old_dir | input | Location to which old results are saved. | ~/itn_cube/results/[OLD_LABEL]/ |
| func_dir | input | Location of repo. | map-itn-cube/itn_cube/ |
| out_path | input | Output directory. | new_dir/04_predictions/view_changes.pdf |

Data Versioning

Each of the following types of input data receives separate, dated labels:

Household survey data

This includes three components:

In addition, this analysis includes some older surveys (DHS and MICS 3 and 4) extracted by Bonnie Mappin that I could not find matches for in the current DHS framework.

This data is cleaned in the 01a_prep_hh_survey_data.r and 01b_prep_reportonly_survey_data.r scripts, and would benefit substantially from streamlining.

Net stock and flow data

Updated approximately annually when new data is received from WHO. Cleaned using 02_prep_delivery_dist_data.r (requires customization).

Covariates

Updated relatively rarely. Extracted using 000_make_dsub.r and 000_extract_covariates.r in the itn_cube folder.

Output versioning

Every time a stock and flow or ITN cube model is run, outputs are saved in a dated and uniquely labeled folder in the appropriate spot on the google cloud buckets. Older versions are kept so that they can serve as comparison points.
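
A dated, uniquely labeled output folder could be constructed along the lines of the sketch below. The helper `make_run_dir` is hypothetical and not part of this repo; the YYYYMMDD prefix follows the pattern visible in labels such as 20200418_BMGF_ITN_C1.00_R1.00_V2. Refusing to reuse an existing folder is one way to ensure older versions are never overwritten.

```r
# Hypothetical helper: build (and optionally create) a dated, labeled run folder.
make_run_dir <- function(root, label, date = Sys.Date(), create = FALSE) {
  run_dir <- file.path(root, paste(format(date, "%Y%m%d"), label, sep = "_"))
  if (create) {
    # Fail loudly instead of silently overwriting an earlier run.
    if (dir.exists(run_dir)) stop("Run directory already exists: ", run_dir)
    dir.create(run_dir, recursive = TRUE)
  }
  run_dir
}

run_dir <- make_run_dir("results", "test_run", date = as.Date("2020-04-18"))
print(run_dir)
```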