UT-Covid / episimlab

Framework for development of epidemiological models
https://ut-covid.github.io/episimlab/
BSD 3-Clause "New" or "Revised" License

Benchmarks for LCCF Proposal #26

Closed ethho closed 3 years ago

ethho commented 3 years ago

From 6/23/2021 Benchmark Work Plan:

POMP model scaling with time is most important:

  • run model for 1 month and 4 months with the standard parameters
  • fit to 1 and 4 months of data, then project for the standard duration of 3-4 weeks
  • ignore contact matrix size for now; will be important later on

ATX granular model scaling with mixing matrix size is most important:

  1. run model with small mixing matrix and large populations to reflect city-to-city contacts → this mixing matrix would need to be created; we can mock up data for it (doesn’t need to be empirical, just roughly realistic)
  2. run model with large mixing matrix and small populations to reflect within-city contact by zip code → this mixing matrix has been created from SafeGraph data
  3. run for 4 months (first wave dynamics)
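
The mocked-up city-to-city mixing matrix from item 1 could be generated along the following lines. This is only a sketch under stated assumptions, not code from episimlab: the function name `mock_mixing_matrix`, the `self_weight` parameter, and the population-proportional weighting with random noise are all hypothetical choices made for illustration.

```python
import numpy as np


def mock_mixing_matrix(populations, self_weight=0.9, seed=0):
    """Build a mock row-stochastic mixing matrix (hypothetical helper).

    Entry [i, j] is the share of node i's contacts that occur with node j.
    Each node keeps `self_weight` of its contacts internal; the remainder
    is spread over the other nodes in proportion to their population,
    perturbed by random noise so the matrix is not perfectly regular.
    """
    pop = np.asarray(populations, dtype=float)
    n = pop.size
    if n < 2:
        raise ValueError("need at least two nodes to mix between")
    rng = np.random.default_rng(seed)
    # Off-diagonal weights: destination population times a noise factor
    off = rng.uniform(0.5, 1.5, size=(n, n)) * pop[np.newaxis, :]
    np.fill_diagonal(off, 0.0)
    # Scale each row so the off-diagonal part sums to (1 - self_weight)
    off *= (1.0 - self_weight) / off.sum(axis=1, keepdims=True)
    return off + self_weight * np.eye(n)


# Example with three made-up city populations
matrix = mock_mixing_matrix([1_000_000, 500_000, 250_000])
```

Fixing the seed keeps the mock data reproducible across benchmark runs, so timing differences are not caused by different inputs.
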
ethho commented 3 years ago

As of 26433cd on branch 26-lccf:

Currently Loaded Modules:
  1) intel/19.1.1
  2) impi/19.0.9
  3) git/2.24.1
  4) autotools/1.2
  5) python3/3.7.0
  6) cmake/3.20.3
  7) pmix/3.1.4
  8) hwloc/1.11.12
  9) xalt/2.10.13
  10) TACC
  11) cuda/10.0 (g)
  12) nccl/2.4.7 (g)
  13) cudnn/7.6.2 (g)
  14) ooops/1.4
  15) tacc-singularity/3.7.2

Where: g: built for GPU

$ module load gsl/2.6
$ python3 -m pip install --user --upgrade pip
$ python3 -m pip install --user cython numpy==1.20.3 scipy xarray dask "dask[dataframe]" "dask[distributed]" matplotlib xarray-simlab bokeh
$ python3 setup.py build_ext --inplace
$ PYTHONPATH='.' python3 scripts/20210623_lccf.py

...

DEBUG:root:'intra_city' took 93.72 seconds


- User can set a limited number of parameters via the script CLI: `PYTHONPATH='.' python3 scripts/20210623_lccf.py --help`
- Dask profiles asynchronous processes and CPU/memory usage. It generates HTML reports using Bokeh and saves them in `logs`
- No [`tacc-stats`](https://github.com/TACC/tacc_stats) profiling yet
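
A timing line in the style of `DEBUG:root:'intra_city' took 93.72 seconds` can be emitted by a small context manager around the code being measured. The sketch below is an assumption about how such a line might be produced, not the script's actual implementation; the name `log_duration` is hypothetical.

```python
import logging
import time
from contextlib import contextmanager


@contextmanager
def log_duration(label, logger=None):
    """Log how long the wrapped block took, at DEBUG level (hypothetical helper)."""
    if logger is None:
        logger = logging.getLogger()  # root logger, shown as "root" in the output
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        # %r quotes the label, giving e.g. 'intra_city' took 93.72 seconds
        logger.debug("%r took %.2f seconds", label, elapsed)


if __name__ == "__main__":
    # basicConfig's default format is LEVEL:logger_name:message
    logging.basicConfig(level=logging.DEBUG)
    with log_duration("intra_city"):
        time.sleep(0.1)  # stand-in for the real workload
```

The `finally` clause means the duration is logged even when the wrapped block raises, which helps when a benchmark run fails partway through.
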
ethho commented 3 years ago

LCCF - Perform benchmarking runs

ethho commented 3 years ago

LCCF - Define benchmarking parameters

ethho commented 3 years ago

Added parameter n_cores_partition to script on branch 26-lccf. With the following data files:

$ md5 data/lccf/*.csv
MD5 (data/lccf/census_pop1_rows1.csv) = de88f392822b23ad60955d4acb15fbaf
MD5 (data/lccf/contacts_pop1_rows1.csv) = fb05dc85c3a27617a768cf0efbf176e7
MD5 (data/lccf/travel_pop1_rows1.csv) = 54ff96b54e9cab46f39ca28a9748fbb2
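
On a machine without the `md5` command, the same checksums can be checked from Python before launching a benchmark. This is a minimal sketch using only the standard library; the helper names `md5_hex` and `find_mismatches` are hypothetical, and the expected digests are the ones listed above.

```python
import hashlib
from pathlib import Path


def md5_hex(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(expected):
    """Return the paths that are missing or whose digest differs.

    `expected` maps a file path to its expected MD5 hex digest.
    """
    return [
        path
        for path, want in expected.items()
        if not Path(path).is_file() or md5_hex(path) != want
    ]


# Digests copied from the listing above
EXPECTED = {
    "data/lccf/census_pop1_rows1.csv": "de88f392822b23ad60955d4acb15fbaf",
    "data/lccf/contacts_pop1_rows1.csv": "fb05dc85c3a27617a768cf0efbf176e7",
    "data/lccf/travel_pop1_rows1.csv": "54ff96b54e9cab46f39ca28a9748fbb2",
}

if __name__ == "__main__":
    bad = find_mismatches(EXPECTED)
    print("all input files match" if not bad else f"mismatched or missing: {bad}")
```
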

Invoke the script, specifying file paths and 8 threads:

poetry run python scripts/20210623_lccf.py \
    --config-fp scripts/20210625_lccf.yaml \
    --travel-fp data/lccf/travel_pop1_rows1.csv \
    --contacts-fp data/lccf/contacts_pop1_rows1.csv \
    --census-counts-csv data/lccf/census_pop1_rows1.csv \
    --n-cores 8
ethho commented 3 years ago

As of ef36dfe, the --n-cores argument accepts a comma-separated list of integers. The script runs the given function once for each integer value in the list:

poetry install
poetry run python scripts/20210623_lccf.py \
    --config-fp scripts/20210625_lccf.yaml \
    --travel-fp data/lccf/travel_pop1_rows1.csv \
    --contacts-fp data/lccf/contacts_pop1_rows1.csv \
    --census-counts-csv data/lccf/census_pop1_rows1.csv \
    --n-cores 1,2,4,8
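
One way to accept a comma-separated `--n-cores` value and run a function once per entry is a custom `argparse` type. The following is a sketch of that pattern, not the code in `scripts/20210623_lccf.py`; the helpers `int_list`, `build_parser`, and `run_for_each` are hypothetical names.

```python
import argparse
import time


def int_list(text):
    """Parse '1,2,4,8' into [1, 2, 4, 8].

    argparse turns the ValueError raised by int() on a bad token
    into a normal usage error.
    """
    values = [int(tok) for tok in text.split(",") if tok.strip()]
    if not values or any(v < 1 for v in values):
        raise argparse.ArgumentTypeError(
            f"expected positive integers, got {text!r}"
        )
    return values


def build_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument("--n-cores", type=int_list, default=[1])
    return parser


def run_for_each(func, n_cores_values):
    """Call func(n_cores) once per value; return wall-clock seconds per value."""
    timings = {}
    for n_cores in n_cores_values:
        start = time.perf_counter()
        func(n_cores)
        timings[n_cores] = time.perf_counter() - start
    return timings


if __name__ == "__main__":
    args = build_parser().parse_args()
    # Placeholder workload; a real run would pass the model function here
    print(run_for_each(lambda n: time.sleep(0.01), args.n_cores))
```

Parsing the list inside the `type` callable keeps validation in one place, so a malformed value fails before any benchmark starts.
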
ethho commented 3 years ago

As of 4e2d8ee, there is an SBATCH script for running the model with 1, 2, 4, 8, 16, and 32 threads on a single-node Frontera job:

sbatch -A COVID19-Portal -p development -t 15 scripts/sbatch/20210624_lccf.sh
tail -f logs/esl_lccf.*
ethho commented 3 years ago

On commit 3b228be (branch 26-dask-debug), Dask profiler points to Dask operations in Partition2Contact. This allows us to profile only this task, and view the forked threads:

[Image: bokeh_plot(2)]

poetry install
poetry run python scripts/20210623_lccf.py \
    --config-fp scripts/20210625_lccf.yaml \
    --travel-fp data/lccf/travel_pop1_rows1.csv \
    --contacts-fp data/lccf/contacts_pop1_rows1.csv \
    --census-counts-csv data/lccf/census_pop1_rows1.csv \
    --n-cores 32
ethho commented 3 years ago

LCCF benchmarks submitted, closing for now.