UT-Covid / episimlab

Framework for development of epidemiological models
https://ut-covid.github.io/episimlab/
BSD 3-Clause "New" or "Revised" License

Fix #48 #49

Closed: ethho closed this 2 years ago

ethho commented 2 years ago

Fix #48

ethho commented 2 years ago

Updated Math

@kellypierce summary:

excellent. here's my revised math:

i = {local source vertices}
j = i
k = {all local and contextual destination vertices}

pr(travel source i -> destination k) = n_ik / n_i
pr(from source j and in destination k) = n_jk / n_k
pr(someone from i contacts someone from j in destination k) = pr(travel i -> k) * pr(j in k) = (n_ik / n_i) * (n_jk / n_k)

... with that definition, sources are either i or j and destination is always k
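
To make the revised math above concrete, here is a small worked example in plain Python. The counts are made up for illustration and are not taken from any episimlab data file. It also assumes that n_i is the total count leaving source i and n_k is the total count present in destination k; the summary does not spell out those definitions, so treat them as an interpretation.

```python
# Hypothetical toy counts n[(source, destination)]; not real episimlab data.
n = {
    ("A", "A"): 80, ("A", "B"): 20,
    ("B", "A"): 10, ("B", "B"): 90,
}
sources = sorted({s for s, _ in n})
destinations = sorted({d for _, d in n})

# Assumption: n_i is the row total (everyone from source i),
# and n_k is the column total (everyone found in destination k).
n_source = {i: sum(n[(i, k)] for k in destinations) for i in sources}
n_dest = {k: sum(n[(j, k)] for j in sources) for k in destinations}


def pr_contact(i, j, k):
    """pr(someone from i contacts someone from j in k) = (n_ik / n_i) * (n_jk / n_k)."""
    pr_travel = n[(i, k)] / n_source[i]   # pr(travel i -> k)
    pr_present = n[(j, k)] / n_dest[k]    # pr(from j and in k)
    return pr_travel * pr_present


# Someone from A meeting someone from B while both are in A.
print(pr_contact("A", "B", "A"))
```

With these counts, pr(travel A -> A) is 80/100 and pr(from B, in A) is 10/90, so the printed value is about 0.089. Under the stated assumption, summing pr_contact(i, j, k) over all j and k gives 1 for each fixed source i, which is a handy sanity check.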

Testing xarray Implementation

First pass method of travel partitioning using DataArrays instead of DataFrame/Dataset:

https://github.com/eho-tacc/episimlab/blob/b471a36be6eae28a228f04c9892b7412b85602b1/episimlab/partition/partition.py#L166-L195
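
For readers who do not want to follow the link, the following is a minimal sketch of how the same probabilities could be computed with xarray DataArrays. It is not the code at the linked commit. The dimension names `vertex_i`, `vertex_j`, and `vertex_k` and the toy counts are hypothetical, and n_i and n_k are again assumed to be marginal totals.

```python
import numpy as np
import xarray as xr

# Hypothetical counts n_ik: rows are source vertices, columns are destination vertices.
n_ik = xr.DataArray(
    np.array([[80.0, 20.0], [10.0, 90.0]]),
    dims=("vertex_i", "vertex_k"),
    coords={"vertex_i": ["A", "B"], "vertex_k": ["A", "B"]},
)

# Assumption: n_i is the total over destinations, n_k the total over sources.
pr_travel = n_ik / n_ik.sum("vertex_k")        # pr(travel i -> k), dims (i, k)

# Same counts with the source axis relabelled as j.
n_jk = n_ik.rename({"vertex_i": "vertex_j"})
pr_present = n_jk / n_jk.sum("vertex_j")       # pr(from j and in k), dims (j, k)

# xarray broadcasts by dimension name, so the product has dims i, j and k.
pr_contact = (pr_travel * pr_present).transpose("vertex_i", "vertex_j", "vertex_k")

print(pr_contact.sel(vertex_i="A", vertex_j="B", vertex_k="A").item())
```

Because the arrays are aligned by dimension name rather than by position, no explicit merge or join is needed, which is the main attraction over a DataFrame-based approach.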

Tested using travel10.csv in new pytest:

https://github.com/eho-tacc/episimlab/blob/b471a36be6eae28a228f04c9892b7412b85602b1/tests/test_partition.py#L153-L180

Remaining To-Do

ethho commented 2 years ago

Single test case of travel9.csv failing due to handling of destination_type

ethho commented 2 years ago

Correct handling of destination_type

ethho commented 2 years ago

Currently, the code still runs both the Dask/pandas partitioning routine:

https://github.com/eho-tacc/episimlab/blob/1e8b3faa6cd6a980a126ba999c3fc8a8d00c8cfe/episimlab/partition/partition.py#L122-L122

...as well as the new xarray routine:

https://github.com/eho-tacc/episimlab/blob/1e8b3faa6cd6a980a126ba999c3fc8a8d00c8cfe/episimlab/partition/partition.py#L124-L131

ethho commented 2 years ago

Three-process Partition pipeline does partitioning in xarray:

https://github.com/eho-tacc/episimlab/blob/fdffc7d88245630e2934ea40c3b3296f211f895d/episimlab/partition/part_xr.py#L1

As of fdffc7d, no destination_type column is read from the travel patterns CSV file:

https://github.com/eho-tacc/episimlab/blob/fdffc7d88245630e2934ea40c3b3296f211f895d/episimlab/partition/part_xr.py#L171-L182
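
One way to skip a column when loading a CSV is the `usecols` argument of `pandas.read_csv`. The sketch below illustrates the idea only; the linked code may do this differently. The column names `source` and `destination`, the `local`/`contextual` values, and the inline CSV text are all hypothetical. Only the column name `n` and the `destination_type` column are mentioned in this thread.

```python
import io

import pandas as pd

# Hypothetical travel patterns CSV; the real file layout may differ.
csv_text = """source,destination,destination_type,n
A,A,local,80
A,B,contextual,20
"""

# Read only the columns needed for partitioning; destination_type is never loaded.
wanted = ["source", "destination", "n"]
df = pd.read_csv(io.StringIO(csv_text), usecols=wanted)

print(df.columns.tolist())
```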

ethho commented 2 years ago

Used the following script to generate a dummy tests/data/travel_pat0.csv:

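import numpy as np
import pandas as pd

# Unseeded generator: every run produces different dummy values
rng = np.random.default_rng()
df = pd.read_csv('./data/20200311_travel.csv')
# Scale each count by a uniform random factor in [0, 2)
rand_n = df['n'].to_numpy() * rng.random(len(df)) * 2
# Shuffle in place so the dummy counts no longer line up with their original rows
rng.shuffle(rand_n)
df['n'] = rand_n
df.to_csv('./tests/data/travel_pat0.csv', index=False)
# Preview the first rows of the generated data
print(df.head())
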
ethho commented 2 years ago

Overdue for cleanup. Editing file by file: