UT-Covid / episimlab

Framework for development of epidemiological models
https://ut-covid.github.io/episimlab/
BSD 3-Clause "New" or "Revised" License

Partition ingests age_group coordinates #22

ethho closed this issue 3 years ago

ethho commented 3 years ago

The Partition process ingests coordinates for age_group: https://github.com/eho-tacc/episimlab/blob/0fb7dc5c2ab0592a69f4d3c2b83455e97234f784/episimlab/partition/partition.py#L69

...and uses it once when iterating in Partition.probabilistic_partition: https://github.com/eho-tacc/episimlab/blob/0fb7dc5c2ab0592a69f4d3c2b83455e97234f784/episimlab/partition/partition.py#L171

Objective

Justification

In our preliminary models that use the Partition process, it is necessary to use contact_xr to define the coordinate set of the simulation, not the other way around.

ethho commented 3 years ago

@kellypierce What is the best way to generate age_group from travel_df and contacts_df? Would self.travel_df.age.unique() work, or should we join on age groups in contacts_df as well?

ethho commented 3 years ago

To get unique age groups in the contact_df:

```python
import pandas as pd
pd.unique(self.baseline_contact_df[['age1', 'age2']].values.ravel('K'))
```

To get the union of age groups in both the travel and contacts DataFrames:

```python
ag_from_contacts = set(self.baseline_contact_df[['age1', 'age2']].values.ravel('K'))
ag_from_travel = set(self.travel_df.age.unique())
ag_from_contacts.union(ag_from_travel)
# {'old', 'young'}
```
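The union snippet can be wrapped in a small standalone helper. This is a minimal sketch, not episimlab's code: the function name `age_group_coords` and the toy data are hypothetical, and it assumes `travel_df` has an `age` column and the contacts frame has `age1` and `age2` columns, as in the snippets in this thread. Because a Python set has no defined order, the sketch sorts the result so the `age_group` coordinate comes out in the same order on every run; that sorting is a design choice, not something the thread specifies.

```python
from typing import List

import pandas as pd


def age_group_coords(travel_df: pd.DataFrame, contacts_df: pd.DataFrame) -> List[str]:
    """Return the sorted union of age groups found in either DataFrame.

    Hypothetical helper: assumes travel_df has an ``age`` column and
    contacts_df has ``age1`` and ``age2`` columns.
    """
    # Flatten both contact columns into one 1-D array; element order does not
    # matter here because only the distinct values are kept.
    from_contacts = set(contacts_df[["age1", "age2"]].to_numpy().ravel())
    from_travel = set(travel_df["age"].unique())
    # Sets are unordered, so sort to get a stable coordinate order.
    return sorted(from_contacts | from_travel)


# Hypothetical toy inputs: "middle" appears only in the contacts data.
travel_df = pd.DataFrame({"age": ["young", "old", "young"]})
contacts_df = pd.DataFrame({
    "age1": ["young", "old", "middle"],
    "age2": ["old", "old", "young"],
})

print(age_group_coords(travel_df, contacts_df))  # ['middle', 'old', 'young']
```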
kellypierce commented 3 years ago

> @kellypierce What is the best way to generate age_group from travel_df and contacts_df? Would self.travel_df.age.unique() work, or should we join on age groups in contacts_df as well?

The partitioned version of travel_df is joined to contacts_df here: https://github.com/eho-tacc/episimlab/blob/0fb7dc5c2ab0592a69f4d3c2b83455e97234f784/episimlab/partition/partition.py#L232

Currently that's an outer join, so if there are age groups in contacts_df that aren't in travel_df, they will be carried over (with zeros for contacts). Because of that, age_group should be the set of unique ages from travel_df.age, contact_df.age1, and contact_df.age2. (Alternatively, we could implement a different join, but for now I like the idea of preserving all the age groups... could be a reasonable catch if you see missing data in the model output for an age group you thought you'd specified.)
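To illustrate the outer-join behavior described here, the sketch below joins two toy frames on a single hypothetical `age` column. The real join in partition.py uses its own columns, so treat the column names and values as assumptions. With `how="outer"`, rows for an age group that exists on only one side survive the join, with NaN in the other side's columns, which can then be filled with zeros. `indicator=True` adds a `_merge` column (`"both"`, `"left_only"`, or `"right_only"`), which is one possible way to surface the "missing data for an age group you thought you'd specified" case.

```python
import pandas as pd

# Hypothetical toy frames; "middle" exists only in contacts_df.
travel_df = pd.DataFrame({"age": ["young", "old"], "n": [10.0, 5.0]})
contacts_df = pd.DataFrame(
    {"age": ["young", "old", "middle"], "contact_rate": [3.0, 1.0, 2.0]}
)

# how="outer" keeps keys from both sides; indicator=True records each row's origin.
merged = travel_df.merge(contacts_df, on="age", how="outer", indicator=True)

# Age groups present only in contacts_df get NaN in travel columns; fill with 0.
merged["n"] = merged["n"].fillna(0.0)

# List age groups that travel_df did not provide, e.g. to warn the user.
missing_from_travel = merged.loc[merged["_merge"] == "right_only", "age"].tolist()
print(missing_from_travel)  # ['middle']
```

An inner join would silently drop `middle` instead, which is the trade-off weighed in the comment above.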

ethho commented 3 years ago

> could be a reasonable catch if you see missing data in the model output for an age group you thought you'd specified)

That's what I was thinking. Implemented in #23 and merged to main.