alan-turing-institute / vivarium_population_spenser

Vivarium microsimulation tools used to model population evolution with the SPENSER project
GNU General Public License v3.0

Issue with the OD matrices (CSV files) #49

Closed kasra-hosseini closed 4 years ago

kasra-hosseini commented 4 years ago

It seems that M_25to34_OD_matrix_EW.csv is effectively empty: its entries sum to 1. To reproduce:

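import numpy as np
import pandas as pd

od_rd = pd.read_csv("./M_25to34_OD_matrix_EW.csv")
# Skip the first column and sum the remaining entries as integers.
# (np.int was removed in NumPy 1.24; the builtin int behaves the same here.)
print(f"Sum over all elements: {od_rd[od_rd.columns[1:]].to_numpy().astype(int).sum()}")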

OUTPUT:

Sum over all elements: 1

For comparison, the other matrices give much larger totals, for example:

In [4]: import numpy as np
   ...: import pandas as pd
   ...: od_rd = pd.read_csv("./M_35to49_OD_matrix_EW.csv")
   ...: print(f"Sum over all elements: {od_rd[od_rd.columns[1:]].to_numpy().astype(int).sum()}")
Sum over all elements: 500928

ld-archer commented 4 years ago

The problem is that the output for the male 25to34 constraints is a table of probabilities rather than re-weighted counts, so the entries sum to 1 and are replaced by a single '1' in the integerisation step. This happens because the algorithm calculates a difference between the in-migration and out-migration totals used as constraints; that mismatch should have been handled when preparing the constraints, before running the IPF. I'll investigate what happened and rerun IPF once I've solved the issue.
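
To illustrate the kind of pre-check mentioned above, here is a minimal sketch of a two-dimensional IPF that refuses to run when the origin and destination totals disagree. This is not the project's IPF code: the helper name `ipf_2d`, the seed matrix, and the margins are hypothetical and only meant to show the idea.

```python
import numpy as np


def ipf_2d(seed, row_totals, col_totals, n_iter=100, tol=1e-9):
    """Hypothetical helper: scale `seed` until its margins match the totals."""
    seed = np.asarray(seed, dtype=float)
    row_totals = np.asarray(row_totals, dtype=float)
    col_totals = np.asarray(col_totals, dtype=float)

    # Both margins must describe the same number of movers,
    # otherwise no table can satisfy the two constraints at once.
    if not np.isclose(row_totals.sum(), col_totals.sum()):
        raise ValueError(
            f"Margins disagree: rows sum to {row_totals.sum()}, "
            f"columns sum to {col_totals.sum()}"
        )

    fitted = seed.copy()
    for _ in range(n_iter):
        # Scale each row to its target total (rows that sum to zero stay zero).
        row_sums = fitted.sum(axis=1, keepdims=True)
        fitted *= np.divide(row_totals[:, None], row_sums,
                            out=np.zeros_like(row_sums), where=row_sums > 0)
        # Scale each column to its target total.
        col_sums = fitted.sum(axis=0, keepdims=True)
        fitted *= np.divide(col_totals[None, :], col_sums,
                            out=np.zeros_like(col_sums), where=col_sums > 0)
        # Columns match after the step above; stop once rows match too.
        if np.allclose(fitted.sum(axis=1), row_totals, atol=tol):
            break
    return fitted


# Toy example: 100 movers in total on both margins.
fitted = ipf_2d(np.ones((2, 2)), row_totals=[30, 70], col_totals=[40, 60])
print(fitted.sum())  # total count of movers, not 1
```

A similar sanity check on the output (the grand total should be close to the constraint total rather than 1) would flag a table of probabilities before it reaches the integerisation step.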

kasra-hosseini commented 4 years ago

@crangelsmith @ld-archer I updated notebook006 to use the new OD matrices: https://github.com/alan-turing-institute/vivarium_public_health_spenser/blob/feature/48-OD-matrices-new/notebooks/pipeline_006_internal_outmigration.ipynb

(@crangelsmith path_to_OD_matrices is now read using builder.data, see this)

However, in some cases, the row sum (i.e., the sum over all possible destinations) is zero. This causes problems when we normalize the int_migration_matrix, since the normalization divides by the row sum. As a workaround, I add 1e-10 to every element of the matrix. See: https://github.com/alan-turing-institute/vivarium_public_health_spenser/blob/feature/48-OD-matrices-new/src/vivarium_public_health/population/internal_migration.py#L166

Does this make sense? Is it better to add this value only if the row sum is zero?
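
For reference, the second option (adding the value only to rows whose sum is zero) could look roughly like the sketch below. It is not the code in internal_migration.py: the helper name `normalize_rows` and the toy matrix are hypothetical.

```python
import numpy as np


def normalize_rows(matrix, eps=1e-10):
    """Hypothetical helper: row-normalise, adding `eps` only to all-zero rows."""
    probs = np.asarray(matrix, dtype=float).copy()
    zero_rows = probs.sum(axis=1) == 0
    # Only rows with no recorded moves get eps; after normalisation
    # they become uniform over all destinations.
    probs[zero_rows] += eps
    return probs / probs.sum(axis=1, keepdims=True)


# Toy 3x3 OD matrix whose middle origin has no recorded moves.
counts = np.array([[0, 3, 1],
                   [0, 0, 0],
                   [2, 0, 2]])
probs = normalize_rows(counts)
print(probs)
```

With this variant, rows that already have counts are normalised exactly and their zero entries stay zero, whereas adding 1e-10 everywhere gives every destination a tiny non-zero probability. In both cases an all-zero row ends up uniform over destinations, so whether that is the desired behaviour for such origins is a separate modelling choice.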