ejwillemse / sim-demog-NUK-open

Adaption of the sim-demog model to Singapore

Long model execution time #5

Open ejwillemse opened 5 years ago

ejwillemse commented 5 years ago

The model iterates through each individual and then uses probabilities to determine which events, if any, will happen to that individual. In Python, this per-individual loop is quite inefficient. For populations of more than 100,000 individuals, the model becomes slow.

A better approach is to use vectorised pandas DataFrame or NumPy operations, which process the whole population at once, to speed up the process. Currently the logic is:

import pandas as pd
import numpy as np

max_age = 100
n_pop = 100
# Lookup table: probability of dying at each age (here simply age / max_age)
death_prop_table = pd.DataFrame(data={'age': range(max_age), 'death_prop': np.arange(max_age) / max_age})

# Population with uniformly random ages in [0, max_age)
pop = pd.DataFrame(data={'age': np.random.randint(low=0, high=max_age, size=n_pop)})

die_index = []
for i in range(n_pop):
    # Look up this individual's death probability by age
    ind_age = pop.loc[i, 'age']
    p_die = death_prop_table.loc[death_prop_table.age == ind_age, 'death_prop'].iloc[0]
    # Draw one scalar random number per individual
    if p_die >= np.random.uniform(low=0, high=1):
        die_index.append(i)
print(len(die_index))

A more efficient, vectorised version is:

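import pandas as pd
import numpy as np

max_age = 100
n_pop = 100
# Lookup table: probability of dying at each age (here simply age / max_age)
death_prop_table = pd.DataFrame(data={'age': range(max_age), 'death_prop': np.arange(max_age) / max_age})

# Population with uniformly random ages in [0, max_age)
pop = pd.DataFrame(data={'age': np.random.randint(low=0, high=max_age, size=n_pop)})

# Attach each individual's death probability in one join; a left join keeps
# every individual, so the result stays aligned with r_death below
pop_death_prop = pop.merge(death_prop_table, how='left', on='age')
# One random draw per individual, compared against all probabilities at once
r_death = np.random.uniform(low=0, high=1, size=n_pop)
pop_dead = pop_death_prop.loc[pop_death_prop.death_prop >= r_death]
print(len(pop_dead))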
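
Since NumPy was mentioned as an alternative, here is a minimal NumPy-only sketch of the same idea. It assumes ages are integers in [0, max_age), so the death probabilities can be stored in a plain array indexed by age and no merge is needed. The function name `sample_deaths` and the use of `np.random.default_rng` are illustrative choices, not part of the existing model. The comparison is written as `r < p_die`, which matches `p_die >= r` above except in the boundary case where the two are exactly equal.

```python
import numpy as np


def sample_deaths(ages, death_prop, rng):
    """Return a boolean mask marking which individuals die.

    ages       : integer array of ages, each in [0, len(death_prop))
    death_prop : array where death_prop[a] is the death probability at age a
    rng        : a numpy.random.Generator
    """
    # Fancy indexing looks up every individual's probability in one step
    p_die = death_prop[ages]
    # One uniform draw in [0, 1) per individual
    r = rng.random(ages.shape[0])
    return r < p_die


max_age = 100
n_pop = 100_000
rng = np.random.default_rng(seed=0)

# Same toy probabilities as above: age / max_age
death_prop = np.arange(max_age) / max_age
ages = rng.integers(low=0, high=max_age, size=n_pop)

dead = sample_deaths(ages, death_prop, rng)
print(int(dead.sum()))
```

The mask can be used directly to filter a population array or DataFrame, for example `pop[dead]` for the individuals who died, provided the rows are in the same order as `ages`.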