BDI-pathogens / OpenABM-Covid19

OpenABM-Covid19: an agent-based model for modelling the spread of SARS-CoV-2 (coronavirus) and control interventions for the Covid-19 epidemic
GNU General Public License v3.0

Add function for returning individuals in a dataframe via Python #190

Status: Closed (closed by p-robot 3 years ago)

p-robot commented 3 years ago

Add a function get_alive() to the Python interface, which returns a pandas DataFrame of all individuals who are alive.
This is one approach to solving #189.
A test is added that compares the returned dataframe with the individual file.
A shortArray type is added to the SWIG file so that vaccine_status can be accessed.

p-robot commented 3 years ago

I'll add a more thorough test of this (assert across multiple time points), such as the following (passes locally for me):

    def test_get_alive( self, test_params ):
        """
        Test that a dataframe of alive individuals is concordant with the individual/trans files
        """

        n_total = test_params["n_total"]
        end_time = test_params["end_time"]

        params = utils.get_params_swig()
        for param, value in test_params.items():
            params.set_param( param, value )
        model  = utils.get_model_swig( params )

        df_alive_list = list()

        # Simulate for long enough for there to be some COVID-19 related mortality
        for time in range(end_time):
            # Every 5 time steps, save the dataframe of alive individuals
            if time % 5 == 0:
                df_alive_list.append(model.get_alive())

            model.one_time_step()

        # Write and read individual and transmission files
        model.write_individual_file()
        model.write_transmissions()

        # Read both files and attach each individual's transmission record (if any)
        df_trans = pd.read_csv(constant.TEST_TRANSMISSION_FILE)
        df_indiv = pd.read_csv(constant.TEST_INDIVIDUAL_FILE)
        df_indiv = pd.merge(df_indiv, df_trans,
            left_on = "ID", right_on = "ID_recipient", how = "left")

        # Individuals with no recorded death get the sentinel value -1
        df_indiv.time_death = df_indiv.time_death.fillna(-1)
        df_alive_indiv = df_indiv.loc[
            df_indiv.time_death == -1, 
            ["ID", "current_status", "age_group", "occupation_network", "house_no",
                "infection_count", "vaccine_status"]
        ]
        # Check some individuals have died
        np.testing.assert_equal(model.one_time_step_results()["n_death"] > 0, True)

        cols2compare = ["ID", "age_group", "occupation_network", "house_no"]

        # Every 5 time steps, check that the individuals alive are consistent across
        # both approaches
        for time in range(end_time):
            if time % 5 == 0:
                # Find alive individuals using the get_alive() method, convert to np array
                df_alive_t = df_alive_list[time//5][cols2compare]
                array_alive = df_alive_t.to_numpy()

                # Find alive individuals using the individual+transmission file, convert to np array
                df_alive_indiv = df_indiv.loc[
                    (df_indiv.time_death > time) | (df_indiv.time_death == -1), cols2compare]
                array_alive_indiv = df_alive_indiv.to_numpy()

                np.testing.assert_array_equal(array_alive, array_alive_indiv)

Happy to adjust this PR, or I can make a new PR for this minor change afterwards.
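
The core check in the test above, working out who is alive at a given time step from a `time_death` column, can be sketched on toy data. This is only an illustration, not the project's code: the data frame contents and the helper name `alive_at` are made up for the example, and only the column names `ID`, `age_group` and `time_death` and the -1 sentinel follow the test.

```python
import numpy as np
import pandas as pd


def alive_at(df_indiv, time):
    """Return the rows of individuals still alive at `time` (hypothetical helper)."""
    never_died = df_indiv["time_death"] == -1   # sentinel: no death recorded
    dies_later = df_indiv["time_death"] > time  # death happens after `time`
    return df_indiv.loc[never_died | dies_later]


# Toy individual table; NaN means no death was recorded for that individual
df_indiv = pd.DataFrame({
    "ID": [0, 1, 2, 3],
    "age_group": [2, 5, 8, 8],
    "time_death": [np.nan, 12.0, np.nan, 30.0],
})
df_indiv["time_death"] = df_indiv["time_death"].fillna(-1)

alive_ids_t10 = alive_at(df_indiv, 10)["ID"].tolist()
alive_ids_t20 = alive_at(df_indiv, 20)["ID"].tolist()
alive_ids_t30 = alive_at(df_indiv, 30)["ID"].tolist()
```

With the strict `>` comparison, an individual whose `time_death` equals the queried time step is treated as already dead at that step, which matches the condition used in the test.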

roberthinch commented 3 years ago

Looks good, but should we add it to the R interface as well, now that it is ready to go?

roberthinch commented 3 years ago

Should it be just get_alive, or should it be get_individuals and return everyone, including the dead? The number of dead people is small, so the difference in memory between the two is small, and there are situations where seeing everyone would be useful. Filtering out the dead is one line of pandas.
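
The one-line filter mentioned above might look like the following sketch. The toy data frame stands in for the output of `get_individuals()`. The `current_status` column name comes from the test in this thread, but the numeric code used for death (`DEATH_STATUS = 9`) is an assumption for illustration and should be checked against the model's own status enumeration.

```python
import pandas as pd

# Assumed status code for dead individuals (hypothetical; check the model's enum)
DEATH_STATUS = 9

# Toy stand-in for the dataframe of all individuals, dead or alive
df_all = pd.DataFrame({
    "ID": [0, 1, 2, 3],
    "current_status": [0, 9, 8, 4],
    "house_no": [0, 0, 1, 1],
})

# The one-line filter: keep everyone whose status is not the death code
df_alive = df_all[df_all["current_status"] != DEATH_STATUS]
```

Returning everyone and filtering on the caller's side keeps a single accessor in the interface while still giving the alive-only view when needed.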

p-robot commented 3 years ago

I've updated the function to be get_individuals instead. The test outlined above is included as it checks multiple time steps.