Revenue-Academy / OG-IND

Overlapping Generations Model for India
https://Revenue-Academy.github.io/OG-IND

Add informal sector, update earnings ability matrix #17

Closed jdebacker closed 1 year ago

jdebacker commented 1 year ago

This PR updates the OG-IND calibration in two ways.

  1. It approximates the income profiles of Indian workers by rescaling the profiles estimated for US workers in OG-USA to match the Gini coefficient for the distribution of income in India (which is less unequal than that in the USA). The results for India (with 7 ability types): [figure: India_ability_profiles]

     And the analogous figure for the US: [figure: USA_ability_profiles]

  2. The default calibration is modified to include 2 sectors, formal and informal, with the informal sector representing approximately 53% of GDP, as suggested by the citation in Issue #15.
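
A minimal sketch of the rescaling idea in item 1, not the code in this PR. It assumes a single cross-section of abilities rather than full lifetime profiles, and it assumes a power transform `v**a` as the compression rule. The ability values, population weights, and target Gini are placeholders chosen for illustration only.

```python
def gini(values, weights):
    """Weighted Gini coefficient: mean absolute difference / (2 * mean)."""
    total = sum(weights)
    mean = sum(v * w for v, w in zip(values, weights)) / total
    pairs = list(zip(values, weights))
    mad = sum(wi * wj * abs(vi - vj) for vi, wi in pairs for vj, wj in pairs)
    return mad / (total**2 * 2 * mean)


def rescale_to_gini(values, weights, target, iters=100):
    """Compress positive values with v**a, choosing a in [0, 1] by bisection
    so that the weighted Gini of the result matches `target`."""
    if not 0 <= target <= gini(values, weights):
        raise ValueError("target must lie between 0 and the original Gini")
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if gini([v**mid for v in values], weights) < target:
            lo = mid  # still too equal: compress less
        else:
            hi = mid  # too unequal: compress more
    a = (lo + hi) / 2
    scaled = [v**a for v in values]
    # Renormalize to a weighted mean of 1 (the Gini is scale invariant).
    mean = sum(v * w for v, w in zip(scaled, weights)) / sum(weights)
    return [v / mean for v in scaled]


# Placeholder inputs (assumptions, not calibrated values).
us_abil = [0.3, 0.6, 0.9, 1.3, 1.9, 3.2, 12.0]  # 7 ability types
lambdas = [0.25, 0.25, 0.2, 0.1, 0.1, 0.09, 0.01]  # population shares
target_gini = 0.35  # hypothetical target

ind_abil = rescale_to_gini(us_abil, lambdas, target_gini)
print(round(gini(us_abil, lambdas), 4), round(gini(ind_abil, lambdas), 4))
```

The power transform keeps the ordering of ability types and lowers the Gini as the exponent falls toward zero, which is why a simple bisection on the exponent is enough here.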

jdebacker commented 1 year ago

@rickecon I added some unit tests to this repo, including tests of demographics.py. Four of the demographics tests are failing and I don't understand why. The failures show a bad index, but the demographics.py file runs on its own, so I'm not sure why this is happening in the unit tests. Any ideas?

Here's the traceback:

============================= test session starts ==============================
platform darwin -- Python 3.9.15, pytest-7.2.1, pluggy-1.0.0
rootdir: /Users/jason.debacker/repos/OG-IND, configfile: pytest.ini
plugins: xdist-3.1.0, anyio-3.6.2, pep8-1.0.6
collected 9 items

ogind/tests/test_demographics.py ....FFFF.                               [100%]

=================================== FAILURES ===================================
________________________________ test_get_fert _________________________________

    def test_get_fert():
        """
        Test of function to get fertility rates from data
        """
        S = 100
>       fert_rates = demographics.get_fert(S, 0, 100, graph=False)

ogind/tests/test_demographics.py:80:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

totpers = 100, start_year = 0, end_year = 100, graph = False

    def get_fert(totpers, start_year=2021, end_year=None, graph=False):
        """
        This function generates a vector of fertility rates by model period
        age that corresponds to the fertility rate data by age in years.

        Args:
            totpers (int): total number of agent life periods (E+S), >= 3
            start_year (int): first year data to download
            end_year (int or None): end year data to download
            graph (bool): =True if want graphical output

        Returns:
            fert_rates (Numpy array): fertility rates for each model period
                of life

        """
        if totpers > 100:
            err_msg = "ERROR get_fert(): totpers must be <= 100."
            raise ValueError(err_msg)

        # Get UN fertility rates for South Africa for ages 15-49
        ages_15_49 = np.arange(15, 50)
        fert_rates_15_49 = (
            get_un_fert_data(
                start_year=start_year, end_year=end_year, download=False
            )["fert_rate"]
            .to_numpy()
            .flatten()
        )

        # Extrapolate fertility rates for ages 1-14 and 50-100 using exponential
        # function
        ages_1_14 = np.arange(1, 15)
>       slope_15 = (fert_rates_15_49[1] - fert_rates_15_49[0]) / (
            ages_15_49[1] - ages_15_49[0]
        )
E       IndexError: index 1 is out of bounds for axis 0 with size 0

ogind/demographics.py:468: IndexError
________________________________ test_get_mort _________________________________

    def test_get_mort():
        """
        Test of function to get mortality rates from data
        """
        S = 100
>       mort_rates, infmort_rate = demographics.get_mort(S, 0, 100, graph=False)

ogind/tests/test_demographics.py:90:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

totpers = 100, start_year = 0, end_year = 100, graph = False

    def get_mort(totpers, start_year=2021, end_year=None, graph=False):
        """
        This function generates a vector of mortality rates by model period
        age. Source: UN Population Data portal.

        Args:
            totpers (int): total number of agent life periods (E+S), >= 3
            start_year (int): first year data to download
            end_year (int or None): end year data to download
            graph (bool): =True if want graphical output

        Returns:
            mort_rates (Numpy array) mortality rates that correspond to each
                period of life
            infmort_rate (scalar): infant mortality rate

        """
        if totpers > 100:
            err_msg = "ERROR get_mort(): totpers must be <= 100."
            raise ValueError(err_msg)

        # Get UN infant mortality and mortality rate data by age
        un_infmort_rate_df, mort_rates_df = get_un_mort_data(
            start_year=start_year, end_year=end_year, download=False
        )
>       un_infmort_rate = un_infmort_rate_df["infmort_rate"][
            un_infmort_rate_df["sex_num"] == 3
        ].to_numpy()[0]
E       IndexError: index 0 is out of bounds for axis 0 with size 0

ogind/demographics.py:570: IndexError
______________________________ test_get_mort_lt1 _______________________________

    def test_get_mort_lt1():
        """
        Test that mortality rates don't exceed 1
        """
        S = 100
>       mort_rates, infmort_rate = demographics.get_mort(S, 0, 100, graph=False)

ogind/tests/test_demographics.py:99:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

totpers = 100, start_year = 0, end_year = 100, graph = False

    def get_mort(totpers, start_year=2021, end_year=None, graph=False):
        """
        This function generates a vector of mortality rates by model period
        age. Source: UN Population Data portal.

        Args:
            totpers (int): total number of agent life periods (E+S), >= 3
            start_year (int): first year data to download
            end_year (int or None): end year data to download
            graph (bool): =True if want graphical output

        Returns:
            mort_rates (Numpy array) mortality rates that correspond to each
                period of life
            infmort_rate (scalar): infant mortality rate

        """
        if totpers > 100:
            err_msg = "ERROR get_mort(): totpers must be <= 100."
            raise ValueError(err_msg)

        # Get UN infant mortality and mortality rate data by age
        un_infmort_rate_df, mort_rates_df = get_un_mort_data(
            start_year=start_year, end_year=end_year, download=False
        )
>       un_infmort_rate = un_infmort_rate_df["infmort_rate"][
            un_infmort_rate_df["sex_num"] == 3
        ].to_numpy()[0]
E       IndexError: index 0 is out of bounds for axis 0 with size 0

ogind/demographics.py:570: IndexError
_______________________________ test_infant_mort _______________________________

    def test_infant_mort():
        """
        Test of function to get mortality rates from data
        """
>       mort_rates, infmort_rate = demographics.get_mort(100, 0, 100, graph=False)

ogind/tests/test_demographics.py:107:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

totpers = 100, start_year = 0, end_year = 100, graph = False

    def get_mort(totpers, start_year=2021, end_year=None, graph=False):
        """
        This function generates a vector of mortality rates by model period
        age. Source: UN Population Data portal.

        Args:
            totpers (int): total number of agent life periods (E+S), >= 3
            start_year (int): first year data to download
            end_year (int or None): end year data to download
            graph (bool): =True if want graphical output

        Returns:
            mort_rates (Numpy array) mortality rates that correspond to each
                period of life
            infmort_rate (scalar): infant mortality rate

        """
        if totpers > 100:
            err_msg = "ERROR get_mort(): totpers must be <= 100."
            raise ValueError(err_msg)

        # Get UN infant mortality and mortality rate data by age
        un_infmort_rate_df, mort_rates_df = get_un_mort_data(
            start_year=start_year, end_year=end_year, download=False
        )
>       un_infmort_rate = un_infmort_rate_df["infmort_rate"][
            un_infmort_rate_df["sex_num"] == 3
        ].to_numpy()[0]
E       IndexError: index 0 is out of bounds for axis 0 with size 0

ogind/demographics.py:570: IndexError

=========================== short test summary info ============================
FAILED ogind/tests/test_demographics.py::test_get_fert - IndexError: index 1 is out of bounds for axis 0 with size 0
FAILED ogind/tests/test_demographics.py::test_get_mort - IndexError: index 0 is out of bounds for axis 0 with size 0
FAILED ogind/tests/test_demographics.py::test_get_mort_lt1 - IndexError: index 0 is out of bounds for axis 0 with size 0
FAILED ogind/tests/test_demographics.py::test_infant_mort - IndexError: index 0 is out of bounds for axis 0 with size 0
================== 4 failed, 5 passed, 11 warnings in 21.96s ===================

rickecon commented 1 year ago

@jdebacker. I recommend that we put the tests folder outside of the ogind folder in OG-IND/tests/.

jdebacker commented 1 year ago

@rickecon I've got the demographics tests figured out - didn't notice your changes in args to the get_fert and get_mort functions.
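
The traceback above is consistent with this: the tests passed `0` and `100` in the positions now read as `start_year` and `end_year`, so the selected data had size 0 and the first index raised. A minimal sketch of that failure mode, assuming a year filter over hypothetical records (the real data functions may select rows differently), plus a guard that fails with a clearer message:

```python
# Hypothetical stand-in for the UN data: one record per age for year 2021.
records = [
    {"year": 2021, "age": age, "fert_rate": 0.05} for age in range(15, 50)
]


def select_unchecked(records, start_year, end_year):
    """Return fertility rates for years in [start_year, end_year]."""
    return [
        r["fert_rate"] for r in records if start_year <= r["year"] <= end_year
    ]


def select_checked(records, start_year, end_year):
    """Same selection, but raise a descriptive error on an empty result."""
    rates = select_unchecked(records, start_year, end_year)
    if not rates:
        raise ValueError(
            f"no data for years {start_year}-{end_year}; "
            "check that the arguments are years, not ages"
        )
    return rates


def first_slope(rates):
    """Slope between the first two entries, as in the failing line."""
    return rates[1] - rates[0]


try:
    first_slope(select_unchecked(records, 0, 100))  # ages passed as years
except IndexError as err:
    print("unchecked:", err)

try:
    select_checked(records, 0, 100)
except ValueError as err:
    print("checked:", err)
```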

> I recommend that we put the tests folder outside of the ogind folder in OG-IND/tests/.

We did that in OG-Core because the files in OG-Core/tests/test_io_data are large, which would make a package on PyPI or Conda very large and thus a burden to download. We've put the test files in ogXXX/tests in all the country repos (USA, UK, MYS) thus far because (1) the tests/test_io_data directories in these repos do not have a large footprint and (2) we haven't been distributing packages on Conda or PyPI. My proposal would be to keep this repo's structure consistent with the others for now; we can change them all over at some point in the future if needed.

codecov-commenter commented 1 year ago

Codecov Report

No coverage uploaded for pull request base (main@8ada864). Patch has no changes to coverable lines.

Additional details and impacted files

```diff
@@           Coverage Diff           @@
##             main      #17   +/-   ##
=======================================
  Coverage        ?   66.21%
=======================================
  Files           ?        9
  Lines           ?      879
  Branches        ?        0
=======================================
  Hits            ?      582
  Misses          ?      297
  Partials        ?        0
```

| Flag | Coverage Δ | |
|---|---|---|
| unittests | `66.21% <0.00%> (?)` | |

Flags with carried forward coverage won't be shown.


jdebacker commented 1 year ago

@rickecon This PR is ready for review.

rickecon commented 1 year ago

@jdebacker. I pulled this branch and ran the run_og_ind.py script on my machine, and the transition path got to the end and didn't solve due to RuntimeError: Transition path equlibrium not found (RC_error). When I look at the resource constraint errors, the offending entries are in T=125, 126 and 127 with respective RC errors of the following size:

index |    error
------+---------------
125   | 1.10984501e-04
126   | 1.10329424e-04
127   | 1.03981945e-04

I don't have a good idea what this could be coming from. These are right in the middle of the 320-period transition path.
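
A minimal sketch of how the offending periods can be picked out of a path of resource constraint errors. The tolerance of `1e-4` and the background error of `1e-8` are assumptions for illustration (the model's own tolerance parameter should be used in practice); only the three reported values come from the run above.

```python
def rc_violations(rc_errors, tol):
    """Return (period, error) pairs where |error| exceeds the tolerance."""
    return [(t, err) for t, err in enumerate(rc_errors) if abs(err) > tol]


T = 320  # length of the transition path reported above
rc_errors = [1e-8] * T  # placeholder background error (assumption)
rc_errors[125] = 1.10984501e-04  # reported values
rc_errors[126] = 1.10329424e-04
rc_errors[127] = 1.03981945e-04

tol = 1e-4  # hypothetical tolerance
bad = rc_violations(rc_errors, tol)
for t, err in bad:
    print(f"T={t}: RC error {err:.8e} exceeds {tol:.0e}")
```

All three reported errors sit only slightly above a tolerance of this size, which fits the idea of a small numerical perturbation rather than a gross accounting error.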

rickecon commented 1 year ago

@jdebacker. By the way, all the tests pass locally on my machine.

============================= test session starts ==============================
platform darwin -- Python 3.9.15, pytest-7.2.1, pluggy-1.0.0
rootdir: /Users/richardevans/Docs/Economics/OSE/OG-IND, configfile: pytest.ini
plugins: xdist-3.1.0, anyio-3.6.2, pep8-1.0.6
collected 22 items                                                             

ogind/tests/test_calibrate.py ....                                       [ 18%]
ogind/tests/test_demographics.py .........                               [ 59%]
ogind/tests/test_income.py ........                                      [ 95%]
ogind/tests/test_run_example.py .                                        [100%]
================= 22 passed, 13 warnings in 318.67s (0:05:18) ==================

jdebacker commented 1 year ago

> @jdebacker. I pulled this branch and ran the run_og_ind.py script on my machine, and the transition path got to the end and didn't solve due to RuntimeError: Transition path equlibrium not found (RC_error). When I look at the resource constraint errors, the offending entries are in T=125, 126 and 127 with respective RC errors of the following size: ... I don't have a good idea what this could be coming from. These are right in the middle of the 320-period transition path.

Thanks for noting this. I'm not sure of the cause, but T=120 is the period in which there is a blip in the demographics to ensure convergence to a SS distribution. Perhaps the two sectors are interacting with that (e.g., a small numerical error in each sector plus the adjustment to imm_rates at T=120 pushes the RC error above the tolerance). I'll try to run this with constant_demographics=True to test that hypothesis.

jdebacker commented 1 year ago

@rickecon Update - the model solves with no resource constraint error when constant_demographics=True, suggesting this issue is related to the adjustment to immigration rates in demographics.py.