DemandRegioTeam / disaggregator

A set of tools for processing of spatial and temporal disaggregations.
GNU General Public License v3.0
29 stars 16 forks source link

Bug: Scaling of `data.households_per_size()` always returns 2011's value #16

Open nesnoj opened 1 year ago

nesnoj commented 1 year ago

Problem description

Scaling of data.households_per_size() doesn't work - it returns the 2011's value for any passed year.

Expected behavior

data.households_per_size(year=<year>) returns households for year.

Investigation

I think this bug was introduced in d2a807a3a2f927d206eec4e9b4d4c205940658ad.

In these lines https://github.com/DemandRegioTeam/disaggregator/blob/07ce557bc99f296626b1759dd86451acc9a5975d/disaggregator/data.py#L951-L955 the year is set to 2011 which is necessary in order to obtain the table from the database.

However, due to this setting the scaling will now always use 2011: https://github.com/DemandRegioTeam/disaggregator/blob/07ce557bc99f296626b1759dd86451acc9a5975d/disaggregator/data.py#L972-L985

Hence, when executing {y: data.households_per_size(year=y, scale_by_pop=True).sum().sum() for y in [2010, 2020, 2030]} the return is {2010: 37552728, 2020: 37552728, 2030: 37552728} which is the 2011's value. Consequently, the implemented warning is raised: "WARNING Scaling the household numbers by the population should only be used if table_id=14 is used and the year is not 2011."

Doublecheck: When using the former version

    elif source == 'database':
        df = database_get('spatial', table_id=table_id, year=2011,
                          force_update=force_update)

I get the correct-looking scaled values {2010: 38230136, 2020: 39018719, 2030: 39004027}.

Question: My state of knowledge does not allow to completely estimate the consequences of (re-)deleting the table_id here. I did not find any call if households_per_size() with provided table_id. Ok, let's say you wanna bypass the id from config, then it can make sense but in this case the condition wouldn't be met anyway and seems useless :thinking: .

Consequences

Other functions which use this household function such as spatial.disagg_households_power() are affected too: {y: spatial.disagg_households_power(by='households', weight_by_income=False, force_update=True, year=y, scale_by_pop=True).sum().sum() for y in range(2010, 2060, 10)} returns

{2010: 123765.970543, 2020: 123765.970543, 2030: 123765.970543, 2040: 123765.970543, 2050: 123765.970543}

With the former version the result is {2010: 125975.97663199999, 2020: 128519.320004, 2030: 128528.948116, 2040: 126649.01529899999, 2050: 123782.80167100001}

PS: Maybe related to #12

nesnoj commented 1 year ago

One more thing which is needed to make disagg_households_power() work : it seems that the parameter original has been succeeded by scale_by_pop (e.g. in d2a807a), but not everywhere. Example:

https://github.com/DemandRegioTeam/disaggregator/blob/07ce557bc99f296626b1759dd86451acc9a5975d/disaggregator/spatial.py#L109-L137

In L135 the old param name is passed which leads to wrong results.