Open nesnoj opened 1 year ago
One more thing which is needed to make disagg_households_power()
work : it seems that the parameter original
has been succeeded by scale_by_pop
(e.g. in d2a807a), but not everywhere. Example:
In L135 the old param name is passed which leads to wrong results.
Problem description
Scaling of
data.households_per_size()
doesn't work - it returns the 2011's value for any passed year.Expected behavior
data.households_per_size(year=<year>)
returns households for year.Investigation
I think this bug was introduced in d2a807a3a2f927d206eec4e9b4d4c205940658ad.
In these lines https://github.com/DemandRegioTeam/disaggregator/blob/07ce557bc99f296626b1759dd86451acc9a5975d/disaggregator/data.py#L951-L955 the year is set to 2011 which is necessary in order to obtain the table from the database.
However, due to this setting the scaling will now always use 2011: https://github.com/DemandRegioTeam/disaggregator/blob/07ce557bc99f296626b1759dd86451acc9a5975d/disaggregator/data.py#L972-L985
Hence, when executing
{y: data.households_per_size(year=y, scale_by_pop=True).sum().sum() for y in [2010, 2020, 2030]}
the return is{2010: 37552728, 2020: 37552728, 2030: 37552728}
which is the 2011's value. Consequently, the implemented warning is raised: "WARNING Scaling the household numbers by the population should only be used if table_id=14 is used and the year is not 2011."Doublecheck: When using the former version
I get the correct-looking scaled values
{2010: 38230136, 2020: 39018719, 2030: 39004027}
.Question: My state of knowledge does not allow to completely estimate the consequences of (re-)deleting the
table_id
here. I did not find any call ifhouseholds_per_size()
with providedtable_id
. Ok, let's say you wanna bypass the id from config, then it can make sense but in this case the condition wouldn't be met anyway and seems useless :thinking: .Consequences
Other functions which use this household function such as
spatial.disagg_households_power()
are affected too:{y: spatial.disagg_households_power(by='households', weight_by_income=False, force_update=True, year=y, scale_by_pop=True).sum().sum() for y in range(2010, 2060, 10)}
returns{2010: 123765.970543, 2020: 123765.970543, 2030: 123765.970543, 2040: 123765.970543, 2050: 123765.970543}
With the former version the result is
{2010: 125975.97663199999, 2020: 128519.320004, 2030: 128528.948116, 2040: 126649.01529899999, 2050: 123782.80167100001}
PS: Maybe related to #12