Transport-for-the-North / caf.core

Core classes and definitions for CAF family of tools
GNU General Public License v3.0

DVector combination *without* overlapping segments #25

Open asongtoruin opened 4 months ago

asongtoruin commented 4 months ago

At present, DVector._generic_dunder calls DVector.overlap:

https://github.com/Transport-for-the-North/caf.core/blob/a644f6f719232e9ac2d0eefb289f591153b3385b/src/caf/core/data_structures.py#L533-L544

This in turn raises an error if the two DVectors being combined have no overlap in segmentation:

https://github.com/Transport-for-the-North/caf.core/blob/a644f6f719232e9ac2d0eefb289f591153b3385b/src/caf/core/data_structures.py#L523-L531
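As a rough illustration of the behaviour described above, the check amounts to refusing to combine two vectors whose segmentations share no segment names. This is a hypothetical, simplified model and not caf.core's actual code: the function check_overlap and the representation of a segmentation as a tuple of segment names are assumptions made for the sketch.

```python
def check_overlap(left_segs, right_segs):
    """Return the segment names shared by both segmentations.

    Hypothetical stand-in for the overlap check described in the issue:
    raises ValueError if the two segmentations have nothing in common.
    """
    shared = [name for name in left_segs if name in right_segs]
    if not shared:
        raise ValueError(
            f"No overlap in segmentation: {left_segs} vs {right_segs}"
        )
    return shared


# Sharing at least one segment is fine.
print(check_overlap(("age", "gender"), ("gender",)))  # ['gender']

# Disjoint segmentations (e.g. age vs. gender) are rejected.
try:
    check_overlap(("age",), ("gender",))
except ValueError as err:
    print(err)
```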

In Land-Use, we would ideally like to be able to combine DVectors that don't have overlapping segments. For example, we could have population at LSOA level and a gender split at LSOA level, and want to apply the split to the population. I can think of a few potential approaches to this:

  1. A method to "increase" segmentation after creation of the DVector object, e.g. .add_segmentation, likely with some "extrapolation method" options (e.g. evenly splitting the existing value across the new segmentation, or retaining the existing value in each new segment)
  2. A "constant DVector" creation (class)method. Similar to __init__, with the user providing Segmentation and ZoningSystem information alongside a constant value to infill with. If we create a "more disaggregated" DVector with a constant value of 0, I think adding it to our "less disaggregated" DVector would have the same effect as duplicating the less-disaggregated values across each of the new segments
  3. We handle this separately in Land-Use by pre-processing the data to artificially represent the required segmentation, even if it's not actually present in the data. I'd rather not do this!
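Options 1 and 2 can be sketched with plain dictionaries standing in for DVector data. This is a hypothetical toy model, not caf.core's API: add_segmentation borrows the name proposed in option 1, while constant, broadcast_add, the {(zone, segment): value} layout and the LSOA codes are illustrative assumptions. The last line checks the claim in option 2 that adding a zero-filled, more disaggregated vector duplicates the coarse values.

```python
def add_segmentation(values, new_segments, method="duplicate"):
    """Option 1 (hypothetical): expand {zone: value} to {(zone, seg): value}.

    method="duplicate" keeps the existing value in every new segment;
    method="split" divides it evenly across the new segments.
    """
    if method not in ("duplicate", "split"):
        raise ValueError(f"Unknown method: {method!r}")
    divisor = len(new_segments) if method == "split" else 1
    return {
        (zone, seg): value / divisor
        for zone, value in values.items()
        for seg in new_segments
    }


def constant(zones, new_segments, fill):
    """Option 2 (hypothetical): a 'constant DVector' at the finer segmentation."""
    return {(zone, seg): fill for zone in zones for seg in new_segments}


def broadcast_add(coarse, fine):
    """Add a per-zone vector to a per-(zone, segment) vector."""
    return {(zone, seg): coarse[zone] + v for (zone, seg), v in fine.items()}


# Made-up example data.
population = {"E01000001": 1000.0, "E01000002": 500.0}
genders = ("male", "female")

split = add_segmentation(population, genders, method="split")
dup = add_segmentation(population, genders, method="duplicate")
print(split[("E01000001", "male")])  # 500.0
print(dup[("E01000001", "male")])    # 1000.0

# Adding a zero-filled finer vector reproduces the "duplicate" expansion.
zeros = constant(list(population), genders, 0.0)
assert broadcast_add(population, zeros) == dup
```

In this toy model the two options coincide only for the "duplicate" case; an even split still needs something like the divisor in option 1.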

Do you have any thoughts about this? Or any potential alternative approaches within the existing setup?

isaac-tfn commented 3 months ago

Didn't notice these issues had been added. A splitting method would be a good idea, with the option either to replicate values at the resultant segmentation (so all values are essentially multiplied by 1) or to split them evenly into the new segmentation. Anything more complicated (splitting by some kind of weighting) could be done with the existing dunder methods by simply multiplying by a second DVector containing the desired factors. It's probably too late today, but I'll look at adding something in tomorrow.
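A weighted split along these lines could be composed from replication followed by an elementwise product with a factor vector. This is again a hypothetical sketch using plain dicts: replicate, multiply and the data are illustrative only, not caf.core functions, and the factors are assumed to already share the zoning and segmentation of the replicated vector.

```python
def replicate(values, new_segments):
    """Copy each zone's value into every new segment (i.e. multiply by 1)."""
    return {(zone, seg): v for zone, v in values.items() for seg in new_segments}


def multiply(left, right):
    """Elementwise product of two vectors with identical keys."""
    if left.keys() != right.keys():
        raise ValueError("Vectors must share zoning and segmentation")
    return {key: left[key] * right[key] for key in left}


# Made-up population and gender factors for a single zone.
population = {"E01000001": 1000.0}
factors = {("E01000001", "male"): 0.25, ("E01000001", "female"): 0.75}

weighted = multiply(replicate(population, ("male", "female")), factors)
print(weighted)  # {('E01000001', 'male'): 250.0, ('E01000001', 'female'): 750.0}
```

Because the factors for each zone sum to 1, the weighted values still add up to the original zone total.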