Open asongtoruin opened 1 month ago
On the exclusions - I think this is an issue with the way these two segments are defined. Economically inactive could be any age, but the only economically inactive segments defined in pop_emp are non-working_age, and students. As students exists in both they map onto each other so I excluded pop_emp: students from pop_econ: economically inactive. A solution would either be to alter the definition of pop_emp: 3 from 'unemployed' (which by definition means no job but economically active) to 'not working' for example, which would include the econmically inactive for reasons other than age or being a student.
There should be more validation of segment compatibility etc. yes. I don't have time to work on that right now as that's more than a hot fix, but I may be able to find some next week.
For now could you update segment definitions/exclusions as appropriate on your end and we'll review on our end in the future? If with sensible exclusions etc. this issue persists I'll have a closer look at what's going on.
If you have the code causing this issue in any kind of snippet it would be helpful if you could share it so I can have a look if I get a chance today.
@isaac-tfn I've added the relevant files into F:\Working\Land-Use\OUTPUTS\TEMP_NAN CHECKS
, along with an error_demo.py
that reproduces the two issues.
Thanks, any joy with fixing the segments/exclusions?
@asongtoruin, could you please confirm if this is providing issues? I understand the code was running with some size errors.
Background
As part of our Land-Use tests of the updated
caf.core
code, I keep seeingnan
values arise in the indices of ourDVector
s, particularly when we are multiplying them together.Sometimes this just results in a
UserWarning: This operation has dropped some rows due to exclusions in the resulting segmentation.
, i.e. we are triggering this part of_generic_dunder
:https://github.com/Transport-for-the-North/caf.core/blob/1f4a9344a8b4ef09a4d3f734ef514a22f25423ac/src/caf/core/data_structures.py#L725-L731
Which is maybe fine - the
nan
values are being dropped.The other issue we're having is a larger (and breaking) one - sometimes we get an
IndexError: The segmentation of the read in dvector data does not match the expected segmentation. This is likely due to unconsidered exclusions.
, i.e.validate_segmentation
is going awry:https://github.com/Transport-for-the-North/caf.core/blob/1f4a9344a8b4ef09a4d3f734ef514a22f25423ac/src/caf/core/segmentation.py#L307-L315
Crucially, this is happening during an
.expand_to_other
operation, as a result of a.add_segment
call and the associated validation of the new segmentation - i.e. it's occurring earlier in the process than thewarning
Understanding the problem
I think the issue is being caused by an incompatibility between the
Exclusion
s. We havepop_econ
3 listed as incompatible withpop_emp
1, 2, 3, & 4 - i.e. it's mapped solely topop_emp
5:https://github.com/Transport-for-the-North/caf.core/blob/1f4a9344a8b4ef09a4d3f734ef514a22f25423ac/src/caf/core/segments/pop_econ.yml#L9-L29
There no restrictions on
age_9
forpop_econ
3, howeverpop_emp
5 is only compatible withage_9
1, 2, 3, or 9:https://github.com/Transport-for-the-North/caf.core/blob/1f4a9344a8b4ef09a4d3f734ef514a22f25423ac/src/caf/core/segments/pop_emp.yml#L57-L84
So we can have e.g. data in one dataset with
pop_econ
3 andage_9
4 which is a combination not present in another dataset (containingpop_emp
).The thing I don't understand is why sometimes it only gives us a warning and sometimes fully errors out.
Possible solutions
@isaac-tfn any thoughts? I can pass across example files with each corresponding response if required.