Open asongtoruin opened 1 month ago
Thanks for raising, yes this is something I expected to cause issues, I think the first and last scenarios are easy to handle, i.e. anywhere the target is zero, the result should be zero so yes, for those nans just infill with zero. Where agg is zero but target isn't is the issue, as that will prevent convergence. It depends how much we trust the seed DVector; if we trust it then this scenario is fine as those cells will always remain zero, and as long as there aren't too many zeros the convergence shouldn't take too big if a hit. If we don't trust those zeros in the seed we may want to infill zeros before the process even starts so those cells can be adjusted. Probably best just to have an 'infill zeros' parameter on the method I think?
I think for the time being it may be sufficient to just report if infinite values are cropping up, maybe including a report of where they're happening, and leave it to the used to handle seeding? I feel like something that blanket replaces zeroes could be too coarse.
OK for now I'm just adding a warning that there are inf factors, and the user can double check where the zeros are in the seed. I forgot to tag the issue but should be working now
We're having an issue with the .inf
check, a ValueError: The truth value of a Series is ambiguous
. Pretty sure it's here:
factor.data
is a DataFrame
, so .any
returns a Series
. This line should probably be:
if factor.data.eq(np.inf).any(axis=None):
(or .any().any()
if axis
is not available)
Sorry I had tested it on a series so it was returning one value, will fix now.
When the IPF factors are calculated, they're just done through a straightforward division:
https://github.com/Transport-for-the-North/caf.core/blob/90c9f53fd59cbd2f0a8a876fbcb56198cbcf02fb/src/caf/core/data_structures.py#L1405
However, if we have a 0 in:
target
but notagg
we get 0 (this is fine)agg
but nottarget
we getinf
NaN
I think this latter case is happening in our population calcs, as we're ending up with
NaN
values somewhere in the data post-IPF. Feels like something that should be fairly easy to handle, maybe even just with a.fillna(0)
- @isaac-tfn, any thoughts?