asongtoruin opened 21 hours ago
The current logic behind it is:
Looking at the method now, I think the flaw in it is that it should be `mse += diff.sum() / len(target)` rather than `mse += diff.sum() / len(target.data)`, as `len(target.data)` returns the number of rows and we want it divided by the total number of cells.
Sorry, ignore that previous comment: `target` is an `IpfTarget`, so `target.data` is a `DVector` and `len(target.data)` is the correct thing. I think the current calculation is OK, but there could be something I'm missing. I think the way you've changed it will warp depending on the length of the targets, so essentially short targets will contribute less than longer ones.
Thanks Isaac - I don't know the ins and outs of the IPF, so you will have to advise if I'm misinterpreting anything here, but my view is that what is currently being calculated is not actual RMSE. By doing `mse += diff.sum() / len(target.data)` and then returning `mse**0.5`, we are effectively returning the square root of the sum of per-target mean squared errors, which is not the same as the square root of the mean squared error over all cells.
MSE is this:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$

and I think we are calculating this?

$$\sum_{t}\frac{1}{n_t}\sum_{i=1}^{n_t}\left(y_{t,i} - \hat{y}_{t,i}\right)^2$$
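To make the distinction concrete, here is a small numerical sketch (the arrays are illustrative squared-difference values, not real `IpfTarget` data) showing that summing per-target MSEs before taking the square root gives a different number from the RMSE over all cells pooled together:

```python
import numpy as np

# Illustrative per-target squared differences (made-up numbers)
diffs = [np.array([1.0, 4.0]), np.array([1.0, 1.0, 1.0, 1.0, 1.0, 1.0])]

# Current approach: add up each target's MSE, then take one square root
mse_sum = sum(d.sum() / len(d) for d in diffs)
current = mse_sum ** 0.5  # sqrt(2.5 + 1.0) ~= 1.871

# Pooled RMSE: every squared difference divided by the total cell count once
pooled = np.concatenate(diffs)
true_rmse = (pooled.sum() / len(pooled)) ** 0.5  # sqrt(11 / 8) ~= 1.173

print(current, true_rmse)
```

The gap grows with the number of targets, since each target adds its own MSE to the running total before the square root is taken.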
What you have implemented might be right, and it might just be that it's named `rmse`, which is a tad misleading. I'm just running a quick test to check the performance of the model against targets when using Adam's suggested code, to see if there are any significant differences, and will update this later.
Confirmed that the suggested fix has no material impact on 2023 population outputs:
So it mainly appears to be a runtime benefit, if correct. We can look into this later @asongtoruin
Looking at the `ipf_adjustment_fixes` branch in particular again - the RMSE calculation is effectively weighted per-target: https://github.com/Transport-for-the-North/caf.base/blob/0e499aeb07206eb03e70bbc982297979ee4e4686/src/caf/base/data_structures.py#L1510
I think the result of this is that targets with fewer rows (i.e. fewer segment category combinations) are given a higher weighting in the calculation. I'm not sure if this is deliberate, but it feels a bit weird mathematically.
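To illustrate that weighting effect with made-up numbers (`weighted_mse` here is just a stand-in for the per-target averaging in the linked code, not the actual caf.base function): adding the same squared error to a single cell moves the total by 1/2 for a 2-cell target but only by 1/6 for a 6-cell target:

```python
import numpy as np

def weighted_mse(targets):
    """Per-target averaging: each target contributes its own MSE."""
    return sum(t.sum() / len(t) for t in targets)

# A 2-cell target and a 6-cell target, with a squared error of 1.0
# added to one cell of each in turn
bump_short = [np.array([1.0, 0.0]), np.zeros(6)]
bump_long = [np.zeros(2), np.array([1.0, 0.0, 0.0, 0.0, 0.0, 0.0])]

print(weighted_mse(bump_short))  # 0.5: a short-target cell carries weight 1/2
print(weighted_mse(bump_long))   # ~0.167: a long-target cell carries weight 1/6
```

So under this scheme each cell is effectively weighted by 1/n_t, where n_t is its own target's length.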
An alternative option would be to keep a running total of the squared errors and then divide by the total length once at the end - i.e. every cell comparison for every target would be given equal weighting:
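A minimal sketch of that idea, assuming each target comparison yields an array of squared differences (the name `squared_diffs` is illustrative, not an actual caf.base attribute):

```python
import numpy as np

def pooled_rmse(squared_diffs):
    """RMSE with every cell weighted equally across all targets.

    squared_diffs: iterable of per-target arrays of squared differences
    (an illustrative stand-in for the real IpfTarget comparisons).
    """
    total_error = 0.0
    total_cells = 0
    for diff in squared_diffs:
        total_error += diff.sum()
        total_cells += len(diff)
    return (total_error / total_cells) ** 0.5

print(pooled_rmse([np.array([1.0, 4.0]), np.array([1.0] * 6)]))  # ~1.173
```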
From a quick test, this results in a lower RMSE value, which could lead to quicker convergence. Any thoughts @isaac-tfn?