Transport-for-the-North / caf.base

Core classes and definitions for CAF family of tools
GNU General Public License v3.0

Possible alternative RMSE calculation #72

Open asongtoruin opened 21 hours ago

asongtoruin commented 21 hours ago

Looking at the ipf_adjustment_fixes branch in particular again - the RMSE calculation is effectively weighted per-target:

https://github.com/Transport-for-the-North/caf.base/blob/0e499aeb07206eb03e70bbc982297979ee4e4686/src/caf/base/data_structures.py#L1510

I think the result of this is that targets with fewer rows (i.e. fewer segment category combinations) are given a higher weighting in the calculation: for example, a target with 10 rows contributes its mean squared error with the same weight as a target with 1,000 rows, so each of its cells effectively counts 100 times as much. I'm not sure if this is deliberate, but it feels a bit weird mathematically.

An alternative option would be to keep a running total of the squared error and of the number of cells, then divide at the end - i.e. every cell comparison for every target would be given equal weighting:

    def calc_rmse(self, targets: list[IpfTarget]) -> float:
        """
        Calculate the RMSE relative to a set of targets.

        Parameters
        ----------
        targets: list[IpfTarget]
            The targets to calc RMSE relative to.

        Returns
        -------
        float
            Root mean square error pooled across every cell of every target,
            so each cell comparison carries equal weight.
        """
        squared_error = 0.0
        total_length = 0
        for target in targets:
            # Bring a copy of self onto the target's zoning system.
            check = self.copy()
            if target.zone_translation is not None:
                check = check.translate_zoning(
                    target.data.zoning_system,
                    trans_vector=target.zone_translation,
                    _bypass_validation=True,
                )
            # Translate any segments the target has that self does not.
            if target.segment_translations is not None:
                for seg in target.data.segmentation - self.segmentation:
                    seg_name = seg.name
                    if target.segment_translations[seg_name] in self.segmentation.names:
                        lower_seg = self.segmentation.get_segment(
                            target.segment_translations[seg_name]
                        )
                        check = check.translate_segment(
                            from_seg=lower_seg.name, to_seg=seg_name, _bypass_validation=True
                        )
                    else:
                        raise ValueError("No translation defined for this segment.")
            # __sub__ is called directly so that _bypass_validation can be passed.
            diff = (
                check.aggregate(target.data.segmentation, _bypass_validation=True).__sub__(
                    target.data, _bypass_validation=True
                )
            ) ** 2
            # Accumulate the summed squared error and the cell count across all
            # targets, rather than each target's own mean.
            squared_error += diff.sum()
            total_length += len(target.data)
        return (squared_error / total_length) ** 0.5
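
To illustrate the difference in weighting, here is a rough standalone sketch with made-up numbers (plain numpy arrays standing in for the per-cell differences, nothing to do with real DVector data):

    import numpy as np

    # Toy example: two targets with very different cell counts but the same
    # per-cell error of 2.0 everywhere.
    errors = {"small": np.full(5, 2.0), "large": np.full(500, 2.0)}

    # Current approach: sum each target's own mean squared error, then square-root.
    current = sum((e ** 2).mean() for e in errors.values()) ** 0.5

    # Suggested approach: pool the squared errors and divide by the total cell count.
    pooled = (
        sum((e ** 2).sum() for e in errors.values())
        / sum(len(e) for e in errors.values())
    ) ** 0.5

    print(current)  # ~2.83, grows with the number of targets
    print(pooled)   # 2.0, every cell weighted equally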

From a quick test, this results in a lower RMSE value, which could lead to quicker convergence. Any thoughts @isaac-tfn?

isaac-tfn commented 5 hours ago

The current logic behind it is:

Looking at the method now, I think actually the flaw in it is that it should be

mse += diff.sum() / len(target) rather than mse += diff.sum() / len(target.data)

as len(target.data) returns the number of rows and we want it divided by the total number of cells.

isaac-tfn commented 5 hours ago

Sorry, ignore that previous comment: target is an IpfTarget, so target.data is a DVector and len(target.data) is the correct thing. I think the current calculation is ok, but there could be something I'm missing. I think the way you've changed it will warp the result depending on the length of each target, so essentially short targets will contribute less than longer ones.
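
To put rough numbers on that (toy values only, nothing from an actual run):

    import numpy as np

    # Toy values: a badly matched 5-cell target and a well matched 500-cell target.
    errors = {"small": np.full(5, 10.0), "large": np.full(500, 1.0)}

    # Per-target weighting (current): the small target dominates the score.
    current = sum((e ** 2).mean() for e in errors.values()) ** 0.5  # ~10.05

    # Pooled weighting (suggested): the small target is almost invisible.
    pooled = (
        sum((e ** 2).sum() for e in errors.values())
        / sum(len(e) for e in errors.values())
    ) ** 0.5  # ~1.41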

suziebod commented 5 hours ago

Thanks Isaac - I don't know the ins and outs of the IPF, so you will have to advise if I'm misinterpreting anything here, but my view is that what is currently being calculated is not actual RMSE. By doing mse += diff.sum() / len(target.data) and then returning mse**0.5, we are effectively returning the square root of the sum of per-target mean squared errors, which is not the same as the square root of the overall mean squared error.

MSE is

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2$$

and I think we are calculating this?

$$\sqrt{\sum_{t \in \text{targets}} \frac{1}{n_t} \sum_{i=1}^{n_t} \left(y_{t,i} - \hat{y}_{t,i}\right)^2}$$

What you have implemented might be right, and it might just be that because it's named rmse it's a tad misleading. I'm just running a quick test to check the performance of the model against targets when using Adam's suggested code to see if there are any significant differences, and will update this later.
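
In the meantime, here's a tiny self-contained check of the naming point (toy numbers only, nothing to do with the model outputs):

    import numpy as np

    # Two identical targets, each with per-cell errors of 1, 2 and 3.
    errors = [np.array([1.0, 2.0, 3.0]), np.array([1.0, 2.0, 3.0])]

    # What calc_rmse currently returns: the square root of the sum of per-target MSEs.
    root_of_summed_mses = sum((e ** 2).mean() for e in errors) ** 0.5  # ~3.06

    # Actual RMSE over every cell: the square root of the pooled mean squared error.
    true_rmse = (np.concatenate(errors) ** 2).mean() ** 0.5  # ~2.16

    # The two only agree when there is a single target.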

suziebod commented 35 minutes ago

Confirmed that the suggested fix has no material impact on 2023 population outputs:

[output comparison images]

so it mainly appears to be a runtime benefit, if correct. We can look into this later @asongtoruin.