Optimising the slow matrix translation methods

BenTaylor-TfN commented 1 year ago

pandas_multi_vector_zone_translation() is doing the same thing as half of the pandas_matrix_zone_translation() when it falls back to the slow method. This should be consolidated.

Furthermore, pandas_multi_vector_zone_translation() is faster than pandas_matrix_zone_translation() when the latter needs to fall back to it's slow method. This should be optimised.

The initial code

    import pandas as pd
    from caf.toolkit import timing
    from caf.toolkit import translation

    # Read in data
    lookup = pd.read_csv(r"I:\Data\Zone Translations\cache\noham_ntem\noham_to_ntem_pop_2018.csv")
    mat = pd.read_csv(r"I:\NorMITs Demand\Forecasting\NTEM\8.0\iter3.1\Core\car_and_passenger\Matrices\OD\hb_od_from_yr2028_p1_m3_tp1.csv.bz2", index_col=0)

    # pandas_matrix_zone_translation() method
    start = timing.current_milli_time()
    trans = translation.pandas_matrix_zone_translation(
        matrix=mat,
        translation=lookup,
        translation_from_col='noham_id',
        translation_to_col='ntem_id',
        translation_factors_col='noham_to_ntem',
        from_unique_index=lookup['noham_id'].unique(),
        to_unique_index=lookup['ntem_id'].unique()
    )
    end = timing.current_milli_time()
    print(f"matrix took: {end - start}ms")

    # pandas_multi_vector_zone_translation() method
    start = timing.current_milli_time()
    half_done = translation.pandas_multi_vector_zone_translation(
        vector=mat,
        translation=lookup,
        translation_from_col='noham_id',
        translation_to_col='ntem_id',
        translation_factors_col='noham_to_ntem',
        from_unique_index=lookup['noham_id'].unique(),
        to_unique_index=lookup['ntem_id'].unique(),
        check_totals=False,
    )
    done = translation.pandas_multi_vector_zone_translation(
        vector=half_done.transpose(),
        translation=lookup,
        translation_from_col='noham_id',
        translation_to_col='ntem_id',
        translation_factors_col='noham_to_ntem',
        from_unique_index=lookup['noham_id'].unique(),
        to_unique_index=lookup['ntem_id'].unique(),
        check_totals=False,
    ).transpose()
    end = timing.current_milli_time()
    print(f"multi-vector took: {end - start}ms")

Run results

The above code results in the following.

The matrix zone translation took ~2/3 mins as expected, using about 10Gb of memory in total.
The vector method took ~2/3 seconds, again using about 10Gb of memory in total.

It's worth noting a few things:

The pandas matrix zone translation falls back to a slow method if it runs out of memory - that's happening here
The slow method does scale with more cores
That said, this is a particularly bad example for pandas matrix translation. The data is just too big to instantly translate. This accurately captures pandas_matrix_zone_translation() at its worst.

Finally, I've known this is an issue for a while and never got round to optimising the "slow" translation. I think this is the route we should now take.

Further tests

I think there's a few things we should investigate further here:

How does this exact same test perform when run on a system with more memory and cores?
How does this test run when we use smaller zoning systems in the translation, such as NoHAM to sector?

Furthermore, it's clear the "slow" method needs optimising. I see two options:

Just replace it with the multi-vector function. Although, this is written in pandas so it will need updating for numpy data.
Investigate numba / cython optimisations for the current code. It's slow because it's a for loop.

The two options should all be tested with the above tests as well to make sure we end up with the optimal version of code.

I'll open a new Issue for this issue specifically.

BenTaylor-TfN commented 1 year ago

Had a look at the numba optimisations today and it's not good news! Code can be found here: https://github.com/Transport-for-the-North/caf.toolkit/tree/numba-slow-translation

Bottom line, numba isn't well optimised for broadcasting, so while the code is faster, it's still >1min.

Feel the best option here might be to trust pandas optimisation and use that when we fall back to the slow method. Why re-invent the wheel?

isaac-tfn commented 1 year ago

Feel the best option here might be to trust pandas optimisation and use that when we fall back to the slow method. Why re-invent the wheel?

Agreed. I think pandas is generally going to be optimised and tested more extensively than anything we'll come up with in a reasonable time frame.

Transport-for-the-North / caf.toolkit