BenTaylor-TfN closed 1 year ago
Had a look at the numba optimisations today and it's not good news! Code can be found here: https://github.com/Transport-for-the-North/caf.toolkit/tree/numba-slow-translation
Bottom line: numba isn't well optimised for broadcasting, so while the code is faster, it still takes more than a minute.
Feel the best option here might be to trust pandas optimisation and use that when we fall back to the slow method. Why re-invent the wheel?
Agreed. I think pandas is generally going to be optimised and tested more extensively than anything we'll come up with in a reasonable time frame.
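For reference, here's a minimal sketch of what "trusting pandas" for a matrix translation could look like. This is not the caf.toolkit implementation: the function name `translate_matrix_pandas` and the column names `from_zone`, `to_zone` and `factor` are made up for illustration, and it assumes the same translation applies to rows and columns. The matrix is flattened to long format, the factors are merged onto each axis, and a groupby sums the result into the new zones.

```python
import numpy as np
import pandas as pd


def translate_matrix_pandas(matrix: pd.DataFrame, translation: pd.DataFrame) -> pd.DataFrame:
    """Translate a square matrix to a new zone system using merge + groupby.

    Hypothetical helper: `translation` is assumed to be long format with
    columns "from_zone", "to_zone" and "factor".
    """
    # Flatten the matrix to one row per (origin, destination) cell.
    long = pd.DataFrame(
        {
            "origin": np.repeat(matrix.index.to_numpy(), matrix.shape[1]),
            "destination": np.tile(matrix.columns.to_numpy(), matrix.shape[0]),
            "value": matrix.to_numpy().ravel(),
        }
    )

    # Attach the translation once for the row axis and once for the column axis.
    row_trans = translation.rename(
        columns={"from_zone": "origin", "to_zone": "new_origin", "factor": "row_factor"}
    )
    col_trans = translation.rename(
        columns={"from_zone": "destination", "to_zone": "new_destination", "factor": "col_factor"}
    )
    long = long.merge(row_trans, on="origin").merge(col_trans, on="destination")

    # Split each cell by both factors, then sum into the new zone pairs.
    long["value"] = long["value"] * long["row_factor"] * long["col_factor"]
    return long.groupby(["new_origin", "new_destination"])["value"].sum().unstack(fill_value=0.0)


# Toy data: old zone 2 is split evenly between new zones 10 and 20.
zones = [1, 2, 3]
matrix = pd.DataFrame(np.arange(1.0, 10.0).reshape(3, 3), index=zones, columns=zones)
translation = pd.DataFrame(
    {
        "from_zone": [1, 2, 2, 3],
        "to_zone": [10, 10, 20, 20],
        "factor": [1.0, 0.5, 0.5, 1.0],
    }
)
result = translate_matrix_pandas(matrix, translation)

# Cross-check against the dense matrix product, which is valid for this simple case.
factors = translation.pivot(index="from_zone", columns="to_zone", values="factor").fillna(0.0).to_numpy()
expected = factors.T @ matrix.to_numpy() @ factors
assert np.allclose(result.to_numpy(), expected)
```

The long-format route only ever holds rows for zone pairs that actually have a factor, which is presumably why pandas copes better than a dense broadcast when the translation is sparse.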
pandas_multi_vector_zone_translation() is doing the same thing as half of pandas_matrix_zone_translation() when it falls back to the slow method. This should be consolidated.
Furthermore, pandas_multi_vector_zone_translation() is faster than pandas_matrix_zone_translation() when the latter needs to fall back to its slow method. This should be optimised.
The initial code
Run results
The above code results in the following.
It's worth noting a few things:
pandas_matrix_zone_translation() at its worst.
Finally, I've known this is an issue for a while and never got round to optimising the "slow" translation. I think this is the route we should now take.
Further tests
I think there are a few things we should investigate further here:
Furthermore, it's clear the "slow" method needs optimising. I see two options:
for loop.
Both options should also be run through the above tests, to make sure we end up with the optimal version of the code.
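As a rough idea of how that comparison could be set up, here's a small harness that checks every candidate gives the same answer and then times each one. The two candidates are hypothetical stand-ins (a dense 4-D broadcast and a per-row loop), not the actual options listed above, and `compare_candidates` is a made-up helper name.

```python
import timeit

import numpy as np


def translate_broadcast(matrix, trans):
    # Hypothetical candidate: build a dense 4-D array with axes
    # (old_row, old_col, new_row, new_col), then sum out the old axes.
    full = matrix[:, :, None, None] * trans[:, None, :, None] * trans[None, :, None, :]
    return full.sum(axis=(0, 1))


def translate_loop(matrix, trans):
    # Hypothetical candidate: loop over old rows so no 4-D temporary is built.
    n_new = trans.shape[1]
    out = np.zeros((n_new, n_new))
    for i in range(matrix.shape[0]):
        # Spread old row i across the new columns, then across the new rows.
        row_in_new_cols = matrix[i] @ trans
        out += np.outer(trans[i], row_in_new_cols)
    return out


def compare_candidates(candidates, matrix, trans, repeats=3):
    """Check all candidates agree, then return the best time (seconds) for each."""
    reference = None
    timings = {}
    for name, func in candidates.items():
        result = func(matrix, trans)
        if reference is None:
            reference = result
        elif not np.allclose(result, reference):
            raise ValueError(f"{name} disagrees with the first candidate")
        timings[name] = min(timeit.repeat(lambda: func(matrix, trans), number=1, repeat=repeats))
    return timings


# Random test data: 50 old zones, 20 new zones, each old zone's factors sum to 1.
rng = np.random.default_rng(0)
matrix = rng.random((50, 50))
trans = rng.random((50, 20))
trans /= trans.sum(axis=1, keepdims=True)

timings = compare_candidates(
    {"broadcast": translate_broadcast, "loop": translate_loop}, matrix, trans
)
for name, seconds in timings.items():
    print(f"{name}: {seconds:.6f}s")
```

Checking agreement before timing means a fast but wrong candidate can't win by accident. The real zone systems are much larger than this toy case, so the timings here say nothing about which option is best in practice.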
I'll open a new issue for this specifically.