I've left the original functions in place, with _prior appended to their names, so the old and new versions can be compared. I've tested the new implementations on a range of inputs, and the outputs match exactly.
Removing the for loops gives a considerable speedup: roughly 59x for get_centroids, 51x for get_cossim, and 17x for calc_loss.
Timings from %timeit in IPython:
get_centroids:
before: 266 µs ± 13.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
after: 4.5 µs ± 72.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
get_cossim:
before: 20.3 ms ± 111 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
after: 398 µs ± 2.64 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
calc_loss:
before: 1.12 ms ± 9.78 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
after: 66.8 µs ± 1.09 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
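
For readers who want to see the kind of change involved, here is a minimal sketch of replacing a nested loop with a single array operation and checking that both versions agree. It is not the code from this PR: it uses NumPy for illustration, the function names get_centroids_loop and get_centroids_vectorized are hypothetical, and it assumes the embeddings have shape (speakers, utterances, dim).

```python
import numpy as np


def get_centroids_loop(embeddings):
    """Loop-based version: average the utterance embeddings of each speaker."""
    centroids = []
    for speaker in embeddings:           # speaker: (utterances, dim)
        total = np.zeros(speaker.shape[1])
        for utterance in speaker:        # utterance: (dim,)
            total = total + utterance
        centroids.append(total / len(speaker))
    return np.stack(centroids)           # (speakers, dim)


def get_centroids_vectorized(embeddings):
    """Vectorized version: one mean over the utterance axis."""
    return embeddings.mean(axis=1)       # (speakers, dim)


# Assumed toy input: 4 speakers, 5 utterances each, 8-dimensional embeddings.
rng = np.random.default_rng(0)
embeddings = rng.standard_normal((4, 5, 8))

loop_result = get_centroids_loop(embeddings)
vectorized_result = get_centroids_vectorized(embeddings)

# Compare with a tolerance, since the order of floating-point additions
# can differ between the two versions.
outputs_match = np.allclose(loop_result, vectorized_result)
print(outputs_match)
```

The same pattern (keep the old function, run both on the same inputs, compare the results) applies to the other two functions as well.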