This PR is a companion to https://github.com/ccao-data/model-res-avm/pull/236, intended to benchmark the current performance of the comps algorithm using numba. I don't plan to merge it and instead will close it once benchmarking is complete.
Findings
CUDA doesn't seem to make much of a difference, and if anything it's counterproductive. This makes me wonder whether the algorithm would need to be redesigned to make better use of the GPU, but I'm treating that question as out of scope for now.
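For context, the shape of the work is roughly "for each observation, score every candidate and keep the most similar few" — a per-row loop that numba can jit and parallelize on the CPU. The sketch below is a minimal NumPy illustration of that pattern, not the actual algorithm (which lives in the linked model-res-avm PR); the function name, feature shapes, and the L1 distance metric are all assumptions for illustration.

```python
import numpy as np

def top_k_comps(obs, candidates, k):
    """Hypothetical sketch: indices of the k nearest candidates per observation."""
    comps = np.empty((obs.shape[0], k), dtype=np.int64)
    for i in range(obs.shape[0]):  # this outer loop is what numba would parallelize
        dists = np.abs(candidates - obs[i]).sum(axis=1)  # L1 distance to every candidate
        nearest = np.argpartition(dists, k)[:k]          # unordered k smallest
        comps[i] = nearest[np.argsort(dists[nearest])]   # order them by distance
    return comps

rng = np.random.default_rng(0)
obs = rng.normal(size=(200, 8))    # 200 observations, 8 features (made-up sizes)
cands = rng.normal(size=(100, 8))  # 100 comparison candidates
print(top_k_comps(obs, cands, k=5).shape)  # (200, 5)
```

Each row's work here is independent and fairly light, which may be part of why a naive port of this loop to the GPU doesn't pay off without a redesign.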
There are big performance gains to be had simply by bumping the instance type with the existing numba code. If the numbers below hold, we could speed up the comps code by about 2x by switching to c5.24xlarge instances; those cost roughly twice as much as the m4.10xlarge instances we use now, so the change would be roughly cost-neutral.
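The break-even claim is simple arithmetic: 2x the hourly price at 2x the speed means half the hours, so total job cost is flat. A back-of-the-envelope check, with prices that are my assumed on-demand rates rather than figures from the benchmark (check current AWS pricing before acting on this):

```python
# Assumed on-demand $/hr rates, for illustration only -- verify against AWS pricing.
m4_10xlarge_hourly = 2.00
c5_24xlarge_hourly = 4.08
job_hours_on_m4 = 1.0  # normalize the current comps runtime to 1 hour
speedup = 2.0          # the ~2x speedup suggested by the benchmarks

cost_m4 = m4_10xlarge_hourly * job_hours_on_m4
cost_c5 = c5_24xlarge_hourly * (job_hours_on_m4 / speedup)
print(cost_m4, cost_c5)  # roughly equal, so the switch is ~cost-neutral
```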
At small scale (20k observations/10k comparisons), taichi appears to outperform numba, but the improvement disappears as the data grows: at large scale (100k observations/50k comparisons), the two perform about the same.
20k observations, 10k comparisons
100k observations, 50k comparisons