This PR is a companion to https://github.com/ccao-data/model-res-avm/pull/236, intended to benchmark the current performance of the comps algorithm using numba. I don't plan to merge it and instead will close it once benchmarking is complete.
Findings
CUDA doesn't seem to make much of a difference, and if anything it's counterproductive. This makes me wonder whether the algorithm would need to be redesigned to make better use of the GPU, but I'm treating that question as out of scope for now.
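For context, the shape of the work is roughly "for each observation, score every candidate and keep the most similar few" — a per-row loop that numba can jit and parallelize on the CPU. The sketch below is a minimal NumPy illustration of that pattern, not the actual algorithm (which lives in the linked model-res-avm PR); the function name, feature shapes, and the L1 distance metric are all assumptions for illustration.

```python
import numpy as np

def top_k_comps(obs, candidates, k):
    """Hypothetical sketch: indices of the k nearest candidates per observation."""
    comps = np.empty((obs.shape[0], k), dtype=np.int64)
    for i in range(obs.shape[0]):  # this outer loop is what numba would parallelize
        dists = np.abs(candidates - obs[i]).sum(axis=1)  # L1 distance to every candidate
        nearest = np.argpartition(dists, k)[:k]          # unordered k smallest
        comps[i] = nearest[np.argsort(dists[nearest])]   # order them by distance
    return comps

rng = np.random.default_rng(0)
obs = rng.normal(size=(200, 8))    # 200 observations, 8 features (made-up sizes)
cands = rng.normal(size=(100, 8))  # 100 comparison candidates
print(top_k_comps(obs, cands, k=5).shape)  # (200, 5)
```

Each row's work here is independent and fairly light, which may be part of why a naive port of this loop to the GPU doesn't pay off without a redesign.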
There are big performance gains to be had simply by bumping the instance type with the existing numba code. If the numbers below hold, we could speed up the comps code by about 2x by switching to c5.24xlarge instances; those cost roughly twice as much as the m4.10xlarge instances we use now, so the change would be roughly cost-neutral.
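The break-even claim is simple arithmetic: 2x the hourly price at 2x the speed means half the hours, so total job cost is flat. A back-of-the-envelope check, with prices that are my assumed on-demand rates rather than figures from the benchmark (check current AWS pricing before acting on this):

```python
# Assumed on-demand $/hr rates, for illustration only -- verify against AWS pricing.
m4_10xlarge_hourly = 2.00
c5_24xlarge_hourly = 4.08
job_hours_on_m4 = 1.0  # normalize the current comps runtime to 1 hour
speedup = 2.0          # the ~2x speedup suggested by the benchmarks

cost_m4 = m4_10xlarge_hourly * job_hours_on_m4
cost_c5 = c5_24xlarge_hourly * (job_hours_on_m4 / speedup)
print(cost_m4, cost_c5)  # roughly equal, so the switch is ~cost-neutral
```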
At small scale (20k observations/10k comparisons), taichi appears to outperform numba, but the improvement disappears as the data grows: at large scale (100k observations/50k comparisons), the two perform about the same.
20k observations, 10k comparisons
100k observations, 50k comparisons