Hello, I am a little bit confused about efficient inference and kernel implementation for this paper.
Let's say we apply residual quantization $K$ times to some columns or rows. That means we need to multiply these columns with the input vector several times (once per residual pass), which increases latency. Any thoughts on how to improve that?
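To make the concern concrete, here is a minimal Python sketch of the two ways the product could be computed. It assumes each residual stage is a scaled sign vector, which may not match the paper's actual quantizer, and the names `residual_binarize`, `dot_multi_pass`, and `dot_single_pass` are hypothetical, not taken from the paper or any library.

```python
def residual_binarize(row, num_stages):
    """Approximate `row` as sum_k alpha_k * b_k with b_k in {-1, +1}.

    Assumption: a simple scaled-sign quantizer per stage (illustrative only).
    """
    residual = list(row)
    stages = []
    for _ in range(num_stages):
        # Per-stage scale: mean absolute value of the current residual.
        alpha = sum(abs(r) for r in residual) / len(residual)
        signs = [1 if r >= 0 else -1 for r in residual]
        stages.append((alpha, signs))
        # The next stage quantizes whatever this stage failed to capture.
        residual = [r - alpha * s for r, s in zip(residual, signs)]
    return stages


def dot_multi_pass(stages, x):
    """K passes over x: one dot product per residual stage."""
    total = 0.0
    for alpha, signs in stages:
        total += alpha * sum(s * xi for s, xi in zip(signs, x))
    return total


def dot_single_pass(stages, x):
    """One pass over x: rebuild each weight from its K codes, multiply once."""
    total = 0.0
    for j, xj in enumerate(x):
        w_j = sum(alpha * signs[j] for alpha, signs in stages)
        total += w_j * xj
    return total


row = [0.9, -0.4, 0.1, -1.2]
x = [1.0, 2.0, -3.0, 0.5]
stages = residual_binarize(row, num_stages=2)  # K = 2
multi = dot_multi_pass(stages, x)
single = dot_single_pass(stages, x)
```

Both functions return the same value up to floating-point rounding, because the stages simply sum. The single-pass form reads each input element once and combines the $K$ codes before multiplying, which is roughly what a fused kernel would do. Whether that actually lowers latency depends on the hardware and the storage format, so this is only a sketch of the idea.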