This PR reduces the cost of encoding and decoding RawKV and improves throughput.

| Row size | Master | This PR |
| --- | --- | --- |
| 4KB | 40MB/s | 110+ MB/s, lag 2-5s, CPU 400% |
| 100KB | 40MB/s | 266 MB/s, lag 2-5s, CPU 550% |

Note that the results in the rightmost column are not the program's performance limit; they are bounded by the amount of traffic we could write upstream.