We should provide a testing script that future users can run to compare performance against naive vLLM. To that end, we need to write two scripts, test_vllm_with_trace.py and test_lmcache_with_trace.py. test_lmcache_with_trace.py should also contain the setup for the LMCache configuration (storage device, cache size, model, etc.).

Each script takes a trace in the JSON format below. (The output length is the target output length: if the model emits EOS before reaching that length, ignore the EOS and keep generating until the length is met.)

and outputs results in the following format:

The idea is that, given a request trace (e.g., the request trace received by a real service), we can compare the performance of different versions of vLLM and LMCache.
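Since neither script exists yet, here is a minimal sketch of the trace-replay logic the two scripts could share. The trace schema used here is hypothetical (a JSON list of {"timestamp", "prompt", "output_length"} entries), since the exact format is defined above; the `generate` callable stands in for whichever vLLM or LMCache call the real script makes.

```python
import json
import time
from typing import Callable

def replay_trace(trace_path: str, generate: Callable[[str, int], str]) -> list[dict]:
    """Replay a request trace against a generate(prompt, output_length) callable,
    recording per-request latency.

    Hypothetical trace schema: a JSON list of
    {"timestamp": float, "prompt": str, "output_length": int} entries,
    where "timestamp" is the request's arrival time in seconds relative
    to the start of the trace.
    """
    with open(trace_path) as f:
        trace = json.load(f)

    start = time.monotonic()
    results = []
    for req in trace:
        # Sleep until this request's arrival time so the replay
        # reproduces the original arrival pattern.
        delay = req["timestamp"] - (time.monotonic() - start)
        if delay > 0:
            time.sleep(delay)

        t0 = time.monotonic()
        generate(req["prompt"], req["output_length"])
        results.append({
            "prompt": req["prompt"],
            "latency_s": time.monotonic() - t0,
        })
    return results
```

In the vLLM-backed script itself, the "keep generating until the target length is met" behavior can likely be expressed with `SamplingParams(max_tokens=output_length, ignore_eos=True)`, which makes vLLM continue past an EOS token up to the token limit.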