LMCache / lmcache-tests


(Testing) End-to-End Testing Script #9

Open Hanchenli opened 3 weeks ago

Hanchenli commented 3 weeks ago

We should provide a testing script that future users can run to compare performance against vanilla vLLM. To this end, we need two scripts: test_vllm_with_trace.py and test_lmcache_with_trace.py. test_lmcache_with_trace.py should also contain the LMCache setup (storage device, cache size, model, ...).
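As a rough starting point, the setup section of test_lmcache_with_trace.py could look like the sketch below. The option names, defaults, and the way they would eventually be passed to LMCache/vLLM are illustrative placeholders, not the actual LMCache API.

```python
# Sketch of the setup section for test_lmcache_with_trace.py.
# All option names and defaults are illustrative placeholders; wire them
# into the actual LMCache/vLLM initialization when implementing the script.
import argparse

def parse_args():
    parser = argparse.ArgumentParser(
        description="Replay a request trace against vLLM + LMCache")
    parser.add_argument("--trace", required=True,
                        help="Path to the input trace JSON")
    parser.add_argument("--output", required=True,
                        help="Path to write the result JSON")
    parser.add_argument("--model", default="mistralai/Mistral-7B-Instruct-v0.2",
                        help="Model to serve (placeholder default)")
    parser.add_argument("--cache-storage", default="cpu",
                        choices=["cpu", "disk", "remote"],
                        help="Where LMCache stores KV caches (illustrative)")
    parser.add_argument("--cache-size-gb", type=float, default=10.0,
                        help="KV cache budget in GB (illustrative)")
    return parser.parse_args()
```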

Each file takes in a trace in the JSON format below. (expected_output_length is the maximum output length; if the model emits EOS before reaching that length, keep inferencing until the length is met.)

[
  {
    "request_id": "1",
    "scheduled_request_time (ms)": "1000",
    "input_tokens": ["1323", "31232", "42124", "3123", ...],
    "expected_output_length": 200
  },
  {
    "request_id": "2",
    "scheduled_request_time (ms)": "1100",
    "input_tokens": ["123", "3232", "4214", "323", ...],
    "expected_output_length": 150
  }
]

and outputs the following format:

[
  {
    "request_id": "1",
    "scheduled_request_time (ms)": "1000",
    "actual_start_time": "1000",
    "TTFT": "100",
    "TBT": ["1", "3", "3", ...],
    "finish_timestamp": "1120"
  },
  {
    "request_id": "2",
    "scheduled_request_time (ms)": "1100",
    "actual_start_time": "1120",
    "TTFT": "150",
    "TBT": ["1", "3", "3", ...],
    "finish_timestamp": "1240"
  }
]
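A minimal sketch of the replay and measurement loop shared by both scripts is below. It assumes a placeholder async generator stream_tokens() that yields output tokens one at a time; the real implementation would call vLLM (or vLLM + LMCache) under the hood. Timestamps are measured in milliseconds relative to the start of the run, matching the trace format above.

```python
# Sketch of the trace replay / measurement loop shared by both scripts.
# stream_tokens() is a placeholder: implement it against vLLM or
# vLLM + LMCache, yielding one output token at a time.
import asyncio
import json
import time

async def stream_tokens(input_tokens, expected_output_length):
    # Placeholder generator: replace with a real call into the engine,
    # ignoring EOS until expected_output_length tokens have been produced.
    for i in range(expected_output_length):
        await asyncio.sleep(0)  # stand-in for actual decoding work
        yield i

async def replay_request(req, run_start_ms):
    # Wait until the request's scheduled time, then stream and time the output.
    scheduled_ms = int(req["scheduled_request_time (ms)"])
    now_ms = time.monotonic() * 1000 - run_start_ms
    if scheduled_ms > now_ms:
        await asyncio.sleep((scheduled_ms - now_ms) / 1000)

    start_ms = time.monotonic() * 1000 - run_start_ms
    token_times = []
    async for _ in stream_tokens(req["input_tokens"], req["expected_output_length"]):
        token_times.append(time.monotonic() * 1000 - run_start_ms)

    return {
        "request_id": req["request_id"],
        "scheduled_request_time (ms)": str(scheduled_ms),
        "actual_start_time": str(round(start_ms)),
        "TTFT": str(round(token_times[0] - start_ms)),
        "TBT": [str(round(b - a)) for a, b in zip(token_times, token_times[1:])],
        "finish_timestamp": str(round(token_times[-1])),
    }

async def replay_trace(trace_path, output_path):
    with open(trace_path) as f:
        trace = json.load(f)
    run_start_ms = time.monotonic() * 1000
    results = await asyncio.gather(*(replay_request(r, run_start_ms) for r in trace))
    with open(output_path, "w") as f:
        json.dump(results, f, indent=2)

if __name__ == "__main__":
    asyncio.run(replay_trace("trace.json", "results.json"))
```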

The idea is that, given a request trace (e.g. a trace recorded from a real service), we can compare the performance of different versions of vLLM and LMCache on the same workload.
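For the comparison itself, a small helper along these lines could summarize two result files side by side (file names and the metrics chosen here are just an example):

```python
# Sketch: compare two result files produced by the replay scripts.
import json
from statistics import mean

def summarize(path):
    with open(path) as f:
        results = json.load(f)
    ttfts = [float(r["TTFT"]) for r in results]
    tbts = [float(t) for r in results for t in r["TBT"]]
    return {"mean_TTFT_ms": mean(ttfts), "mean_TBT_ms": mean(tbts)}

if __name__ == "__main__":
    baseline = summarize("results_vllm.json")     # from test_vllm_with_trace.py
    lmcache = summarize("results_lmcache.json")   # from test_lmcache_with_trace.py
    print("vanilla vLLM :", baseline)
    print("vLLM+LMCache :", lmcache)
    print("TTFT speedup : %.2fx" % (baseline["mean_TTFT_ms"] / lmcache["mean_TTFT_ms"]))
```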