LMCache / lmcache-tests

End-to-end test for LMCache

Note: this document is currently aimed at onboarding new developers. A separate README for general audiences will be added in the future.

It's recommended to create a new folder before cloning the repository. The final file structure will look as follows:

<parent-folder>/
|--- lmcache-tests/
|--- LMCache/
|--- lmcache-vllm/

1. Environment installation

# Create conda environment
conda create -n lmcache python=3.10
conda activate lmcache

# Clone github repository
git clone git@github.com:LMCache/lmcache-tests.git
cd lmcache-tests

# Run the installation script
bash prepare_environment.sh

2. Run the tests

2.1 Quickstart example

The following command runs the test test_lmcache_local_cpu defined in tests/tests.py and writes the output results to the output folder (outputs/test_lmcache_local_cpu.csv).

python3 main.py tests/tests.py -f test_lmcache_local_cpu -o outputs/

To process the results, run:

cd outputs/
python3 process_result.py

A PDF file, test_lmcache_local_cpu.pdf, will then be created.

You can also monitor the following files to check the status of the bootstrapped vLLM process.

For stderr:

tail -f /tmp/8000-stderr.log

For stdout:

tail -f /tmp/8000-stdout.log

2.2 Usage of main.py

main.py is the entrypoint to execute the test functions:

usage: main.py [-h] [-f FILTER] [-l] [-o OUTPUT_DIR] [-m MODEL] filepath

Execute all functions in a given Python file.

positional arguments:
  filepath              The Python file to execute functions from (include subfolders if any).

options:
  -h, --help            show this help message and exit
  -f FILTER, --filter FILTER
                        Pattern to filter which functions to execute.
  -l, --list            List all functions in the module without executing them.
  -o OUTPUT_DIR, --output-dir OUTPUT_DIR
                        The directory to put the output file.
  -m MODEL, --model MODEL
                        The vLLM model to use for every test function.

Here are some examples:

# Run all the test functions defined in 'tests/tests.py' and save the output to 'outputs/'
python3 main.py tests/tests.py -o outputs/

# List the tests in 'tests/tests.py'
python3 main.py tests/tests.py -l

# Run some specific tests that match the given pattern (e.g., containing 'cachegen')
python3 main.py tests/tests.py -f cachegen

# Run all the test functions defined in 'tests/tests.py' with a Llama model
python3 main.py tests/tests.py -m "meta-llama/Llama-3.1-8B-Instruct"

2.3 Output parsing

In general, each test function outputs its results as a CSV file whose name is the same as the function name but with a .csv suffix. The CSV contains multiple columns of measured results.

Example code showing how to parse the output CSV can be found in outputs/process_result.py.
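
As a minimal illustration, the results can also be loaded directly with pandas. Note that the "TTFT" column in this sketch is a hypothetical placeholder, so inspect your CSV's actual header first:

# Minimal sketch: load a result CSV and summarize it with pandas.
# NOTE: "TTFT" below is a hypothetical placeholder column name --
# check the actual header produced by your test run.
import pandas as pd

df = pd.read_csv("outputs/test_lmcache_local_cpu.csv")
print(df.columns.tolist())  # see which columns the test produced

if "TTFT" in df.columns:    # example aggregation over a per-request metric
    print("Mean TTFT:", df["TTFT"].mean())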

3. Contributing guide: understanding the code

3.1 Basic terminology

3.2 Main components

Test case configuration:

Test case configuration controls the experiments to run. The configuration-related code can be found in config.py.

Currently, we support three kinds of configurations: workload configuration, vLLM configuration, and LMCache configuration.

During the experiment, the workload configuration is used to generate the workloads, and the vLLM and LMCache configurations are used to start the engine.
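
As a rough illustration of how these pieces fit together (the real classes live in config.py, and the names and fields below may differ from the actual ones), a test case configuration could look like this:

# Hypothetical sketch of the configuration pieces; the actual classes
# are defined in config.py and may differ in names and fields.
from dataclasses import dataclass

@dataclass
class WorkloadConfig:      # controls workload generation
    num_requests: int
    qps: float             # request arrival rate
    context_length: int    # tokens of shared context per request

@dataclass
class EngineConfig:        # controls how the serving engine is started
    model: str
    port: int
    use_lmcache: bool      # start vLLM with or without LMCache

workload = WorkloadConfig(num_requests=10, qps=1.0, context_length=8192)
engine = EngineConfig(model="meta-llama/Llama-3.1-8B-Instruct",
                      port=8000, use_lmcache=True)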

Workload generator:

The workload generator takes in a workload configuration and generates the workload (i.e., a list of requests at different timestamps) as the output. The code for the workload generator can be found in workload.py.

By design, there can be multiple kinds of workload generators for different use cases, such as chatbot, QA, or RAG. The class Usecase specifies which workload generator to create at runtime. Currently, we only support a DUMMY use case, where the requests in the generated workload contain only dummy text and questions.

The workload generator, once initialized with a configuration, only provides a single method: generate(self) -> Workload.
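
A hypothetical generator conforming to that interface might look like the following sketch (the class and field names here are illustrative, not the actual workload.py API):

# Illustrative sketch of the generator interface described above.
# The real Workload and generator classes are defined in workload.py.
from dataclasses import dataclass, field

@dataclass
class Request:
    timestamp: float   # seconds since the start of the workload
    prompt: str

@dataclass
class Workload:
    requests: list = field(default_factory=list)

class DummyWorkloadGenerator:
    def __init__(self, num_requests: int, qps: float):
        self.num_requests = num_requests
        self.qps = qps

    def generate(self) -> Workload:
        # Emit dummy prompts at a fixed arrival rate (1/qps seconds apart).
        return Workload(requests=[
            Request(timestamp=i / self.qps, prompt=f"Dummy question {i}")
            for i in range(self.num_requests)
        ])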

Engine bootstrapper:

The engine bootstrapper brings up the serving engine based on the configurations (vLLM configuration + LMCache configuration). Currently, we only support starting vLLM (with or without LMCache) from the terminal; docker-based engines will be supported in the future. The code can be found in bootstrapper.py.

The engine bootstrapper supports methods to start the engine, check whether it is ready, and shut it down; see bootstrapper.py for the exact interface.
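
As a purely hypothetical sketch of such a bootstrapper (none of these names are the actual bootstrapper.py API; the log paths mirror the /tmp/<port>-*.log files mentioned above):

# Purely hypothetical usage sketch -- see bootstrapper.py for the real API.
import subprocess
import time
import urllib.request

class TerminalVLLMBootstrapper:
    def __init__(self, command, port=8000):
        self.command = command   # e.g. the vLLM serve command line
        self.port = port
        self.proc = None

    def start(self):
        # Redirect output, mirroring the /tmp/<port>-*.log files above.
        self.proc = subprocess.Popen(
            self.command,
            stdout=open(f"/tmp/{self.port}-stdout.log", "w"),
            stderr=open(f"/tmp/{self.port}-stderr.log", "w"))

    def wait_until_ready(self, timeout=300.0):
        # Poll a health endpoint (assumed: the server exposes /health).
        deadline = time.time() + timeout
        while time.time() < deadline:
            try:
                urllib.request.urlopen(f"http://localhost:{self.port}/health")
                return
            except OSError:
                time.sleep(2)
        raise TimeoutError("engine did not become ready in time")

    def shutdown(self):
        if self.proc is not None:
            self.proc.terminate()
            self.proc.wait()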

Experiment runner:

The experiment runner takes in one workload config and N engine configs as input and drives the experiment end to end; a hedged sketch of its flow is shown below.

The code can be found in driver.py.
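
The following sketches the runner's likely flow; all helper functions here are stand-ins for illustration, not the actual lmcache-tests API:

# Hedged sketch of the experiment runner's flow; the real code is in
# driver.py. Every helper below is a stand-in, not the actual API.

def generate_workload(workload_config):
    # One workload is shared by every engine under test.
    return ["dummy request"] * workload_config["num_requests"]

def bootstrap_engine(engine_config):
    print(f"starting engine on port {engine_config['port']}")

def run_and_measure(workload, engine_config):
    # Replay the requests and collect per-request measurements.
    return [{"request": r, "latency_s": 0.0} for r in workload]

def run_experiment(workload_config, engine_configs):
    workload = generate_workload(workload_config)
    for cfg in engine_configs:       # N engines, tried one at a time
        bootstrap_engine(cfg)        # vLLM with or without LMCache
        results = run_and_measure(workload, cfg)
        print(f"port {cfg['port']}: {len(results)} requests measured")
        # The real runner writes results to a CSV named after the test
        # function, as described in Section 2.3.

run_experiment({"num_requests": 3}, [{"port": 8000}, {"port": 8001}])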

4. Contributing guide: adding new tests

(WIP)