cipher982 / llm-benchmarks

Benchmarking LLM Inference Speeds
MIT License

Run benchmarks locally #7

Open daniil-lyakhov opened 2 months ago

daniil-lyakhov commented 2 months ago

Greetings, @cipher982!

I've seen the benchmark application https://www.llm-benchmarks.com/local and it looks great! I'm currently working on a competitive analysis of these 4 backends: Transformers, TGI, vLLM, and llama.cpp, and this data is exactly what I need! I'm curious: how do I run these benchmarks locally on my hardware? A step-by-step manual or tutorial would help a lot!

Thanks

cipher982 commented 2 months ago

Ya, it's a bit of a mess, as it has grown over time from just local benchmarks to using cloud providers, then adding in a frontend. But it should be as simple as starting up the relevant Docker containers.

docker compose -f docker-compose.local.yml up --build

And then you will see the various containers booting up their Flask interfaces.

llm-benchmarks-bench_transformers-1  |  * Serving Flask app 'server'
llm-benchmarks-bench_transformers-1  |  * Debug mode: off
llm-benchmarks-bench_gguf-1          |  * Serving Flask app 'server'
llm-benchmarks-bench_gguf-1          |  * Debug mode: off
llm-benchmarks-bench_vllm-1          |  * Serving Flask app 'llm_bench_vllm.server'
llm-benchmarks-bench_vllm-1          |  * Debug mode: off

For passing in the requests you use llm_bench_api, which gets installed through the relevant pyproject.toml files via the line llm-bench-api = { path = "../api" }; this installs the modules from api/llm_bench_api/. The file api.py serves as the primary entry point between the Flask servers and the scripts that are used to initiate a benchmark run.
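If you want to poke at one of the Flask servers directly rather than going through the scripts, something like the sketch below should work. The port and the /benchmark route are placeholders, not confirmed from the repo, so check api.py for the actual endpoint and payload fields first.

import requests

# Port, route, and payload fields here are illustrative assumptions;
# see api.py for the real entry point.
resp = requests.post(
    "http://localhost:5000/benchmark",
    json={"framework": "transformers", "model": "facebook/opt-125m"},
    timeout=600,
)
print(resp.status_code, resp.json())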

So once all the Docker containers are running, you can call one of the relevant scripts in scripts/, which use CLI args to control what is run.

python scripts/run_hf.py --framework transformers --limit 10 --max-size-billion 20 --fetch-new-models

I am primarily logging everything to my MongoDB collections, but you can likely extract some metrics from the log file as well. I can try to start fresh and follow my own steps to see if any issues come up, but feel free to just try these right away and see how far you get.
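If you do point the benchmarks at your own MongoDB, pulling results back out is straightforward with pymongo. A minimal sketch, assuming the connection env vars listed later in this thread; the "framework" field in the query is illustrative, not a confirmed schema:

import os
from pymongo import MongoClient

# Reuses the same env vars the containers read; adjust the filter to
# whatever fields the runs actually store.
client = MongoClient(os.environ["MONGODB_URI"])
coll = client[os.environ["MONGODB_DB"]][os.environ["MONGODB_COLLECTION_LOCAL"]]
for doc in coll.find({"framework": "transformers"}).limit(5):
    print(doc)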

cipher982 commented 2 months ago

Anything based on HuggingFace models can be run pretty easily with auto-downloading from the Hub. When this was put together, GGUF was in its early stages and I had to manually download and convert models with gguf/create_models.sh, but there should be more straightforward methods now, such as downloading directly from the HuggingFace Hub (I noticed they host GGUF files now).
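For example, pulling a prebuilt GGUF file from the Hub with huggingface_hub; the repo_id and filename below are just examples, so substitute whichever model and quantization you want:

from huggingface_hub import hf_hub_download

# Downloads into the HF cache (HUGGINGFACE_HUB_CACHE if set) and
# returns the local file path.
path = hf_hub_download(
    repo_id="TheBloke/Llama-2-7B-GGUF",
    filename="llama-2-7b.Q4_K_M.gguf",
)
print(path)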

cipher982 commented 2 months ago

One thing that will need to be modified is the compose configs, such as docker-compose.local.yml, as I have some hardcoded paths specific to my system for storing files. Depending on whether you want to use more than one GPU, you can also configure different containers to use different GPU IDs. I think it's currently set to CUDA_VISIBLE_DEVICES=1 to use my second GPU, but you could change it to CUDA_VISIBLE_DEVICES=0 (for a single-GPU system) or CUDA_VISIBLE_DEVICES=0,1 if you want to spread models out across a couple of them.
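For reference, the relevant pieces of docker-compose.local.yml would look roughly like this; the service name and host path are illustrative, not the repo's exact config:

services:
  bench_transformers:
    environment:
      - CUDA_VISIBLE_DEVICES=0  # or 0,1 to spread across two GPUs
    volumes:
      - /your/model/cache:/models  # replace my hardcoded host path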

cipher982 commented 2 months ago

A challenge at first was handling different dependency requirements, as some of these libraries are moving fast at the moment, so I created separate toml and Docker files for each to keep their code siloed; they just interact using Flask requests. I am looking at restructuring the repo to make it more obvious from the start where everything lives.
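Concretely, each backend directory carries its own pyproject.toml pinning that framework's dependencies alongside the shared API package. A hypothetical excerpt (assuming Poetry-style dependencies, which the llm-bench-api line quoted earlier suggests; the exact entries and versions differ per backend):

[tool.poetry.dependencies]
python = "^3.10"
llm-bench-api = { path = "../api" }
vllm = "*"  # swapped for transformers, TGI, or llama.cpp deps per directory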

cipher982 commented 2 months ago

Also, just remembered that TGI is closely related to using the standard HuggingFace modules in Python, so it is nested within the huggingface directory and Docker container.

cipher982 commented 2 months ago

Will be focusing on restructuring, cleaning up, and generally improving the code in this PR: https://github.com/cipher982/llm-benchmarks/pull/9

daniil-lyakhov commented 2 months ago

@cipher982, I appreciate your detailed answer; I'll try to run the benchmarks locally.

UPD:

docker compose -f docker-compose.local.yml up --build

WARN[0000] /home/dlyakhov/Projects/llm-benchmarks/docker-compose.local.yml: `version` is obsolete 
env file /home/dlyakhov/Projects/llm-benchmarks/.env not found: stat /home/dlyakhov/Projects/llm-benchmarks/.env: no such file or directory

Looks like I need to create an environment file for the Docker setup.

cipher982 commented 2 months ago

Yes, I am going through each variable now, figuring out which ones are needed for local runs (I originally combined the cloud and local environments). This is what I have so far:

HF_TOKEN=""
HUGGINGFACE_HUB_CACHE=""
MONGODB_URI=""
MONGODB_DB=""
MONGODB_COLLECTION_LOCAL=""
GPU_DEVICE="0"
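Drop this in at the repo root as .env (that's the path the compose error above was checking). Filled in, it would look something like the following; the values are placeholders, obviously, not real credentials:

HF_TOKEN="hf_xxxxxxxxxxxxxxxx"
HUGGINGFACE_HUB_CACHE="/home/you/.cache/huggingface/hub"
MONGODB_URI="mongodb://localhost:27017"
MONGODB_DB="llm_bench"
MONGODB_COLLECTION_LOCAL="local_runs"
GPU_DEVICE="0"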

daniil-lyakhov commented 2 months ago

Ref: .env files for docker: https://docs.docker.com/compose/environment-variables/env-file/