HabanaAI / vllm-fork

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Bug]: 10% perf drop on mixtral8x22b due to commit b62fba85ac03326e9f466d8d37e91ae1b14a6511 #305

Open hlin99 opened 1 week ago

hlin99 commented 1 week ago

Your current environment

The output of `python collect_env.py`

🐛 Describe the bug

```python
seq_group_metadata_list.extend(
    self.create_dummy_seq_group_metadata(0, 0, is_prompt)
    for _ in range(batch_size_padding))
```

This piece of code introduces metadata creation in a loop, and we observe a 10% perf drop. Is this code change intentional?
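For reference, one way to avoid per-iteration creation would be to build the dummy metadata once per `(seq_len, block_size, is_prompt)` key and reuse it for every padding slot. This is only a hypothetical sketch: `DummySeqGroupMetadata`, `cached_dummy`, and `pad_batch` are illustrative stand-ins, not the actual vllm-fork types or methods.

```python
from functools import lru_cache


class DummySeqGroupMetadata:
    """Stand-in for the real dummy seq-group metadata (illustrative only)."""

    def __init__(self, seq_len: int, block_size: int, is_prompt: bool):
        self.seq_len = seq_len
        self.block_size = block_size
        self.is_prompt = is_prompt


@lru_cache(maxsize=None)
def cached_dummy(seq_len: int, block_size: int, is_prompt: bool):
    # Dummy padding entries are interchangeable, so one shared
    # instance per key is enough; construction happens only once.
    return DummySeqGroupMetadata(seq_len, block_size, is_prompt)


def pad_batch(seq_group_metadata_list, batch_size_padding, is_prompt):
    # Reuse one cached dummy instead of constructing a new object
    # on every loop iteration.
    dummy = cached_dummy(0, 0, is_prompt)
    seq_group_metadata_list.extend(dummy for _ in range(batch_size_padding))
```

Whether sharing one instance is safe depends on whether downstream code mutates the dummy entries, which this sketch does not address.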

iboiko-habana commented 1 week ago

Please re-check perf with https://github.com/HabanaAI/vllm-fork/pull/301

hlin99 commented 1 week ago

Unfortunately, performance has not improved, and the data looks identical before and after applying the patch. It seems that dummy creation and list extension are not the root cause of the performance drop. Instead, the issue appears to stem from changes to the dummy metadata, which affect the subsequent call path.
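One way to sanity-check that object creation itself is not the bottleneck is a standalone micro-benchmark of the two patterns. The `Meta` class below is a stand-in for the real metadata object, so the absolute timings are only indicative:

```python
import timeit


class Meta:
    """Cheap stand-in for seq-group metadata (illustrative only)."""

    def __init__(self):
        self.data = {"seq": 0, "block": 0, "is_prompt": False}


def create_per_iter(n):
    # Pattern under suspicion: construct a fresh object per padding slot.
    out = []
    out.extend(Meta() for _ in range(n))
    return out


def reuse_cached(n, cached=Meta()):
    # Alternative: extend with one shared, pre-built instance.
    out = []
    out.extend(cached for _ in range(n))
    return out


t_create = timeit.timeit(lambda: create_per_iter(256), number=1000)
t_reuse = timeit.timeit(lambda: reuse_cached(256), number=1000)
print(f"per-iter: {t_create:.4f}s  reuse: {t_reuse:.4f}s")
```

If the two timings are close relative to the end-to-end regression, that supports the conclusion that the slowdown comes from the changed call path rather than from allocation in the loop.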

iboiko-habana commented 1 week ago

Please share traces or steps for reproduction.

hlin99 commented 1 week ago
  1. Below is my docker configuration with the vLLM environment setup.
  2. Then, in the docker environment, go to vllm/benchmark.
  3. Run the benchmark command:
     `python benchmark_throughput.py --backend vllm --dataset ./ShareGPT_V3_unfiltered_cleaned_split.json --tensor-parallel-size 8 --model mistralai/Mixtral-8x22B-Instruct-v0.1 --device hpu --dtype bfloat16 --gpu-memory-utilization 0.7 --max-num-batched-tokens 262144`

Before the change, the output throughput is about 2500 tokens/s; after the change it drops to about 2200 tokens/s.
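For reference, those two figures correspond to roughly a 12% drop, slightly larger than the 10% in the issue title:

```python
# Throughput figures taken from the benchmark runs above.
before, after = 2500, 2200  # tokens/s
drop = (before - after) / before
print(f"{drop:.1%}")  # 12.0%
```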


```bash
#!/bin/bash

export DOCKER_IMAGE=${DOCKER_IMAGES:-vault.habana.ai/gaudi-docker/1.17.0/ubuntu22.04/habanalabs/pytorch-installer-2.3.1:latest}
export CONTAINER_NAME=${CONTAINER_NAME:-vllm-server-mixtral-8x22b}
export DATA_DIR=${DATA_DIR:-/data0}
export SSH_PORT=${SSH_PORT:-3022}
export HABANA_VISIBLE_DEVICES=${HABANA_VISIBLE_DEVICES:-all}
export HF_TOKEN=${HF_TOKEN}

print_help() {
    echo "Usage: $0 [options]"
    echo "This script creates and sets up the docker container for $CONTAINER_NAME."
    echo "Enter the container bash shell if no option is specified."
    echo
    echo "Options:"
    echo "  -h, --help  Show this help message and exit."
    echo "  1           Create and set up the base container and exit."
    echo "  2           Set up the container based on setup.sh and exit."
    echo "  0           Stop the container."
    echo "  -1          Stop and remove the container."
}

if [[ "$1" == "-h" || "$1" == "--help" ]]; then
    print_help
    exit 0
fi

# Map the requested device indices to module IDs unless all devices are used.
if [ ! "${HABANA_VISIBLE_DEVICES}" == "all" ]; then
    index_module_data=$(hl-smi --query-aip=index,module_id --format=csv)
    echo "$index_module_data"
    declare -A index_module_map
    while IFS=", " read -r index module_id; do
        index_module_map[$index]=$module_id
    done <<< "$(echo "$index_module_data" | tail -n +2)"
    indices=(${HABANA_VISIBLE_DEVICES//,/ })
    module_ids=()
    for index in "${indices[@]}"; do
        module_ids+=(${index_module_map[$index]})
    done
    visible_modules=$(IFS=,; echo "${module_ids[*]}")
    echo HABANA_VISIBLE_DEVICES=${HABANA_VISIBLE_DEVICES}
    echo HABANA_VISIBLE_MODULES=${visible_modules}
else
    visible_modules="0,1,2,3,4,5,6,7"
fi

container_existing=$(docker ps -a --filter "name=^/${CONTAINER_NAME}$" --format '{{.Names}}')
container_running=$(docker ps --filter "name=^/${CONTAINER_NAME}$" --format '{{.Names}}')

if [[ "$1" == "1" ]] || [[ -z "$container_existing" ]]; then
    if [ ! -z "$container_existing" ]; then
        echo "Error: Container ${CONTAINER_NAME} exists. Remove the existing container first."
        exit 1
    fi
    docker run --runtime=habana --name ${CONTAINER_NAME} -td \
        -e HABANA_VISIBLE_DEVICES=${HABANA_VISIBLE_DEVICES} \
        -e OMPI_MCA_btl_vader_single_copy_mechanism=none \
        --cap-add=sys_nice --net=host --ipc=host \
        --env http_proxy=${http_proxy} \
        --env https_proxy=${https_proxy} \
        --env no_proxy=${no_proxy} \
        --env HF_HOME=${DATA_DIR}/huggingface \
        --env DATA_DIR=${DATA_DIR} \
        --env WORKSPACE_ROOT=/workspace \
        --env HABANA_VISIBLE_MODULES=${visible_modules} \
        --env "HUGGING_FACE_HUB_TOKEN=${HF_TOKEN}" \
        --env PT_HPU_ENABLE_LAZY_COLLECTIVES=true \
        --env PT_HPUGRAPH_DISABLE_TENSOR_CACHE=1 \
        --env VLLM_GRAPH_RESERVED_MEM=0.6 \
        --env VLLM_GRAPH_PROMPT_RATIO=0 \
        --env VLLM_DECODE_BLOCK_BUCKET_MAX=2048 \
        --env VLLM_PROMPT_BS_BUCKET_STEP=128 \
        --env VLLM_PROMPT_BS_BUCKET_MAX=256 \
        --volume "$(pwd)":/workspace \
        --volume ${DATA_DIR}:${DATA_DIR} \
        ${DOCKER_IMAGE} bash
```