Open hlin99 opened 1 week ago
Please re-check perf with https://github.com/HabanaAI/vllm-fork/pull/301
Unfortunately, performance has not improved; the data looks identical before and after applying the patch. It seems that dummy creation and list extension are not the root cause of the performance drop. Instead, the issue appears to stem from the changes to the dummy metadata, which alter the subsequent call path.
Please share traces or steps for reproduction
Before the change, the output throughput is about 2500 tokens/s; after the change it drops to about 2200 tokens/s (a ~12% regression).
export DOCKER_IMAGE=${DOCKER_IMAGE:-vault.habana.ai/gaudi-docker/1.17.0/ubuntu22.04/habanalabs/pytorch-installer-2.3.1:latest}
export CONTAINER_NAME=${CONTAINER_NAME:-vllm-server-mixtral-8x22b}
export DATA_DIR=${DATA_DIR:-/data0}
export SSH_PORT=${SSH_PORT:-3022}
export HABANA_VISIBLE_DEVICES=${HABANA_VISIBLE_DEVICES:-all}
export HF_TOKEN=${HF_TOKEN}
print_help(){
    echo "Usage: $0 [options]"
    echo "This script creates and sets up the docker container for $CONTAINER_NAME."
    echo "Enter the container bash shell if no option is specified."
    echo
    echo "Options:"
    echo "  -h, --help  Show this help message and exit."
    echo "  1           Create and set up the base container and exit."
    echo "  2           Set up the container based on setup.sh and exit."
    echo "  0           Stop the container."
    echo "  -1          Stop and remove the container."
}
if [[ "$1" == "-h" || "$1" == "--help" ]]; then
    print_help
    exit 0
fi
if [ ! "${HABANA_VISIBLE_DEVICES}" == "all" ]; then
index_module_data=$(hl-smi --query-aip=index,module_id --format=csv)
echo "$index_module_data"
declare -A index_module_map
while IFS=", " read -r index module_id; do
index_module_map[$index]=$module_id
done <<< "$(echo "$index_module_data" | tail -n +2)"
indices=(${HABANA_VISIBLE_DEVICES//,/ })
module_ids=()
for index in "${indices[@]}"; do
module_ids+=(${index_module_map[$index]})
done
visible_modules=$( IFS=,; echo "${module_ids[*]}")
echo HABANA_VISIBLE_DEVICES=${HABANA_VISIBLE_DEVICES}
echo HABANA_VISIBLE_MODULES=${visible_modules}
else
visible_modules="0,1,2,3,4,5,6,7"
fi
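For readers following the script above, the index-to-module mapping can be sketched in Python. This is a hypothetical illustration of the same logic the shell loop performs on `hl-smi --query-aip=index,module_id --format=csv` output; the sample CSV text and the `select_modules` helper name are mine, not real `hl-smi` output or vLLM code.

```python
def select_modules(csv_text: str, visible_devices: str) -> str:
    """Map the requested device indices to their module IDs,
    mirroring the shell loop that builds HABANA_VISIBLE_MODULES."""
    index_module_map = {}
    # Skip the CSV header line, then record index -> module_id.
    for line in csv_text.strip().splitlines()[1:]:
        index, module_id = (field.strip() for field in line.split(","))
        index_module_map[index] = module_id
    # Join the module IDs for the requested indices with commas.
    return ",".join(index_module_map[i] for i in visible_devices.split(","))

# Illustrative sample only -- module IDs need not match indices on real hosts.
sample = "index, module_id\n0, 3\n1, 1\n2, 0\n3, 2\n"
print(select_modules(sample, "0,2"))  # -> "3,0"
```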
container_existing=$(docker ps -a --filter "name=^/${CONTAINER_NAME}$" --format '{{.Names}}')
container_running=$(docker ps --filter "name=^/${CONTAINER_NAME}$" --format '{{.Names}}')
if [[ "$1" == "1" ]] || [[ -z "$container_existing" ]]; then
if [ ! -z "$container_existing" ]; then
echo "Error: Container ${CONTAINER_NAME} exists. Remove the existing container first."
exit -1
fi
docker run --runtime=habana --name ${CONTAINER_NAME} -td \
-e HABANA_VISIBLE_DEVICES=${HABANA_VISIBLE_DEVICES} \
-e OMPI_MCA_btl_vader_single_copy_mechanism=none \
--cap-add=sys_nice --net=host --ipc=host \
--env http_proxy=${http_proxy} \
--env https_proxy=${https_proxy} \
--env no_proxy=${no_proxy} \
--env HF_HOME=${DATA_DIR}/huggingface \
--env DATA_DIR=${DATA_DIR} \
--env WORKSPACE_ROOT=/workspace \
--env HABANA_VISIBLE_MODULES=${visible_modules} \
--env "HUGGING_FACE_HUB_TOKEN=${HF_TOKEN}" \
--env PT_HPU_ENABLE_LAZY_COLLECTIVES=true \
--env PT_HPUGRAPH_DISABLE_TENSOR_CACHE=1 \
--env VLLM_GRAPH_RESERVED_MEM=0.6 \
--env VLLM_GRAPH_PROMPT_RATIO=0 \
--env VLLM_DECODE_BLOCK_BUCKET_MAX=2048 \
--env VLLM_PROMPT_BS_BUCKET_STEP=128 \
--env VLLM_PROMPT_BS_BUCKET_MAX=256 \
--volume "$(pwd)":/workspace \
--volume ${DATA_DIR}:${DATA_DIR} \
${DOCKER_IMAGE} bash
Your current environment
🐛 Describe the bug
seq_group_metadata_list.extend(
    self.create_dummy_seq_group_metadata(0, 0, is_prompt)
    for _ in range(batch_size_padding))
This piece of code introduces metadata creation inside a loop, and we observe a ~10% perf drop. Is this code change intentional?
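To make the concern concrete, here is a hypothetical microbenchmark comparing the pattern in the diff (one metadata object constructed per padded slot) against reusing a single dummy object. `DummySeqGroupMetadata` is a stand-in class, not the real vLLM type; whether sharing one dummy is semantically safe depends on whether downstream code mutates the metadata.

```python
import timeit

class DummySeqGroupMetadata:
    """Stand-in for the real dummy sequence-group metadata object."""
    def __init__(self, request_id, seq_len, is_prompt):
        self.request_id = request_id
        self.seq_len = seq_len
        self.is_prompt = is_prompt

def extend_with_fresh_dummies(lst, batch_size_padding, is_prompt):
    # Mirrors the diff: a fresh object is constructed per padded slot.
    lst.extend(DummySeqGroupMetadata(0, 0, is_prompt)
               for _ in range(batch_size_padding))

def extend_with_shared_dummy(lst, batch_size_padding, is_prompt):
    # Alternative: construct one dummy and repeat the reference.
    dummy = DummySeqGroupMetadata(0, 0, is_prompt)
    lst.extend([dummy] * batch_size_padding)

fresh = timeit.timeit(lambda: extend_with_fresh_dummies([], 256, False),
                      number=1000)
shared = timeit.timeit(lambda: extend_with_shared_dummy([], 256, False),
                       number=1000)
print(f"fresh: {fresh:.4f}s  shared: {shared:.4f}s")
```

Note that per the earlier comment in this thread, the object creation itself was ruled out as the root cause, so this only quantifies the creation overhead, not the reported 10% end-to-end drop.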