Open AmberLJC opened 9 months ago
Please share how to reproduce the issue and what you expect to observe.
Yes. I tried to serve Llama-2 13B on an A40 with 46GB. After starting the Triton server, I start multiple clients, each with one prompt. There is a 0.5s delay between two client processes, as in this script (I omit the tokenizer_dir and other model parameters):
iterations=100
# Loop for the specified number of iterations
for ((i = 1; i <= iterations; i++)); do
    # Run the Python client with the specified arguments
    python3 inflight_batcher_llm/client/inflight_batcher_llm_client.py \
        --request-output-len 1024 \
        --text "What is the average lifespan of a Llama?" \
        --request_id "$i"
    sleep 0.5
done
The responses are good initially, but they become shorter as more requests come in. Here is the log:
{
"timstamp": "2023-12-13 23:09:45.649551",
"request_id": "7",
"input": "What is the average lifespan of a Llama?",
"response": "\n\nThe average lifespan of a llama is 15 to 25 years, with some individuals living up to 30 years or more. Factors such as breed, living conditions, and health can affect a llama's lifespan.",
"output_len": 58
}
{
"timstamp": "2023-12-13 23:09:47.531393",
"request_id": "8",
"input": "What is the average lifespan of a Llama?",
"response": "\n\nThe average lifespan of a llama is 15 to 25 years, with some individuals living up to 30 years or more. Factors such as breed, living conditions, and health can affect a llama's lifespan.",
"output_len": 58
}
{
"timstamp": "2023-12-13 23:09:48.672811",
"request_id": "9",
"input": "What is the average lifespan of a Llama?",
"response": "\n\nThe average lifespan of a llama is 15 to 25 years, with some individuals living up to 30 years or more. Factors such as breed, living conditions, and health can affect a llama's lifespan.",
"output_len": 58
}
{
"timstamp": "2023-12-13 23:09:49.771779",
"request_id": "10",
"input": "What is the average lifespan of a Llama?",
"response": "\n\nThe average lifespan of a llama is 15 to 25 years, with some individuals living up to 30 years or more. Factors such as breed, living conditions, and health can affect a llama's lifespan.",
"output_len": 58
}
{
"timstamp": "2023-12-13 23:09:50.031199",
"request_id": "11",
"input": "What is the average lifespan of a Llama?",
"response": "\n\nThe average lifespan of a llama is 15 to 25 years, with some individuals living up to 30 years or more. Factors such as breed, living conditions, and health can affect a llama's lifespan.",
"output_len": 58
}
{
"timstamp": "2023-12-13 23:09:51.193665",
"request_id": "16",
"input": "What is the average lifespan of a Llama?",
"response": "\n\nThe average lifespan of a llama",
"output_len": 11
}
{
"timstamp": "2023-12-13 23:09:51.210134",
"request_id": "15",
"input": "What is the average lifespan of a Llama?",
"response": "\n\nThe average lifespan of a llama is ",
"output_len": 13
}
{
"timstamp": "2023-12-13 23:09:51.210171",
"request_id": "13",
"input": "What is the average lifespan of a Llama?",
"response": "\n\nThe average lifespan of a llama is 15 to",
"output_len": 16
}
{
"timstamp": "2023-12-13 23:09:51.216022",
"request_id": "12",
"input": "What is the average lifespan of a Llama?",
"response": "\n\nThe average lifespan of a llama is 15 to 25 years, with some individuals living up to 30 years or more. Fact",
"output_len": 35
}
{
"timstamp": "2023-12-13 23:09:51.259440",
"request_id": "17",
"input": "What is the average lifespan of a Llama?",
"response": "\n\nThe average lifespan of",
"output_len": 8
}
{
"timstamp": "2023-12-13 23:09:51.305968",
"request_id": "14",
"input": "What is the average lifespan of a Llama?",
"response": "\n\nThe average lifespan of a llama is 1",
"output_len": 14
}
{
"timstamp": "2023-12-13 23:09:51.324125",
"request_id": "18",
"input": "What is the average lifespan of a Llama?",
"response": "\n",
"output_len": 1
}
{
"timstamp": "2023-12-13 23:09:51.840256",
"request_id": "19",
"input": "What is the average lifespan of a Llama?",
"response": "\n",
"output_len": 1
}
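One way to quantify the truncation in the log above is to compare each record's output_len with the longest output seen for the same prompt. The following is a minimal sketch, not part of the client: the helper name find_truncated is hypothetical, the records are a hand-copied subset of the log (only the fields needed), and it assumes decoding is deterministic, so identical prompts should produce identical lengths, as requests 7 through 11 suggest.

```python
from collections import defaultdict

PROMPT = "What is the average lifespan of a Llama?"

# Subset of the log records above, in log order, reduced to the relevant fields.
records = [
    {"request_id": "7", "input": PROMPT, "output_len": 58},
    {"request_id": "11", "input": PROMPT, "output_len": 58},
    {"request_id": "16", "input": PROMPT, "output_len": 11},
    {"request_id": "12", "input": PROMPT, "output_len": 35},
    {"request_id": "18", "input": PROMPT, "output_len": 1},
]


def find_truncated(records):
    """Return request_ids whose output_len is below the longest seen for the same input.

    Hypothetical helper: assumes identical prompts should yield identical lengths.
    """
    longest = defaultdict(int)
    for rec in records:
        longest[rec["input"]] = max(longest[rec["input"]], rec["output_len"])
    return [
        rec["request_id"]
        for rec in records
        if rec["output_len"] < longest[rec["input"]]
    ]


print(find_truncated(records))
```

If sampling is enabled on the server, output lengths can differ legitimately, so this comparison would only be a rough indicator in that case.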
I've been running a stress test on your serving system to evaluate its performance under high load. However, I've encountered an issue where the response length of each request becomes progressively shorter as the load increases, so the system returns incomplete responses. In my setup, one request arrives every 0.5 seconds. Is there any parameter I need to adjust to prevent this? Thanks!