Incomplete Responses During High Load Stress Test

AmberLJC commented 9 months ago

I've been running a stress test on your serving system to evaluate its performance under high load. However, I've hit an issue where the response length of each request becomes progressively shorter as the load increases, so the system returns incomplete responses. In my setup, one request arrives every 0.5 seconds. Is there a parameter I need to adjust to prevent this? Thanks!

byshiue commented 9 months ago

Please share how to reproduce the issue and what you expect to observe.

AmberLJC commented 9 months ago

Yes. I am serving Llama-2 13B on an A40 with 46 GB of memory. After starting the Triton server, I launch multiple clients, each sending one prompt, with 0.5 s between consecutive client processes, as in this script (I omit tokenizer_dir and the other model parameters):


iterations=100
# Launch one client every 0.5 s; each client sends a single prompt.
for ((i = 1; i <= iterations; i++)); do
    # Run the client in the background (&) so that requests overlap
    # instead of each one waiting for the previous client to exit.
    python3 inflight_batcher_llm/client/inflight_batcher_llm_client.py \
    --request-output-len 1024 \
    --text "What is the average lifespan of a Llama?" \
    --request_id "$i" &
    sleep 0.5
done
# Wait for all background clients to finish.
wait
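
For anyone reproducing this, the staggered launch described above can also be sketched in Python. This is only an illustration, not part of the original report: `launch_staggered` and `client_cmd` are hypothetical helper names, the client path and flags are copied from the script, and the model-specific arguments are omitted just as they are there. Each client is started without waiting for the previous one to finish, so requests overlap on the server.

```python
import subprocess
import sys
import time


def launch_staggered(build_cmd, count, interval_s):
    """Start `count` processes, one every `interval_s` seconds, without
    waiting for earlier ones to finish. Returns their exit codes in
    launch order once all of them have exited."""
    procs = []
    for i in range(1, count + 1):
        procs.append(subprocess.Popen(build_cmd(i)))
        if i < count:
            time.sleep(interval_s)
    return [p.wait() for p in procs]


def client_cmd(request_id):
    """Build the client command line for one request (hypothetical helper).

    The flags mirror the shell script; tokenizer and other model
    parameters are left out, as in the original post."""
    return [
        sys.executable,
        "inflight_batcher_llm/client/inflight_batcher_llm_client.py",
        "--request-output-len", "1024",
        "--text", "What is the average lifespan of a Llama?",
        "--request_id", str(request_id),
    ]
```

Calling `launch_staggered(client_cmd, count=100, interval_s=0.5)` from the repository root would correspond to the 100-iteration loop with a 0.5 s gap.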

The responses are complete at first but become shorter as more requests arrive. Here is the log:

{
    "timstamp": "2023-12-13 23:09:45.649551",
    "request_id": "7",
    "input": "What is the average lifespan of a Llama?",
    "response": "\n\nThe average lifespan of a llama is 15 to 25 years, with some individuals living up to 30 years or more. Factors such as breed, living conditions, and health can affect a llama's lifespan.",
    "output_len": 58
}
{
    "timstamp": "2023-12-13 23:09:47.531393",
    "request_id": "8",
    "input": "What is the average lifespan of a Llama?",
    "response": "\n\nThe average lifespan of a llama is 15 to 25 years, with some individuals living up to 30 years or more. Factors such as breed, living conditions, and health can affect a llama's lifespan.",
    "output_len": 58
}
{
    "timstamp": "2023-12-13 23:09:48.672811",
    "request_id": "9",
    "input": "What is the average lifespan of a Llama?",
    "response": "\n\nThe average lifespan of a llama is 15 to 25 years, with some individuals living up to 30 years or more. Factors such as breed, living conditions, and health can affect a llama's lifespan.",
    "output_len": 58
}
{
    "timstamp": "2023-12-13 23:09:49.771779",
    "request_id": "10",
    "input": "What is the average lifespan of a Llama?",
    "response": "\n\nThe average lifespan of a llama is 15 to 25 years, with some individuals living up to 30 years or more. Factors such as breed, living conditions, and health can affect a llama's lifespan.",
    "output_len": 58
}
{
    "timstamp": "2023-12-13 23:09:50.031199",
    "request_id": "11",
    "input": "What is the average lifespan of a Llama?",
    "response": "\n\nThe average lifespan of a llama is 15 to 25 years, with some individuals living up to 30 years or more. Factors such as breed, living conditions, and health can affect a llama's lifespan.",
    "output_len": 58
}
{
    "timstamp": "2023-12-13 23:09:51.193665",
    "request_id": "16",
    "input": "What is the average lifespan of a Llama?",
    "response": "\n\nThe average lifespan of a llama",
    "output_len": 11
}
{
    "timstamp": "2023-12-13 23:09:51.210134",
    "request_id": "15",
    "input": "What is the average lifespan of a Llama?",
    "response": "\n\nThe average lifespan of a llama is ",
    "output_len": 13
}
{
    "timstamp": "2023-12-13 23:09:51.210171",
    "request_id": "13",
    "input": "What is the average lifespan of a Llama?",
    "response": "\n\nThe average lifespan of a llama is 15 to",
    "output_len": 16
}
{
    "timstamp": "2023-12-13 23:09:51.216022",
    "request_id": "12",
    "input": "What is the average lifespan of a Llama?",
    "response": "\n\nThe average lifespan of a llama is 15 to 25 years, with some individuals living up to 30 years or more. Fact",
    "output_len": 35
}
{
    "timstamp": "2023-12-13 23:09:51.259440",
    "request_id": "17",
    "input": "What is the average lifespan of a Llama?",
    "response": "\n\nThe average lifespan of",
    "output_len": 8
}
{
    "timstamp": "2023-12-13 23:09:51.305968",
    "request_id": "14",
    "input": "What is the average lifespan of a Llama?",
    "response": "\n\nThe average lifespan of a llama is 1",
    "output_len": 14
}
{
    "timstamp": "2023-12-13 23:09:51.324125",
    "request_id": "18",
    "input": "What is the average lifespan of a Llama?",
    "response": "\n",
    "output_len": 1
}
{
    "timstamp": "2023-12-13 23:09:51.840256",
    "request_id": "19",
    "input": "What is the average lifespan of a Llama?",
    "response": "\n",
    "output_len": 1
}
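
To make the truncation easier to quantify, a log in this shape (JSON objects written back to back rather than as one array) can be summarized with a short script. This is a rough sketch under two assumptions: the field names `request_id` and `output_len` match the log above, and the longest observed response is taken to be the complete one, which is only reasonable here because every request uses the same prompt. The function names are illustrative.

```python
import json


def parse_log(text):
    """Parse back-to-back JSON objects (not a JSON array) into a list of dicts."""
    decoder = json.JSONDecoder()
    records, pos = [], 0
    while pos < len(text):
        # Skip whitespace between consecutive objects.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        # raw_decode returns the parsed object and the index where it ended.
        record, pos = decoder.raw_decode(text, pos)
        records.append(record)
    return records


def truncated_requests(records):
    """Return (request_id, output_len) for every record shorter than the
    longest one, assuming the longest response is the complete answer."""
    full_len = max(r["output_len"] for r in records)
    return [
        (r["request_id"], r["output_len"])
        for r in records
        if r["output_len"] < full_len
    ]
```

Passing the contents of the log file to `parse_log` and the result to `truncated_requests` lists the requests whose output fell short of the longest response, in log order.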

NVIDIA / TensorRT-LLM

Incomplete Responses During High Load Stress Test #655