Thanks
I've tried the "fix" mentioned in #375.
It results in a CUDA runtime error: out of memory
after handling only 6 conversations.
(2x4090, 2x24GB VRAM, 13B model in 16-bit)
So far I cannot get LMDeploy to work reliably.
That is not a fix for your case. What you need is to set renew_session
to True: https://github.com/InternLM/lmdeploy/blob/55764e0b33d8b9298f68b77484bab3832696c010/lmdeploy/serve/openai/api_server.py#L97
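For illustration only, a minimal sketch of such a request in Python — the renew_session field name comes from the linked api_server.py, but the port, model name, and exact payload shape are assumptions, not verified against this lmdeploy version:

import requests

# Hypothetical sketch: pass renew_session=True in the request body so the
# server starts a fresh session instead of reusing a stale one.
# Port 23333 and the model name are assumptions for this example.
payload = {
    "model": "llama2-13b",
    "messages": [{"role": "user", "content": "Hello"}],
    "renew_session": True,
}
response = requests.post("http://0.0.0.0:23333/v1/chat/completions", json=payload)
print(response.json())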
If you still want to use the above fix with a random instance_id, you should set instance_num to a smaller value, for example:

python -m lmdeploy.serve.openai.api_server --instance_num 1 --tp=2 --server_name=0.0.0.0 ./workspace
This issue is marked as stale because it has been marked as invalid or awaiting response for 7 days without any further response. It will be closed in 5 days if the stale label is not removed or if there is no further response.
This issue is closed because it has been stale for 5 days. Please open a new issue if you have similar issues or you have any new updates now.
Describe the bug
The OpenAI-compatible API server returns empty completions after answering correctly a few times.
In my latest test it responded correctly 15 times before it broke. After that it immediately responds with a 200 OK and an empty completion. The GPU load drops back to zero, so the server is not doing any work. It can only be resolved by restarting the server, which makes it unusable. My expectation would be that the server handles as many conversations in parallel as its batch size allows and holds the remaining HTTP connections until a free slot becomes available.
The server does not print any errors to the console, and there is no verbose or debug flag either.
Reproduction
Deploy any Llama 2 model or a derivative, then start the OpenAI API server.
For example, in my case with two 4090 GPUs:
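The exact launch command is not shown here; based on the command suggested above, a plausible invocation for this setup would be:

python -m lmdeploy.serve.openai.api_server --tp=2 --server_name=0.0.0.0 ./workspace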
Python script to drive the server via the OpenAI async client protocol:
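The original script is not reproduced here; below is a minimal sketch of such a driver, assuming the openai v1 async client, the default port 23333, and a placeholder model name:

import asyncio
from openai import AsyncOpenAI

# Point the official OpenAI client at the local server.
# Base URL, port, and model name are assumptions for this sketch.
client = AsyncOpenAI(base_url="http://localhost:23333/v1", api_key="none")

async def main() -> None:
    # Send sequential chat completions to reproduce the failure:
    # after some number of requests the server starts returning
    # 200 OK with an empty completion.
    for i in range(30):
        response = await client.chat.completions.create(
            model="llama2-13b",
            messages=[{"role": "user", "content": f"Say the number {i} in words."}],
        )
        print(i, repr(response.choices[0].message.content))

asyncio.run(main())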
Error traceback
No response