
Memory leak while using anthropic python sdk 0.3.10 #150

Closed · yuliu-eric closed this issue 1 year ago

yuliu-eric commented 1 year ago

We have a server implemented with FastAPI that calls Anthropic through the Python SDK, but when we run the following experiment, the server's memory keeps increasing and is not released after we stop sending requests to it.

Sample server code logic

chat = Anthropic(api_key=api_key)
completion = chat.completions.create(
    model="claude-2",
    max_tokens_to_sample=20,
    prompt=f"{HUMAN_PROMPT} what is {passed_in_number} {AI_PROMPT}",
)
return completion

When I replaced the Anthropic Python SDK with direct httpx requests, memory stayed low and stable during the same experiment.
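Roughly, the direct httpx version looked like this (illustrative sketch; endpoint and headers per the public API docs, api_key and passed_in_number are placeholders from the handler above):

import httpx

# api_key and passed_in_number are placeholders defined by the handler above.
response = httpx.post(
    "https://api.anthropic.com/v1/complete",
    headers={
        "x-api-key": api_key,
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
    },
    json={
        "model": "claude-2",
        "max_tokens_to_sample": 20,
        "prompt": f"\n\nHuman: what is {passed_in_number}\n\nAssistant:",
    },
    timeout=60.0,
)
completion = response.json()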

Same issue when using 0.3.11 as well.

Would be great if you could help take a look.

rattrayalex commented 1 year ago

Are you reinstantiating the client on every request?

If so, could you move it outside your handler, like so?


chat = Anthropic(api_key=api_key)

@app.get("/")
async def my_handler():
  completion = chat.completions.create(
    model="claude-2",
    max_tokens_to_sample=20,
    prompt=f"{HUMAN_PROMPT} what is {passed_in_number} {AI_PROMPT}",
  )
  return completion

If the API key changes per request, you might consider a with block:

@app.get("/")
async def my_handler():
  with Anthropic(api_key=api_key) as client:
    completion = client.completions.create(
      model="claude-2",
      max_tokens_to_sample=20,
      prompt=f"{HUMAN_PROMPT} what is {passed_in_number} {AI_PROMPT}",
    )
    return completion

Do either of those work for you?
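(Side note, since the handler is async def: the synchronous client blocks the event loop while the request is in flight. If that matters for your throughput, a rough sketch using the async client, under the same assumptions as above, would be:)

from anthropic import AsyncAnthropic, HUMAN_PROMPT, AI_PROMPT

# Shared client created once at import time and reused by every request.
# api_key, app, and passed_in_number are assumed to be defined as in the examples above.
chat = AsyncAnthropic(api_key=api_key)

@app.get("/")
async def my_handler():
    completion = await chat.completions.create(
        model="claude-2",
        max_tokens_to_sample=20,
        prompt=f"{HUMAN_PROMPT} what is {passed_in_number} {AI_PROMPT}",
    )
    return completion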

yuliu-eric commented 1 year ago

In summary, I've tried all the approaches mentioned above, but unfortunately I still get the same memory issue. It would be great if you could try to reproduce it on your side.

RobertCraigie commented 1 year ago

@yuliu-eric can you share a minimal reproduction? I can't fully reproduce what you're seeing.

Running this script:

import psutil
import anthropic
from anthropic import Anthropic

base_url = "http://127.0.0.1:4010"  # local mock server

for _ in range(1000):
    print(f"total mem usage: {psutil.virtual_memory().percent}")

    prompt = "\n\nHuman: This is a test prompt. Return some randome text\n\nAssistant:"
    client = Anthropic(base_url=base_url)
    client.completions.create(
        prompt=prompt,
        stop_sequences=[anthropic.HUMAN_PROMPT],
        model="claude-instant-1.2",
        max_tokens_to_sample=100,
    )

This results in total memory usage increasing from 54.7% to 55.5%, and eventually fails with this error:

OSError: [Errno 24] Too many open files
Exception ignored in: <function Anthropic.__del__ at 0x107569dc8>
Traceback (most recent call last):
  File "/Users/robert/stainless/stainless/dist/anthropic-python/src/anthropic/_client.py", line 224, in __del__
  File "/Users/robert/stainless/stainless/dist/anthropic-python/src/anthropic/_base_client.py", line 691, in close
AttributeError: 'Anthropic' object has no attribute '_client'

However, this would be considered a bug in the user code: you shouldn't be instantiating a client in a loop like this, and if you really need to, you should call client.close() or use the context manager mentioned previously.

For example:

    client.completions.create(
        prompt=prompt,
        stop_sequences=[anthropic.HUMAN_PROMPT],
        model="claude-instant-1.2",
        max_tokens_to_sample=100,
    )
    client.close()

This results in memory usage steadily increasing from 56.3% to a peak of 58%, where it then sharply drops back down to 56.3%. This is presumably because the GC isn't able to free the created client instances until the script is nearly finished running.
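The context-manager form of the loop body would look roughly like this, and closes the underlying connections deterministically when the block exits:

    # Same loop body as above, using the client as a context manager instead of close().
    with Anthropic(base_url=base_url) as client:
        client.completions.create(
            prompt=prompt,
            stop_sequences=[anthropic.HUMAN_PROMPT],
            model="claude-instant-1.2",
            max_tokens_to_sample=100,
        )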

rattrayalex commented 1 year ago

@yuliu-eric could you make a modification to the script above that reproduces the error?

yuliu-eric commented 1 year ago

@RobertCraigie @rattrayalex Thank you so much for the effort!

However, I think there might be a misunderstanding here; let me try to rephrase. Our scenario is:

  1. we have a Python server (let's call it S) that communicates with Anthropic using the Python SDK
  2. we have a client (let's call it C) that continuously sends 20 concurrent requests to S; S in turn calls Anthropic via the SDK, processes the results, and returns data back to C

We observed that S's memory kept increasing in this setup, but there was no problem when we replaced the Anthropic SDK calls with plain HTTP calls in S.

I have attached a diagram to help demonstrate the problem.

[Screenshot attached: 2023-09-14 at 11:01:45 AM]

Btw, I've tried adding client.close() in our Python server code after each call to Anthropic, but I still get the same memory issue.
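A rough sketch of this setup, with illustrative names rather than our actual code:

# server.py -- S: FastAPI app with a single shared Anthropic client (illustrative).
from fastapi import FastAPI
from anthropic import Anthropic, HUMAN_PROMPT, AI_PROMPT

app = FastAPI()
chat = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

@app.get("/compute/{number}")
def compute(number: int):
    completion = chat.completions.create(
        model="claude-2",
        max_tokens_to_sample=20,
        prompt=f"{HUMAN_PROMPT} what is {number} {AI_PROMPT}",
    )
    return {"completion": completion.completion}

# load.py -- C: 20 concurrent requests sent to S continuously (illustrative).
from concurrent.futures import ThreadPoolExecutor
import httpx

def hit(i: int) -> int:
    return httpx.get(f"http://localhost:8000/compute/{i}", timeout=60.0).status_code

with ThreadPoolExecutor(max_workers=20) as pool:
    for status in pool.map(hit, range(10_000)):
        pass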

rattrayalex commented 1 year ago

Thanks for the explanation @yuliu-eric . Unfortunately, I don't think there's much we can do to help here without a reproducible script. Can you try to compose one that demonstrates the issue?

deter3 commented 1 year ago

Thanks for the explanation @yuliu-eric . Unfortunately, I don't think there's much we can do to help here without a reproducible script. Can you try to compose one that demonstrates the issue?

I will test tonight and let you know the result.

qxcv commented 1 year ago

FYI, I'm also seeing a resource leak when using the Anthropic client: the open file count climbed monotonically due to connections to *.googleusercontent.com. In our case it turned out to be because of code like this (slightly cursed code, sorry):

from contextlib import closing

import httpx
from anthropic import Anthropic

def make_prediction():
    limits = httpx.Limits(max_connections=1, max_keepalive_connections=1)
    with closing(Anthropic(connection_pool_limits=limits)) as anthropic:
        completion = anthropic.with_options(max_retries=12).completions.create(
            model=some_model_name,
            prompt=some_prompt,
            temperature=temperature,
            max_tokens_to_sample=max_length_tokens,
        )
    return completion.completion

We didn't realize that with_options was creating a new client each time. It seems like those clients weren't getting GC'd at all so they never cleaned up their resources. If we take out the unnecessary .with_options() (and move max_retries into the Anthropic() constructor), then everything works fine (no more leak of connections), even though we're creating and closing a new client on each function call. Perhaps this will be useful for tracking down where the leak is? I didn't notice any obvious circular refs in .with_options()/.copy(), but I also only spent a minute on this.
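For completeness, roughly what the fixed version looks like (max_retries moved into the constructor, no .with_options(); sketch only, same free variables as above):

import httpx
from contextlib import closing
from anthropic import Anthropic

def make_prediction():
    limits = httpx.Limits(max_connections=1, max_keepalive_connections=1)
    # max_retries now set once on the client instead of via with_options().
    with closing(Anthropic(connection_pool_limits=limits, max_retries=12)) as anthropic:
        completion = anthropic.completions.create(
            model=some_model_name,
            prompt=some_prompt,
            temperature=temperature,
            max_tokens_to_sample=max_length_tokens,
        )
    return completion.completion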

(Also IMO using __del__ to clean up open files is a massive footgun and the client should probably be a context manager instead; I've had to deal with so much deadlocked multiprocessing code where things break because they rely on __del__ to be called as soon as the object falls out of scope and in the right order relative to other finalizers, which often does not happen. PyTorch dataloaders are the absolute worst offender for this.)

RobertCraigie commented 1 year ago

Yes, I agree that __del__ is not an ideal way to clean up, but our intention with it is to help protect against people forgetting to call .close() themselves when not using a context manager.

the client should probably be a context manager instead;

The client does support being used as a context manager, did this not work for you or were you unaware?

I've also just realised that we don't document the client as a context manager, or .close(), in the README, but we probably should!
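For reference, the two supported patterns look roughly like this:

from anthropic import Anthropic, HUMAN_PROMPT, AI_PROMPT

prompt = f"{HUMAN_PROMPT} Say hello {AI_PROMPT}"

# Option 1: context manager -- underlying connections are closed when the block exits.
with Anthropic() as client:
    client.completions.create(model="claude-2", max_tokens_to_sample=10, prompt=prompt)

# Option 2: explicit close() when you manage the client lifetime yourself.
client = Anthropic()
try:
    client.completions.create(model="claude-2", max_tokens_to_sample=10, prompt=prompt)
finally:
    client.close()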

We didn't realize that with_options was creating a new client each time. It seems like those clients weren't getting GC'd at all so they never cleaned up their resources. If we take out the unnecessary .with_options() (and move max_retries into the Anthropic() constructor), then everything works fine (no more leak of connections), even though we're creating and closing a new client on each function call. Perhaps this will be useful for tracking down where the leak is? I didn't notice any obvious circular refs in .with_options()/.copy(), but I also only spent a minute on this.

This is helpful context, thanks!

It is very surprising that those clients are not being cleaned up... will look into it.

Creating a new client on every with_options() call is not ideal; we'll look into improving this. It should be noted, though, that due to the nature of how httpx works, passing certain options such as connection_pool_limits requires that a new httpx instance be created. However, in your example max_retries wouldn't require a new client.

completion = anthropic.with_options(max_retries=12).completions.create(

Out of curiosity, is there a reason you set max_retries to 12? Have you run into instability in the SDK / API?

qxcv commented 1 year ago

Yes I agree that __del__ is not an ideal way to cleanup but our intention with it is to help protect against people forgetting to call .close() themselves when not using a context manager. The client does support being used as a context manager, did this not work for you or were you unaware?

Thanks for clarifying, I totally missed the context manager support. I took a quick look at the Anthropic class in vscode and didn't see the context manager methods, but I should have checked the base class as well. Using it as a context manager works great :+1:

Creating a new client on every with_options() call is not ideal, we'll look into improving this but it should be noted that due to the nature of how httpx works, passing certain options such as connection_pool_limits will require that a new httpx instance is created. However in your example max_retries wouldn't require a new client.

FWIW our use case didn't truly need to make use of this method & it was easy to move the extra option up to the place where the client is constructed. If it wasn't for the GC issue then I don't think anyone would have noticed (efficiency is not much of a concern for the batch job we're running).

Out of curiosity is there a reason you set max_retries to 12? Have you ran into instability in the SDK / API?

No, we're just doing something hacky to parallelize our requests (using a pool of 10 threads), which leads to lots of rate limit issues. If we did things properly with asyncio then this would not have been a problem.
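(Roughly what the asyncio version would look like, with a semaphore bounding concurrency in place of the thread pool; illustrative only:)

import asyncio
from anthropic import AsyncAnthropic, HUMAN_PROMPT, AI_PROMPT

async def main() -> None:
    sem = asyncio.Semaphore(10)  # bound concurrency, like the 10-thread pool

    # One shared async client, closed when the block exits.
    async with AsyncAnthropic(max_retries=12) as client:

        async def one(i: int) -> str:
            async with sem:
                result = await client.completions.create(
                    model="claude-instant-1.2",
                    max_tokens_to_sample=100,
                    prompt=f"{HUMAN_PROMPT} request {i} {AI_PROMPT}",
                )
                return result.completion

        completions = await asyncio.gather(*(one(i) for i in range(100)))
        print(f"got {len(completions)} completions")

asyncio.run(main())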

rattrayalex commented 1 year ago

I don't think we've been able to repro a memory leak, so I'm going to close this, but we do plan to investigate resource leaks more deeply and we'd still appreciate a code snippet that can reproduce an issue!

yuliu-eric commented 1 year ago

Thank you for taking the effort to look into this! I'll try to provide a minimal working snippet later to reproduce the problem.

tamird commented 11 months ago

This might've been fixed by https://github.com/anthropics/anthropic-sdk-python/commit/5e51ebdbc6e5c23c8c237b5e0231ef66f585f964.