Are you reinstantiating the client on every request?
If so, could you move it outside your handler, like so?
chat = Anthropic(api_key=api_key)

@app.get("/")
async def my_handler():
    completion = chat.completions.create(
        model="claude-2",
        max_tokens_to_sample=20,
        prompt=f"{HUMAN_PROMPT} what is {passed_in_number} {AI_PROMPT}",
    )
    return completion
If the API key changes per request, you might consider a with block:
@app.get("/")
async def my_handler():
    with Anthropic(api_key=api_key) as client:
        completion = client.completions.create(
            model="claude-2",
            max_tokens_to_sample=20,
            prompt=f"{HUMAN_PROMPT} what is {passed_in_number} {AI_PROMPT}",
        )
    return completion
Do either of those work for you?
In summary, I've tried all the approaches mentioned above, but unfortunately I still get the same memory issue. It would be great if you could help reproduce the issue on your side.
@yuliu-eric can you share a minimal reproduction? I can't fully reproduce what you're seeing.
Running this script:
import psutil

import anthropic
from anthropic import Anthropic

base_url = "http://127.0.0.1:4010"  # local mock server

for _ in range(1000):
    print(f"total mem usage: {psutil.virtual_memory().percent}")
    prompt = "\n\nHuman: This is a test prompt. Return some random text\n\nAssistant:"
    client = Anthropic(base_url=base_url)
    client.completions.create(
        prompt=prompt,
        stop_sequences=[anthropic.HUMAN_PROMPT],
        model="claude-instant-1.2",
        max_tokens_to_sample=100,
    )
This results in total memory usage climbing from 54.7% to 55.5% and eventually fails with this error:
OSError: [Errno 24] Too many open files
Exception ignored in: <function Anthropic.__del__ at 0x107569dc8>
Traceback (most recent call last):
File "/Users/robert/stainless/stainless/dist/anthropic-python/src/anthropic/_client.py", line 224, in __del__
File "/Users/robert/stainless/stainless/dist/anthropic-python/src/anthropic/_base_client.py", line 691, in close
AttributeError: 'Anthropic' object has no attribute '_client'
However, this would be considered a bug in the user code: you shouldn't be instantiating a client in a loop like this, and if you really need to, you should call client.close() yourself or use the context manager mentioned previously.
For example:
client.completions.create(
    prompt=prompt,
    stop_sequences=[anthropic.HUMAN_PROMPT],
    model="claude-instant-1.2",
    max_tokens_to_sample=100,
)
client.close()
This results in memory usage steadily increasing from 56.3% to a peak of 58%, where it then sharply drops back down to 56.3%; presumably the GC isn't able to free the created client instances until the script is nearly finished running.
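The same loop could also use the client as a context manager, as suggested earlier, so each client is closed deterministically (a sketch against the same local mock server):

import anthropic
from anthropic import Anthropic

base_url = "http://127.0.0.1:4010"  # local mock server

for _ in range(1000):
    prompt = "\n\nHuman: This is a test prompt. Return some random text\n\nAssistant:"
    # the client is closed automatically when the with block exits
    with Anthropic(base_url=base_url) as client:
        client.completions.create(
            prompt=prompt,
            stop_sequences=[anthropic.HUMAN_PROMPT],
            model="claude-instant-1.2",
            max_tokens_to_sample=100,
        )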
@yuliu-eric could you make a modification to the script above that reproduces the error?
@RobertCraigie @rattrayalex Thank you so much for the effort!
However, I think there might be a misunderstanding here, let me try to rephrase it and below is our scenario:
We observed that server S's memory kept increasing in the case above, but there was no problem when replacing the Anthropic Python calls with plain HTTP calls in S.
I have attached a diagram to help demonstrate the problem.
Btw, I've tried adding client.close() in our Python server code after each call to Anthropic, but got the same memory issue.
Thanks for the explanation @yuliu-eric . Unfortunately, I don't think there's much we can do to help here without a reproducible script. Can you try to compose one that demonstrates the issue?
I will test tonight and let you know the result.
FYI I'm also getting a resource leak when using the Anthropic client, where the open file count climbed monotonically due to connections to *.googleusercontent.com. In our case it turned out to be because of code like this (slightly cursed code, sorry):
import httpx
from contextlib import closing
from anthropic import Anthropic

def make_prediction():
    limits = httpx.Limits(max_connections=1, max_keepalive_connections=1)
    with closing(Anthropic(connection_pool_limits=limits)) as anthropic:
        completion = anthropic.with_options(max_retries=12).completions.create(
            model=some_model_name,
            prompt=some_prompt,
            temperature=temperature,
            max_tokens_to_sample=max_length_tokens,
        )
    return completion.completion
We didn't realize that with_options was creating a new client each time. It seems like those clients weren't getting GC'd at all, so they never cleaned up their resources. If we take out the unnecessary .with_options() (and move max_retries into the Anthropic() constructor), then everything works fine (no more leaked connections), even though we're creating and closing a new client on each function call. Perhaps this will be useful for tracking down where the leak is? I didn't notice any obvious circular refs in .with_options()/.copy(), but I also only spent a minute on this.
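For reference, the fixed version described above looks roughly like this (a sketch, reusing the placeholder names from the snippet above):

import httpx
from contextlib import closing
from anthropic import Anthropic

def make_prediction():
    limits = httpx.Limits(max_connections=1, max_keepalive_connections=1)
    # max_retries moved into the constructor; no .with_options(), so no extra client per call
    with closing(Anthropic(connection_pool_limits=limits, max_retries=12)) as anthropic:
        completion = anthropic.completions.create(
            model=some_model_name,
            prompt=some_prompt,
            temperature=temperature,
            max_tokens_to_sample=max_length_tokens,
        )
    return completion.completion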
(Also, IMO using __del__ to clean up open files is a massive footgun, and the client should probably be a context manager instead; I've had to deal with so much deadlocked multiprocessing code where things break because they rely on __del__ being called as soon as the object falls out of scope, and in the right order relative to other finalizers, which often does not happen. PyTorch dataloaders are the absolute worst offender for this.)
Yes, I agree that __del__ is not an ideal way to clean up, but our intention with it is to help protect against people forgetting to call .close() themselves when not using a context manager.
the client should probably be a context manager instead;
The client does support being used as a context manager; did this not work for you, or were you unaware of it?
I've also just realised that we don't document the client as a context manager, or .close(), in the README, but we probably should!
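Something roughly like the following could illustrate both patterns (a sketch, not the eventual README wording):

import anthropic
from anthropic import Anthropic

prompt = f"{anthropic.HUMAN_PROMPT} Hello{anthropic.AI_PROMPT}"

# as a context manager: connections are released when the block exits
with Anthropic() as client:
    client.completions.create(
        model="claude-instant-1.2",
        max_tokens_to_sample=100,
        prompt=prompt,
    )

# or, without a context manager, close the client explicitly
client = Anthropic()
client.completions.create(
    model="claude-instant-1.2",
    max_tokens_to_sample=100,
    prompt=prompt,
)
client.close()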
We didn't realize that with_options was creating a new client each time. It seems like those clients weren't getting GC'd at all so they never cleaned up their resources. If we take out the unnecessary .with_options() (and move max_retries into the Anthropic() constructor), then everything works fine (no more leak of connections), even though we're creating and closing a new client on each function call. Perhaps this will be useful for tracking down where the leak is? I didn't notice any obvious circular refs in .with_options()/.copy(), but I also only spent a minute on this.
This is helpful context, thanks!
It is very surprising that those clients are not being cleaned up... will look into it.
Creating a new client on every with_options() call is not ideal, and we'll look into improving this, but it should be noted that due to the nature of how httpx works, passing certain options such as connection_pool_limits will require that a new httpx instance is created. However, in your example max_retries wouldn't require a new client.
completion = anthropic.with_options(max_retries=12).completions.create(
Out of curiosity, is there a reason you set max_retries to 12? Have you run into instability in the SDK / API?
Yes, I agree that __del__ is not an ideal way to clean up, but our intention with it is to help protect against people forgetting to call .close() themselves when not using a context manager. The client does support being used as a context manager; did this not work for you, or were you unaware of it?
Thanks for clarifying, I totally missed the context manager support. I took a quick look at the Anthropic class in VS Code and didn't see the context manager methods, but I should have checked the base class as well. Using it as a context manager works great :+1:
Creating a new client on every with_options() call is not ideal, we'll look into improving this but it should be noted that due to the nature of how httpx works, passing certain options such as connection_pool_limits will require that a new httpx instance is created. However in your example max_retries wouldn't require a new client.
FWIW our use case didn't truly need to make use of this method & it was easy to move the extra option up to the place where the client is constructed. If it wasn't for the GC issue then I don't think anyone would have noticed (efficiency is not much of a concern for the batch job we're running).
Out of curiosity is there a reason you set max_retries to 12? Have you ran into instability in the SDK / API?
No, we're just doing something hacky to parallelize our requests (using a pool of 10 threads), which leads to lots of rate limit issues. If we did things properly with asyncio then this would not have been a problem.
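For reference, a roughly equivalent asyncio setup might look like this (a sketch using the SDK's async client and a semaphore in place of the thread pool; the prompts and concurrency limit here are made up for illustration):

import asyncio

import anthropic
from anthropic import AsyncAnthropic

client = AsyncAnthropic()
semaphore = asyncio.Semaphore(10)  # cap in-flight requests instead of using a thread pool

async def predict(prompt: str) -> str:
    async with semaphore:
        completion = await client.completions.create(
            model="claude-instant-1.2",
            max_tokens_to_sample=100,
            prompt=f"{anthropic.HUMAN_PROMPT} {prompt}{anthropic.AI_PROMPT}",
        )
    return completion.completion

async def main() -> None:
    prompts = ["first question", "second question"]  # placeholder inputs
    results = await asyncio.gather(*(predict(p) for p in prompts))
    print(results)

asyncio.run(main())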
I don't think we've been able to repro a memory leak, so I'm going to close this, but we do plan to investigate resource leaks more deeply and we'd still appreciate a code snippet that can reproduce an issue!
Thank you for taking the effort to look into this! I'll try to provide a minimal working snippet later to reproduce the problem.
This might've been fixed by https://github.com/anthropics/anthropic-sdk-python/commit/5e51ebdbc6e5c23c8c237b5e0231ef66f585f964.
We have a server implemented with FastAPI that calls Anthropic through Python, but when we ran the following experiments, its memory kept increasing and was not released after we stopped sending requests to the server.
Sample server code logic
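(The sample server code appears to have been attached as an image; a rough sketch of the per-request-client pattern being described, with the endpoint and parameter names invented for illustration, might be:)

from anthropic import Anthropic, HUMAN_PROMPT, AI_PROMPT
from fastapi import FastAPI

app = FastAPI()

@app.post("/generate")
async def generate(question: str):
    # a new client per request: the pattern suspected of leaking resources
    client = Anthropic()
    completion = client.completions.create(
        model="claude-2",
        max_tokens_to_sample=300,
        prompt=f"{HUMAN_PROMPT} {question}{AI_PROMPT}",
    )
    return {"completion": completion.completion}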
When I replaced the Anthropic Python SDK call with a direct httpx request, memory stayed low and stable during the same experiment.
Same issue when using 0.3.11 as well.
It would be great if you could help take a look.