Closed: bgokden closed this issue 1 year ago
Thanks for this issue @bgokden. @ishaan-jaff, assigning to you since you worked on the Petals integration.
@bgokden can you share the code snippet you're using, so we can repro this bug?
just pushed a fix to remove cuda from petals: https://github.com/BerriAI/litellm/commit/b81f8d2ddd3e78ec5cd500a8897b2c0a6ab6c197
tested on dev and it worked fine. I'm unable to repro the problem @bgokden was facing but checked how langchain implemented petals: https://github.com/langchain-ai/langchain/blob/master/libs/langchain/langchain/llms/petals.py
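For context, the fix boils down to not hard-coding CUDA when placing the model. A minimal, dependency-free sketch of the device-selection pattern (the helper name is hypothetical, not the actual litellm code; in real code the flag would come from torch.cuda.is_available()):

```python
def pick_device(cuda_available: bool) -> str:
    """Choose an inference device, falling back to CPU when CUDA is absent.

    The availability flag is a plain parameter here so the sketch stays
    runnable without torch installed.
    """
    return "cuda" if cuda_available else "cpu"

# On a machine without a CUDA GPU (e.g. an Intel MacBook Pro), this
# yields "cpu" instead of raising "Torch not compiled with CUDA enabled".
```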
Pushed a new fix, it's on the latest litellm version, litellm==0.1.715
@bgokden can you try and let us know if it worked?
@ishaan-jaff Yes I will try:
I have an agent similar to the one here: https://python.langchain.com/docs/modules/agents/agent_types/structured_chat#adding-in-memory
This is how I call LiteLLM with Petals:
llm = ChatLiteLLM(temperature=0.2, model="petals/petals-team/StableBeluga2", verbose=True)
This is how I call Petals from LangChain:
llm = Petals(temperature=0.2, model_name="petals-team/StableBeluga2", verbose=True)
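The main difference between the two calls above is the model string: litellm routes to a backend via a provider prefix on the model name. A small sketch of that mapping (the helper function is hypothetical, for illustration only):

```python
def to_litellm_model(provider: str, model_name: str) -> str:
    # litellm identifies the backend from a "provider/" prefix on the
    # model string, so the LangChain-style name just gets prefixed.
    return f"{provider}/{model_name}"

# "petals-team/StableBeluga2" -> "petals/petals-team/StableBeluga2"
```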
I think it works now. Thanks for the quick response. It gets past the previous error, but Petals is sometimes seriously slow.
Yep - that's why we commented out petals - it kept causing our testing to fail due to request timeouts.
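For anyone hitting the same slowness in their own test suites, one generic way to bound a slow Petals call is a thread-based timeout (a sketch of the general pattern, not how litellm's tests actually handle it):

```python
from concurrent.futures import ThreadPoolExecutor

def call_with_timeout(fn, timeout_s, *args, **kwargs):
    """Run fn in a worker thread and give up after timeout_s seconds.

    Raises concurrent.futures.TimeoutError if the call is too slow.
    shutdown(wait=False) lets the caller return immediately; the worker
    thread is abandoned rather than joined.
    """
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        return pool.submit(fn, *args, **kwargs).result(timeout=timeout_s)
    finally:
        pool.shutdown(wait=False)
```

A test can then fail fast with a clear timeout error instead of hanging on a slow distributed inference call.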
What happened?
So I tried the new Petals integration and got an error that PyTorch is not compiled with CUDA enabled. I am using LangChain, and the Petals package inside LangChain works without any problem. I am working on a 2020 Intel MacBook Pro without a CUDA-enabled GPU, but everything works with the accelerate package from Hugging Face (https://pypi.org/project/accelerate/), just slower.
Relevant log output
Twitter / LinkedIn details
https://www.linkedin.com/in/berkgokden/