BerriAI / litellm

Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]
https://docs.litellm.ai/docs/

[Bug]: Petals requires Cuda Enabled PyTorch. #414

Closed bgokden closed 1 year ago

bgokden commented 1 year ago

What happened?

I tried the new Petals integration and got an error that PyTorch is not compiled with CUDA enabled. I am using LangChain, and the Petals package inside LangChain works without any problem. I am working on a 2020 Intel MacBook Pro without a CUDA-enabled GPU, but everything works with the accelerate package from Hugging Face (https://pypi.org/project/accelerate/), just slower.

  File "/Users/berk/repos/gpt-bot/venv/lib/python3.11/site-packages/torch/cuda/__init__.py", line 239, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled

Relevant log output

File "/Users/berk/repos/gpt-bot/venv/lib/python3.11/site-packages/litellm/main.py", line 985, in completion
    model_response = petals.completion(
                     ^^^^^^^^^^^^^^^^^^
  File "/Users/berk/repos/gpt-bot/venv/lib/python3.11/site-packages/litellm/llms/petals.py", line 42, in completion
    model_obj = model_obj.cuda()
                ^^^^^^^^^^^^^^^^
  File "/Users/berk/repos/gpt-bot/venv/lib/python3.11/site-packages/transformers/modeling_utils.py", line 2054, in cuda
    return super().cuda(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/berk/repos/gpt-bot/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 905, in cuda
    return self._apply(lambda t: t.cuda(device))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/berk/repos/gpt-bot/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  File "/Users/berk/repos/gpt-bot/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  File "/Users/berk/repos/gpt-bot/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 820, in _apply
    param_applied = fn(param)
                    ^^^^^^^^^
  File "/Users/berk/repos/gpt-bot/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 905, in <lambda>
    return self._apply(lambda t: t.cuda(device))
                                 ^^^^^^^^^^^^^^
  File "/Users/berk/repos/gpt-bot/venv/lib/python3.11/site-packages/torch/cuda/__init__.py", line 239, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
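
The traceback shows the failure comes from the unconditional model_obj.cuda() call in litellm/llms/petals.py. A minimal sketch (not litellm's actual code) of the usual device-guard pattern that avoids this assertion on CPU-only machines:

    import torch

    def to_best_device(model: torch.nn.Module) -> torch.nn.Module:
        # torch.cuda.is_available() returns False on CPU-only builds instead
        # of raising, so this check is safe on machines without CUDA.
        if torch.cuda.is_available():
            return model.cuda()
        return model  # CPU fallback, e.g. a 2020 Intel MacBook Pro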

Twitter / LinkedIn details

https://www.linkedin.com/in/berkgokden/

krrishdholakia commented 1 year ago

Thanks for this issue @bgokden. @ishaan-jaff assigning this to you, since you worked on the Petals integration.

krrishdholakia commented 1 year ago

@bgokden can you share the code snippet you're using, so we can repro this bug?

ishaan-jaff commented 1 year ago

Just pushed a fix to remove CUDA from Petals: https://github.com/BerriAI/litellm/commit/b81f8d2ddd3e78ec5cd500a8897b2c0a6ab6c197

Tested on dev and it worked fine. I'm unable to repro the problem @bgokden was facing, but I checked how LangChain implements Petals: https://github.com/langchain-ai/langchain/blob/master/libs/langchain/langchain/llms/petals.py
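
For comparison, a minimal sketch of loading a Petals model without forcing CUDA, using the public petals API (the actual change is in the commit linked above):

    from transformers import AutoTokenizer
    from petals import AutoDistributedModelForCausalLM

    model_name = "petals-team/StableBeluga2"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # No .cuda() call: the model stays on the CPU, and the heavy transformer
    # blocks run on remote Petals servers anyway.
    model = AutoDistributedModelForCausalLM.from_pretrained(model_name)

    inputs = tokenizer("Hello, world", return_tensors="pt")["input_ids"]
    outputs = model.generate(inputs, max_new_tokens=5)
    print(tokenizer.decode(outputs[0]))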

ishaan-jaff commented 1 year ago

Pushed a new fix; it's in the latest litellm version, litellm==0.1.715.

@bgokden can you try it and let us know if it worked?

[Screenshot: 2023-09-20 at 9:32 AM]

bgokden commented 1 year ago

@ishaan-jaff Yes, I will try it.

I have an agent similar to the one here: https://python.langchain.com/docs/modules/agents/agent_types/structured_chat#adding-in-memory

This is how I call LiteLLM with Petals:

    llm = ChatLiteLLM(temperature=0.2, model="petals/petals-team/StableBeluga2", verbose=True)

and this is how I call Petals from LangChain:

    llm = Petals(temperature=0.2, model_name="petals-team/StableBeluga2", verbose=True)
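
(Note LiteLLM takes the provider-prefixed model string via model=, while LangChain's Petals wrapper takes the bare Hugging Face repo id via model_name=.) For anyone reproducing this outside LangChain, a minimal sketch of the equivalent call through the litellm SDK's standard completion() API:

    import litellm

    # Same provider-prefixed model string used in the ChatLiteLLM snippet above.
    response = litellm.completion(
        model="petals/petals-team/StableBeluga2",
        messages=[{"role": "user", "content": "Hello, how are you?"}],
        temperature=0.2,
    )
    print(response["choices"][0]["message"]["content"])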
bgokden commented 1 year ago

I think it works now, thanks for the quick response. It gets past the previous error, but Petals is seriously slow sometimes.

krrishdholakia commented 1 year ago

Yep, that's why we commented out Petals: it kept causing our tests to fail due to request timeouts.