Closed slashmili closed 3 months ago
Nice! Thanks for trying it out. A couple things:
ChatBumblebee defaults to stream: true because if you ever want to stream, the model needs to be set up for streaming, and then it will always stream. Confused? Just set stream: false and the warnings all go away. Basically, it invisibly defaulted to streaming while no callback function to receive the streamed data was provided.
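For illustration, disabling streaming might look like the sketch below. It assumes the LangChain Elixir ChatBumblebee.new!/1 constructor and a serving named Llama2ChatModel, which is a placeholder for whatever Nx.Serving process you have running:

```elixir
alias LangChain.ChatModels.ChatBumblebee

# Llama2ChatModel is an assumed name for your Bumblebee Nx.Serving
# process, e.g. one started in your application supervision tree.
chat_model =
  ChatBumblebee.new!(%{
    serving: Llama2ChatModel,
    # disable streaming so no callback function is required
    stream: false
  })
```

With stream: false, the chain returns the fully assembled message without warning about a missing callback.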
Llama2 is not trained to handle function calls, and Bumblebee has no way to constrain the generated text to be JSON, which is needed for implementing function calls. In short, it doesn't support functions.
You can use Llama2 just fine for conversation though.
Thanks for the clarification.
Regarding stream: true, the doc says it should be set as true. Do we need to change the doc? I know that ChatBumblebee is a new feature (which I'm really excited about), but I didn't know about these limitations. Do you think we can improve the ChatBumblebee doc to mention what it's capable of? One full example of how to use ChatBumblebee would also be nice.
Otherwise if you feel it's too early to spend time on ChatBumblebee, I totally understand.
Ah, yes. Streaming does work; it just requires the callback function. As you saw, it still returns the message, but warns that the callback function is missing.
Yes, I need to create a Livebook demo of using it to make it all clear.
There are models on HuggingFace that are fine-tuned for function calling. I've tried working with this one: https://huggingface.co/Trelis/Llama-2-7b-chat-hf-function-calling among others.
The problem is, Bumblebee isn't ready to support functions in LLMs. The reason is, Bumblebee needs to constrain the output to enforce valid JSON-only responses, which it doesn't do yet. So it fails for functions at this time.
The way to get around this is to host the model in llama.cpp or Ollama (which I think uses llama.cpp under the hood) to enforce the JSON constraint.
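As a sketch of that workaround, Ollama's HTTP API accepts a format option that constrains the model's output to valid JSON. The model name and prompt here are just examples:

```shell
# Ask a locally running Ollama server for JSON-constrained output.
curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "List three colors as a JSON object with the key \"colors\".",
  "format": "json",
  "stream": false
}'
```

The "format": "json" field forces the response body's generated text to be parseable JSON, which is the constraint Bumblebee can't yet apply itself.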
At this time, functions will not work directly in Bumblebee.
For your example, just add something like this:
callback_fn = fn
  %LangChain.MessageDelta{} = delta ->
    # write to the console as the response is streamed back
    IO.write(delta.content)

  %LangChain.Message{} = message ->
    # inspect the fully finished message that was assembled from all the deltas
    IO.inspect(message, label: "FULLY ASSEMBLED MESSAGE")
end
Then in your function call to execute the chain, add the callback like this:
|> LLMChain.run(while_needs_response: true, callback_fn: callback_fn)
That removes the warnings and displays the live streamed response data.
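Putting the pieces above together, a minimal chain run with the callback might look like this sketch. The serving name Llama2ChatModel and the user message are assumptions standing in for your own setup:

```elixir
alias LangChain.Chains.LLMChain
alias LangChain.ChatModels.ChatBumblebee
alias LangChain.Message

# handle both streamed deltas and the final assembled message
callback_fn = fn
  %LangChain.MessageDelta{} = delta ->
    IO.write(delta.content)

  %LangChain.Message{} = message ->
    IO.inspect(message, label: "FULLY ASSEMBLED MESSAGE")
end

result =
  %{llm: ChatBumblebee.new!(%{serving: Llama2ChatModel})}
  |> LLMChain.new!()
  |> LLMChain.add_message(Message.new_user!("Hello! Tell me a joke."))
  |> LLMChain.run(callback_fn: callback_fn)
```

The callback prints each delta as it streams in, then inspects the assembled message once the response completes.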
I added a full Bumblebee example to the docs. Hopefully that will help others. Thanks for the feedback on the gap.
Thanks 🙇🏼
I tried to use ChatBumblebee, but it didn't work as expected.
This is the livebook instruction I used:
https://gist.github.com/slashmili/ba0ac06a6346e793e357caf940a8a424
When I ran the chain, I got a lot of warnings:
And the answer was not as I expected:
It seems like it couldn't use Llama to find the right function.
Any idea what I did wrong?