Closed slashmili closed 3 months ago
Nice! Thanks for trying it out. A couple things:
ChatBumblebee defaults to stream: true because if you ever want to stream, the model needs to be set up for streaming, and then it will always stream. Confused? Just set stream: false and the warnings all go away. Basically, it invisibly defaulted to streaming while no callback function to receive the streamed data was provided.
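For illustration, disabling streaming might look like the sketch below. It assumes the LangChain Elixir ChatBumblebee.new!/1 constructor and a serving named Llama2ChatModel, which is a placeholder for whatever Nx.Serving process you have running:

```elixir
alias LangChain.ChatModels.ChatBumblebee

# Llama2ChatModel is an assumed name for your Bumblebee Nx.Serving
# process, e.g. one started in your application supervision tree.
chat_model =
  ChatBumblebee.new!(%{
    serving: Llama2ChatModel,
    # disable streaming so no callback function is required
    stream: false
  })
```

With stream: false, the chain returns the fully assembled message without warning about a missing callback.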
Llama2 is not trained to handle function calls, and Bumblebee has no way to constrain the generated text to be JSON, which is needed for implementing function calls. In short, it doesn't support functions.
You can use Llama2 just fine for conversation though.
Thanks for the clarification.
Regarding stream: true, the doc says it should be set as true. Do we need to change the doc? I know that ChatBumblebee is a new feature (which I'm really excited about), but I didn't know about these limitations. Do you think we can improve the ChatBumblebee doc to mention what it's capable of? One full example of how to use ChatBumblebee would also be nice.
Otherwise if you feel it's too early to spend time on ChatBumblebee, I totally understand.
Ah, yes. Streaming does work; it just requires the callback function. As you saw, it still returns the message, but warns that the callback function is missing.
Yes, I need to create a Livebook demo of using it to make it all clear.
There are models on HuggingFace that are fine-tuned for function calling. I've tried working with this one: https://huggingface.co/Trelis/Llama-2-7b-chat-hf-function-calling among others.
The problem is, Bumblebee isn't ready to support functions in LLMs. The reason is, Bumblebee needs to constrain the output to enforce valid JSON-only responses, which it doesn't do yet. So it fails for functions at this time.
The way to get around this is to host the model in llama.cpp or Ollama (which I think uses llama.cpp under the hood) to enforce the JSON constraint.
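As a sketch of that workaround, Ollama's HTTP API accepts a format option that constrains the model's output to valid JSON. The model name and prompt here are just examples:

```shell
# Ask a locally running Ollama server for JSON-constrained output.
curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "List three colors as a JSON object with the key \"colors\".",
  "format": "json",
  "stream": false
}'
```

The "format": "json" field forces the response body's generated text to be parseable JSON, which is the constraint Bumblebee can't yet apply itself.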
At this time, functions will not work directly in Bumblebee.
For your example, just add something like this:
callback_fn = fn
  %LangChain.MessageDelta{} = delta ->
    # write to the console as the response is streamed back
    IO.write(delta.content)

  %LangChain.Message{} = message ->
    # inspect the fully finished message that was assembled from all the deltas
    IO.inspect(message, label: "FULLY ASSEMBLED MESSAGE")
end
Then in your function call to execute the chain, add the callback like this:
|> LLMChain.run(while_needs_response: true, callback_fn: callback_fn)
That removes the warnings and displays the live streamed response data.
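Putting the pieces above together, a minimal chain run with the callback might look like this sketch. The serving name Llama2ChatModel and the user message are assumptions standing in for your own setup:

```elixir
alias LangChain.Chains.LLMChain
alias LangChain.ChatModels.ChatBumblebee
alias LangChain.Message

# handle both streamed deltas and the final assembled message
callback_fn = fn
  %LangChain.MessageDelta{} = delta ->
    IO.write(delta.content)

  %LangChain.Message{} = message ->
    IO.inspect(message, label: "FULLY ASSEMBLED MESSAGE")
end

result =
  %{llm: ChatBumblebee.new!(%{serving: Llama2ChatModel})}
  |> LLMChain.new!()
  |> LLMChain.add_message(Message.new_user!("Hello! Tell me a joke."))
  |> LLMChain.run(callback_fn: callback_fn)
```

The callback prints each delta as it streams in, then inspects the assembled message once the response completes.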
I added a full Bumblebee example to the docs. Hopefully that will help others. Thanks for the feedback on the gap.
Thanks 🙇🏼
I tried to use ChatBumblebee, but it didn't work as expected.
This is the livebook instruction I used:
https://gist.github.com/slashmili/ba0ac06a6346e793e357caf940a8a424
When I ran the chain, I got a lot of warnings:
And the answer was not as I expected:
It seems like it couldn't use Llama to find the right function.
Any idea what I did wrong?