brave / brave-browser

Brave browser for Android, iOS, Linux, macOS, Windows.
https://brave.com
Mozilla Public License 2.0

Give Leo information about when text is being truncated so it can better formulate an answer. #33006

Closed: bbondy closed this 9 months ago

bbondy commented 12 months ago

Could we do better at making the model aware that it didn’t consume the full content? I didn’t verify, but I’m assuming it stopped at 13 because the text got truncated? https://music.youtube.com/playlist?list=OLAK5uy_lm33KX5L-19O27gjy9IrZCQJEFxx00WMQ

[Screenshot: 2023-09-13 at 4:05:41 PM]

petemill commented 11 months ago

Is it enough to make the user aware it's cut off? https://github.com/brave/brave-browser/issues/31405

bbondy commented 11 months ago

For MVP? I think so

stevelaskaridis commented 9 months ago

After some testing, here are my findings:

List of links tested

Methodology

Tested summary plus a few questions about the context inside and outside of scope.

Prompt changes tested:

(changes signified in bold)

Models checked

Results

Mode 1 - Prepend

Works okay, but does not make much of a difference unless you explicitly ask whether the whole content was consumed. The 13B model does not answer that correctly (it assumes the whole article was read).
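For illustration only, a minimal Python sketch of the prepend variant. The helper name, notice wording, and character limit are placeholders, not the prompt actually tested (that wording is in the collapsed "Prompt changes tested" section above):

```python
# Mode 1 sketch: prepend a truncation notice before the (possibly cut-off) page text.
# The notice text and the limit below are hypothetical, not the tested prompt.
TRUNCATION_NOTICE = (
    "Note: the page content below was cut off at a length limit and may be incomplete.\n\n"
)

def build_prompt_prepend(page_text: str, question: str, limit: int = 8000) -> str:
    truncated = len(page_text) > limit
    content = page_text[:limit]
    prefix = TRUNCATION_NOTICE if truncated else ""
    return f"{prefix}{content}\n\nQuestion: {question}"
```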

Mode 2 - Append

Works okay. Gives stronger conditioning that the whole content was not consumed. Sometimes it makes the model not respond properly.
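Same caveats as the sketch above: the append variant only moves the (hypothetical) notice after the content, so the last thing the model reads before the question is the statement that the text is incomplete:

```python
# Mode 2 sketch: append a truncation notice after the cut-off page text.
def build_prompt_append(page_text: str, question: str, limit: int = 8000) -> str:
    truncated = len(page_text) > limit
    content = page_text[:limit]
    suffix = (
        "\n\n[The page content above was truncated; it is not the full page.]"
        if truncated else ""
    )
    return f"{content}{suffix}\n\nQuestion: {question}"
```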

Mode 3 - Seed

This did not work at all; it significantly degrades task completion.
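For completeness, a sketch of the seed variant under the assumption of a chat-style message list: the assistant turn is pre-filled so the reply has to continue from an acknowledgement of the truncation, which is likely why it interferes with the task itself:

```python
# Mode 3 sketch: seed the start of the assistant's reply with a truncation acknowledgement.
# The message format and seed wording here are hypothetical.
def build_messages_seed(page_text: str, question: str, limit: int = 8000) -> list[dict]:
    messages = [{
        "role": "user",
        "content": f"{page_text[:limit]}\n\nQuestion: {question}",
    }]
    if len(page_text) > limit:
        # The model continues generation from this partial assistant message.
        messages.append({
            "role": "assistant",
            "content": "Note that I could only read part of the page; based on what I did read, ",
        })
    return messages
```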

Recommendation

Making the prompt truncation-aware does not consistently push the behaviour towards better responses. However, for some queries it leads the model to say that the whole content was not consumed, rather than e.g. blaming its training cutoff or a lack of access to real information. The downside is that I also saw the model refuse to respond due to "ethics, etc.".

For Claude-Instant, I did not test thoroughly, as its context limit is quite high anyway.

If we are to integrate it, I would vote for Mode 2. I can push the changes I made, and QA can test the impact of this prompt change more widely.

bbondy commented 9 months ago

An option is to just close this with that investigation too. Do you recommend that, or would you prefer Mode 2?

bbondy commented 9 months ago

Keeping in mind that we do warn the user about it now even if the model doesn't know.

stevelaskaridis commented 9 months ago

Given that Llama-13B (our public offering) does not particularly "care" about this prompt change anyway, I propose that we close this for now.

bbondy commented 9 months ago

thanks for checking 👍