Mintplex-Labs / anything-llm

The all-in-one Desktop & Docker AI application with built-in RAG, AI agents, and more.
https://anythingllm.com
MIT License
24.84k stars · 2.51k forks

[BUG]: Chat answer always starts with "According to the context provided" when system prompt says not to use that phrase #1550

Closed thebaldgeek closed 4 months ago

thebaldgeek commented 4 months ago

How are you running AnythingLLM?

Docker (local)

What happened?

Every single chat answer starts with the phrase "According to the context provided..." (or a phrase along those lines). My team is so sick and tired of it that it's become a mocking point of the project.


I have tested many system prompts and am now of the belief that the prompt is totally ignored.

Using local Windows 11 Ollama with Llama 3, Docker Desktop (pulled 18 hours ago; can't find a version number in Docker). Default built-in database, etc. Answers (beyond the opening phrase) come from the trained docs and are exactly as expected.

Are there known steps to reproduce?

Train the system on some docs and ask questions. It's a very simple process thanks to the amazing work the anything-llm team has done.

timothycarambat commented 4 months ago

Unfortunately, this is not really a "bug" and is simply a facet of LLM response generation. Not all models are equally good at what is referred to as "instruction following." That makes enforcing guardrails in a prompt especially problematic with smaller, quantized, open-source models, which are particularly impacted by this.

Some things that have helped those trying to get a better prompt-following model while staying with OSS models:

- Llama3 8B Q4 -> small model, medium compression -> overall pretty high compression, but super portable
- Llama3 70B Q4 -> massive model, medium compression -> overall pretty OK compression, but still resource intensive

I am assuming you are using Llama3 8B at the default quantization of Q4. I would try Q8 first and see if results improve.
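To try the Q8 suggestion with Ollama, you can pull a higher-precision build of the model and point AnythingLLM's LLM preference at it. A minimal sketch; the exact tag names follow Ollama's library naming convention and may differ from what is currently published:

```shell
# Pull a Q8 quantization of Llama 3 8B (tag name assumed from Ollama's
# library naming scheme; run `ollama list` / check the library to confirm).
ollama pull llama3:8b-instruct-q8_0

# Verify the model is available locally.
ollama list
```

After pulling, select the new model tag in AnythingLLM's LLM provider settings so chats use the Q8 weights instead of the default Q4 build.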

thebaldgeek commented 4 months ago

Currently using Llama3 70B.

I have tested dozens of prompts and there is simply no impact on the responses. I am very, very happy with all the answers and citations provided... Just having every. Single. Response. Start with the same phrase is watering down the overall project.

Sounds like there is no way to make the system prompt do this in this application. I will start looking at using the API; that way I can just strip the start of every message, up to the first comma, in my API code.
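The workaround described above (cutting the response off at the first comma when it starts with the boilerplate phrase) can be sketched as a small post-processing function applied to each response your API client receives. This is a hypothetical helper, not part of AnythingLLM; the phrase list is an assumption and should be extended to match whatever openers your model actually produces:

```python
# Hypothetical post-processing for LLM responses: if the text starts with a
# known boilerplate opener, drop everything up to and including the first
# comma, then re-capitalize the remainder. Phrase list is an assumption.
BOILERPLATE_OPENERS = [
    "according to the context provided",
    "based on the context provided",
    "according to the provided context",
]

def strip_opener(text: str) -> str:
    lowered = text.lstrip().lower()
    for opener in BOILERPLATE_OPENERS:
        if lowered.startswith(opener):
            # Cut at the first comma; if there is none, leave text unchanged.
            _, sep, rest = text.partition(",")
            rest = rest.lstrip()
            if sep and rest:
                return rest[0].upper() + rest[1:]
            return text
    return text
```

Matching only a fixed phrase list keeps the function safe: responses that don't start with a known opener pass through untouched, so legitimate answers that happen to contain commas are never truncated.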

Thanks for all your work on this and the quick reply to this issue.