BoxcarsAI / boxcars

Building applications with composability using Boxcars with LLMs. Inspired by LangChain.

Consider methods to prevent hallucinations and incorrect answers #89

Closed by gottlike 2 months ago

gottlike commented 1 year ago

Just wanted to put this out here for your consideration: https://www.youtube.com/watch?v=LO3U5iqnTpk

A great and advanced summary of things that can go wrong with LLM search and ways of improving/safeguarding results. Might be valuable input for you guys.

gottlike commented 1 year ago

You can find the updated slides for this talk here: https://haystackconf.com/us2023/talk-18/

jaigouk commented 1 year ago

For prompts, maybe we could let the end user define the template instead of hard-coding it, or keep the built-in one as a default fallback.
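
Something like this, as a rough sketch — the class and option names below are made up for illustration, not the actual Boxcars API:

```ruby
# Hypothetical sketch: a boxcar-style class that accepts a user-supplied
# prompt template and falls back to a built-in default.
class AnswerBoxcar
  DEFAULT_TEMPLATE = <<~PROMPT
    Answer the question using only the context below.
    If the answer is not in the context, say "I don't know".

    Context: %<context>s
    Question: %<question>s
  PROMPT

  def initialize(prompt_template: nil)
    # a user-defined template wins; otherwise use the default as fallback
    @template = prompt_template || DEFAULT_TEMPLATE
  end

  def build_prompt(context:, question:)
    format(@template, context: context, question: question)
  end
end

# usage: override the default template per use case
car = AnswerBoxcar.new(prompt_template: "Be terse. Context: %<context>s\nQ: %<question>s")
puts car.build_prompt(context: "Boxcars is a Ruby gem.", question: "What is Boxcars?")
```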

There is also a tool from Microsoft that we can reference: https://github.com/microsoft/guidance

gottlike commented 1 year ago

There are also https://github.com/ShreyaR/guardrails and https://github.com/NVIDIA/NeMo-Guardrails as current alternatives. However, the general topic and how best to implement it need a lot of thought, imho. Ideally, all the cases discussed in the talk could be configured in a separate "guardrail/guide" boxcar. I'm a big fan of keeping it modular, and those rules need to be customized per use case anyway, but if Boxcars offered a nice framework/boxcar for this, that would be a win.
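
To make the idea concrete, here is a rough sketch of what such a guardrail boxcar could look like — every name here (GuardrailBoxcar, StubBoxcar, the rule lambdas) is hypothetical, not an existing Boxcars class:

```ruby
# Hypothetical sketch: a wrapper that runs another boxcar, then applies
# configurable, per-use-case rules to its output before returning it.
class GuardrailBoxcar
  def initialize(inner, rules: [])
    @inner = inner  # any object responding to #run(question)
    @rules = rules  # lambdas returning an error message, or nil if the answer is fine
  end

  def run(question)
    answer = @inner.run(question)
    violations = @rules.filter_map { |rule| rule.call(answer) }
    return answer if violations.empty?

    # surface the failures instead of returning a possibly hallucinated answer
    "Answer rejected by guardrails: #{violations.join('; ')}"
  end
end

# example rules, customized per use case
no_speculation = ->(a) { "speculative wording" if a.match?(/\b(probably|I think)\b/i) }
max_length     = ->(a) { "answer too long" if a.length > 2_000 }

# stand-in for a real boxcar; anything that responds to #run works here
class StubBoxcar
  def run(_question)
    "It will probably ship tomorrow, I think."
  end
end

guarded = GuardrailBoxcar.new(StubBoxcar.new, rules: [no_speculation, max_length])
puts guarded.run("When does the order ship?")
# => "Answer rejected by guardrails: speculative wording"
```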

francis commented 1 year ago

Also, there is a very recent paper and some attempts to use "Chain of Thought" to reduce hallucinations. This might make a great second Train next to the ZeroShot Train.

gottlike commented 1 year ago

FYI: I like this always up-to-date resource for prompt engineering: https://www.promptingguide.ai/

It also has a section on chain of thought: https://www.promptingguide.ai/techniques/cot
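
For anyone skimming the thread, a minimal chain-of-thought prompt along the lines of that guide looks like this (plain Ruby string, no Boxcars-specific API assumed):

```ruby
# The prompt asks the model to show its reasoning step by step before the
# final answer, which tends to reduce unsupported leaps.
cot_prompt = <<~PROMPT
  Question: A train has 12 boxcars and each boxcar carries 8 pallets.
  How many pallets does the train carry in total?

  Let's think step by step, then give the final answer on its own line,
  prefixed with "Answer:".
PROMPT
```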

francis commented 2 months ago

OpenAI just added "chain of thought" in their o1 model. Will add support for this.
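
For reference, a minimal sketch of calling o1 through the ruby-openai gem that Boxcars builds on — treat the model name and parameter details as assumptions until the engine support actually lands:

```ruby
require "openai"

client = OpenAI::Client.new(access_token: ENV["OPENAI_API_KEY"])

response = client.chat(
  parameters: {
    model: "o1-preview",
    # o1 does its own hidden chain of thought, so the prompt can stay plain
    messages: [{ role: "user", content: "How many pallets fit in 12 boxcars of 8 pallets each?" }]
  }
)

puts response.dig("choices", 0, "message", "content")
```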