alan-turing-institute / reginald

Reginald repository for REG Hack Week 23

Chatbot that remembers conversation history #64

Closed rchan26 closed 1 year ago

rchan26 commented 1 year ago

In the current llama-index model, conversation history is not tracked, so each question independently queries the database for an answer. It would be interesting to investigate how we can have a conversation with the data (multiple back-and-forth exchanges instead of a single question and answer).

Looking at the llama-index documentation, it seems to have some ability to do this: https://gpt-index.readthedocs.io/en/latest/core_modules/query_modules/chat_engines/root.html

We would need to replace the query_engine calls with chat_engine (sketched below). We would also need to play around with something like the ReAct agent (llama-index has a few implemented), which decides how the chatbot interacts with the database during the conversation.
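For reference, a minimal sketch of what the swap might look like, based on the chat engine docs linked above (the index setup is illustrative and the exact API may differ between llama-index versions):

```python
from llama_index import SimpleDirectoryReader, VectorStoreIndex

# Illustrative index setup; "data" is a hypothetical directory
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)

# Current approach: stateless, so every question is an independent query
query_engine = index.as_query_engine()
answer = query_engine.query("What is REG Hack Week?")

# Proposed approach: the chat engine keeps the conversation history, so
# follow-ups can refer back to earlier turns; "react" mode uses a ReAct
# agent to decide when to consult the query engine
chat_engine = index.as_chat_engine(chat_mode="react", verbose=True)
answer = chat_engine.chat("What is REG Hack Week?")
follow_up = chat_engine.chat("Who takes part in it?")
```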

rwood-97 commented 1 year ago

See here for examples of using the chat engine. The 'condense_question' and 'context' modes seem to be the ones which force llama2 to use the query engine (i.e. the database data rather than just pre-trained/pre-existing knowledge).

rwood-97 commented 1 year ago

Have played around with this a bit more in a new notebook here.

I think 'context' basically retrieves a load of context info from our database and then uses that to answer the question (i.e. the model is called once per 'chat', and it's essentially "here's a load of context, can you answer this?"). 'condense_question' seems to be more like just using the query engine (i.e. the model is called multiple times, once for each piece of context). For the first query, I think it is basically the same as the query engine; but if you follow up, it 'condenses' your follow-up question with the chat history and then uses that as the new query for the query engine.

Overall, I think 'context' mode seems better.
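To make the comparison concrete, here is a hedged sketch of the two modes (reusing the illustrative `index` from the snippet above; behaviour as I understand it from the docs):

```python
# 'context': for each message, retrieve relevant chunks from the database,
# put them in the system prompt, and call the LLM once per turn
context_engine = index.as_chat_engine(chat_mode="context")
context_engine.chat("What does the Reginald chatbot do?")
context_engine.chat("What data does it use?")  # history + freshly retrieved context

# 'condense_question': condense the follow-up and the chat history into a
# standalone question, then run that through the query engine as usual
condense_engine = index.as_chat_engine(chat_mode="condense_question")
condense_engine.chat("What does the Reginald chatbot do?")
condense_engine.chat("What data does it use?")  # rewritten into a standalone query first
```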

rchan26 commented 1 year ago

Some examples of using the chat engine are in #66. We will continue with the "context" engine as it seems the most consistent.

ReAct seems to be quite volatile and doesn't always make the best decisions about whether or not to use the query engine. But it is noted here that this really depends on the quality of the LLM. We do get better performance using 13b models over 7b (quantized), so it could work better in the future if we have access to higher-quality quantized LLMs (an illustrative model swap is sketched below).
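For illustration, swapping in a larger quantized model via llama-index's LlamaCPP wrapper would look something like this (the model path and parameters here are assumptions, not our actual config):

```python
from llama_index.llms import LlamaCPP

# Hypothetical path to a quantized 13b llama2 model; the resulting llm
# would then be passed to the index via a ServiceContext as usual
llm = LlamaCPP(
    model_path="./models/llama-2-13b-chat.Q4_K_M.gguf",
    temperature=0.1,
    max_new_tokens=512,
    context_window=4096,
)
```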

While working on this, we noticed an issue with the prompt creation in the chat engine. This has been fixed in this PR by @rwood-97 and me.