alan-turing-institute / reginald

Reginald repository for REG Hack Week 23

Fix #78: Llama-2 query engine #79

Closed · rchan26 closed this 1 year ago

rchan26 commented 1 year ago

Implement llama-cpp model in query engine.

For example, after setting up your Slack tokens (see the README), you should be able to load up an instance (using the handbook data):

python slack_bot/run.py --model llama-index-llama-cpp --data data --which-index handbook --n-gpu-layers 1 --model-name gguf_models/llama-2-13b-chat.Q6_K.gguf --path

assuming gguf_models/llama-2-13b-chat.Q6_K.gguf exists relative to your current directory. Note the --path flag, which signifies that the model name is a path. Equivalently, you could run:
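The flags in the command above could be parsed with argparse along these lines; this is an illustrative sketch, not the actual run.py, and the pairing of short and long options is an assumption based on the two commands shown in this PR:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Sketch of a CLI mirroring the flags used in this PR (hypothetical)."""
    parser = argparse.ArgumentParser(description="Run the Reginald Slack bot")
    parser.add_argument("--model", "-m",
                        help="model backend, e.g. llama-index-llama-cpp")
    parser.add_argument("--data", "-d",
                        help="path to the data directory")
    parser.add_argument("--which-index", "-w",
                        help="which index to load, e.g. handbook")
    parser.add_argument("--n-gpu-layers", "-ngl", type=int, default=0,
                        help="number of layers to offload to the GPU")
    parser.add_argument("--model-name", "-n",
                        help="model file path or download URL")
    parser.add_argument("--path", action="store_true",
                        help="treat --model-name as a local path, not a URL")
    return parser
```

With this sketch, parsing `-m llama-index-llama-cpp -ngl 1 --path` yields `args.model == "llama-index-llama-cpp"`, `args.n_gpu_layers == 1`, and `args.path == True`.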

python slack_bot/run.py -m llama-index-llama-cpp -d data -w handbook -ngl 1 -n https://huggingface.co/TheBloke/Llama-2-13B-chat-GGUF/resolve/main/llama-2-13b-chat.Q6_K.gguf

This downloads the model straight from the URL instead.
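The path-versus-URL behaviour described above can be sketched as a small helper that maps the same --model-name value to either a local-path or a download-URL keyword argument. The function name and return shape are assumptions for illustration, not the PR's actual code:

```python
def resolve_model_source(model_name: str, is_path: bool) -> dict:
    """Select how the model should be loaded (hypothetical helper).

    If is_path is True (the --path flag), model_name is a local GGUF file;
    otherwise it is treated as a URL to download the model from.
    """
    if is_path:
        return {"model_path": model_name, "model_url": None}
    return {"model_path": None, "model_url": model_name}
```

The resulting keyword arguments could then be passed to whichever llama-cpp wrapper the query engine constructs.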