Note: This project is a work in progress.
We will build a chatbot that lets a person play a simple trivia game. The bot will use an established lightweight LLM and draw on a large dataset of questions via Retrieval-Augmented Generation (RAG). We’ll include agentic workflows built through prompt engineering and chaining with LangChain.
We will use at least one dataset for RAG: 200,000 Jeopardy questions in a JSON file. Stretch: Include one or more additional datasets.
To better understand the data, we will first explore it to identify its fields, their data types, and any existing features.
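As a starting point for that exploration, here is a minimal sketch that tallies which fields appear in the records and which Python types their values take. The field names (`category`, `question`, `answer`, `value`) and the two sample records are placeholders made up for illustration; the real file's schema still needs to be confirmed.

```python
import json
from collections import Counter, defaultdict


def summarize_records(records):
    """Return {field: {type_name: count}} across a list of dict records."""
    summary = defaultdict(Counter)
    for record in records:
        for field, value in record.items():
            summary[field][type(value).__name__] += 1
    return {field: dict(counts) for field, counts in summary.items()}


# Placeholder records (NOT real data); the field names are assumptions.
sample_json = """
[
  {"category": "HISTORY", "question": "Placeholder clue 1",
   "answer": "placeholder answer 1", "value": "$200"},
  {"category": "SCIENCE", "question": "Placeholder clue 2",
   "answer": "placeholder answer 2", "value": null}
]
"""

sample = json.loads(sample_json)
summary = summarize_records(sample)
for field, type_counts in summary.items():
    print(field, type_counts)
```

For the real dataset, the same function could be applied to the list returned by `json.load` on the opened file. Mixed types in one field (for example, strings alongside nulls) are worth noting early because they affect cleaning.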
We want a small number of top-level categories. However, our cursory review suggests that trivia datasets often have vague, one-off categories. We will likely need NLP tools, such as TF-IDF with clustering, to identify a suitable number of categories, which we will then name manually.
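One possible way to approach the category grouping is sketched below, assuming scikit-learn is available. It vectorizes category names with TF-IDF, runs k-means for several candidate cluster counts, and keeps the count with the best silhouette score. The function name `cluster_categories`, the candidate values of k, and the sample category names are illustrative assumptions, and silhouette score is only one of several reasonable selection criteria.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import silhouette_score


def cluster_categories(categories, k_values, seed=0):
    """Cluster category names on TF-IDF vectors; pick k by silhouette score."""
    vectors = TfidfVectorizer().fit_transform(categories)
    best_score, best_k, best_labels = None, None, None
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=seed)
        labels = model.fit_predict(vectors)
        score = silhouette_score(vectors, labels)
        if best_score is None or score > best_score:
            best_score, best_k, best_labels = score, k, labels
    return best_k, [int(label) for label in best_labels]


# Illustrative category names only; the real list comes from the dataset.
categories = [
    "WORLD HISTORY", "AMERICAN HISTORY", "ANCIENT HISTORY",
    "SCIENCE", "SCIENCE & NATURE", "WEIRD SCIENCE",
    "POTENT POTABLES", "BEFORE & AFTER",
]

best_k, labels = cluster_categories(categories, k_values=(2, 3, 4))
for name, label in zip(categories, labels):
    print(label, name)
```

The cluster labels are arbitrary integers, so the manual naming step still applies: inspect the members of each cluster and assign a readable top-level name.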
We will use a lightweight base LLM, such as Phi-3 run through Ollama, and apply RAG with our trivia dataset.
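To make the RAG step concrete, the sketch below shows only the retrieve-then-prompt half of the pipeline, using the standard library. It ranks records by simple word overlap with the player's request and assembles a prompt from the top matches. This is a deliberate simplification: a real setup would more likely use embeddings and a vector store, and the resulting prompt would then be sent to the model. The record fields, the helper names `retrieve` and `build_prompt`, and the prompt wording are all assumptions for illustration.

```python
import re


def tokenize(text):
    """Lowercase a string and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, records, top_k=2):
    """Return up to top_k records sharing at least one word with the query."""
    query_tokens = tokenize(query)
    scored = []
    for record in records:
        text = record["category"] + " " + record["question"]
        scored.append((len(query_tokens & tokenize(text)), record))
    # Stable sort: ties keep their original dataset order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [record for score, record in scored[:top_k] if score > 0]


def build_prompt(query, retrieved):
    """Assemble the text that would be passed to the LLM."""
    context = "\n".join(
        f"- [{r['category']}] Q: {r['question']} A: {r['answer']}"
        for r in retrieved
    )
    return (
        "You are a trivia host. Use only the questions listed below.\n"
        f"{context}\n"
        f"Player request: {query}\n"
    )


# Placeholder records (NOT real data); field names are assumptions.
records = [
    {"category": "HISTORY", "question": "Placeholder history clue",
     "answer": "placeholder answer A"},
    {"category": "SCIENCE", "question": "Placeholder science clue",
     "answer": "placeholder answer B"},
]

hits = retrieve("Give me a science question", records, top_k=1)
prompt = build_prompt("Give me a science question", hits)
print(prompt)
```

Including the stored answer in the context lets the model judge the player's response against the dataset rather than relying on its own recall.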
We will create a web frontend allowing users to interact with the chatbot. This can be Gradio or a simple custom React app. In either case, the user should be able to run the application locally.