Redesigns the app around FastAPI (the auto-reload feature didn't work under Docker, which made the dev loop extremely slow; moving to FastAPI also brings a whole host of other advantages).
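For reference, getting auto-reload working under Docker generally needs the source bind-mounted into the container so uvicorn's `--reload` watcher sees host edits. A minimal compose sketch (service name, paths, and the `app.main:app` entrypoint are hypothetical, not taken from this repo):

```yaml
services:
  api:
    build: .
    # uvicorn's --reload restarts the worker whenever watched files change
    command: uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
    volumes:
      - ./app:/code/app   # bind mount so edits on the host trigger a reload
    ports:
      - "8000:8000"
```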
Consolidates all LLM operations to use llama3.2:3b for consistency
Updates all builds to pull llama3.2:3b
Removes the model-download code and the code related to functionary
Sets up a realtime search agent that does NOT require any API key: it scrapes Google search requests for results, falls back to headless browsing if that fails, then synthesizes the results with an LLM
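The scrape-then-fallback-then-synthesize flow above can be sketched as a small orchestrator. This is a minimal sketch, not the actual implementation: the function name and the injected callables (`scrape_google`, `headless_browse`, `synthesize`) are hypothetical stand-ins for the real scraper, headless browser, and LLM call.

```python
from typing import Callable, List, Optional

def search_with_fallback(
    query: str,
    scrape_google: Callable[[str], Optional[List[str]]],   # fast path: plain HTTP scrape
    headless_browse: Callable[[str], List[str]],           # fallback: headless browser
    synthesize: Callable[[str, List[str]], str],           # final step: LLM summary
) -> str:
    """Try a plain scrape first; if it yields nothing, fall back to
    headless browsing, then synthesize whatever results we got."""
    results = scrape_google(query)
    if not results:
        results = headless_browse(query)
    return synthesize(query, results)
```

Injecting the three steps as callables keeps the fallback logic testable without any network access.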
Adds functionality for the user to clear their chat history with one click
Tackles the following issues: