TanGentleman / Augmenta

Automate RAG-powered workflows
MIT License

Further streamline workflows with granular overrides #29

Open TanGentleman opened 1 month ago

TanGentleman commented 1 month ago

This is my (imo) elegant way to handle all new CLI flags and API parameters going forward. Since I have defined schemas for all the RAG, Chat, and Hyperparameter settings, I want an easier way to override any of the values in settings.json. Essentially, it would be nice to quickly run python chat.py -m "model-name" or python chat.py -t 500 to limit the response to 500 tokens. Adding all these settings adds bulk, so I want to streamline and plan for it beforehand.
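For illustration, here is a minimal sketch of what those overrides could look like in chat.py, assuming argparse and a load_settings helper (the flag wiring here is an assumption, not the final interface):

# sketch: CLI flags that override values loaded from settings.json
# load_settings and the exact flag names are assumptions, not the final interface
import argparse
import json

def load_settings(path: str = "settings.json") -> dict:
    with open(path) as f:
        return json.load(f)

def main() -> None:
    parser = argparse.ArgumentParser(description="chat.py with granular overrides")
    parser.add_argument("-m", "--model", help="override chat_config.primary_model")
    parser.add_argument("-t", "--max-tokens", type=int,
                        help="override hyperparameters.max_tokens")
    args = parser.parse_args()

    settings = load_settings()
    if args.model is not None:
        settings["chat_config"]["primary_model"] = args.model
    if args.max_tokens is not None:
        settings["hyperparameters"]["max_tokens"] = args.max_tokens
    # ...hand the merged settings off to the chat loop

if __name__ == "__main__":
    main()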

For context: compared to the start of this project, there's now a ton of configuration settings!

Current settings.json file:

{
    "rag_config": {
        "rag_mode": bool, (start the chatbot in RAG mode)
        "collection_name": str, (name of the vector-db folder)
        "embedding_model": str, (name of the function in models.py)
        "method": str, ("faiss" or "chroma")
        "chunk_size": int, (max characters per chunk in the db, usually 256-2000)
        "chunk_overlap": int, (usually 20-200 characters of overlap)
        "k_excerpts": int, (excerpts used as context for the LLM response, usually 1-4)
        "rag_llm": str, (name of the function in models.py)
        "inputs": [
            "paper.txt", "agents.pdf", "example.com" (each input is a string pointing to a local file or a URL)
        ],
        "multivector_enabled": false, (this is for advanced workflows - very cool! :P)
        "multivector_method": "summary"
    },
    "chat_config": {
        "primary_model": "get_together_llama3",
        "backup_model": "get_ollama_local_model",
        "enable_system_message": true,
        "system_message": "You are a helpful AI." (this can also be set in-chat, but adds friction)
    },
    "hyperparameters": {
        "max_tokens": 1000,
        "temperature": 0.1
    }
}

Note that adding things like custom stop tokens, or different temperatures for certain substeps, runs into an annoying problem with how this is organized: the hyperparameters block is global, so every LLM shares one set of values. My plan is to create a simple schema for hyperparameters that can be configured and used flexibly by any of the LLMs.
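As a rough sketch (the class name and fields are placeholders, not a final design), each substep could carry its own schema instead of sharing the global block:

# sketch: a reusable hyperparameter schema, decoupled from any single LLM
from dataclasses import dataclass
from typing import Optional

@dataclass
class HyperparameterSchema:
    max_tokens: int = 1000
    temperature: float = 0.1
    stop: Optional[list[str]] = None  # custom stop tokens, when a substep needs them

# each substep gets its own values instead of one global block
summary_params = HyperparameterSchema(max_tokens=300, temperature=0.0, stop=["\n\n"])
chat_params = HyperparameterSchema()  # defaults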

I also want to create a new function to save a custom configuration as a file like settings.json, so it can be re-run in the future. I would like it to be something akin to the Vectara Ingest crawlers, which use YAML configs for running a workflow.
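A minimal sketch of such a helper (save_config is a hypothetical name; nothing like it exists in the repo yet):

# sketch: persist the active configuration so the workflow can be re-run later
import json
from pathlib import Path

def save_config(settings: dict, path: str = "my_workflow.json") -> None:
    Path(path).write_text(json.dumps(settings, indent=4))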

Perhaps in the future, a YAML config may contain fields like these (a rough sketch that just mirrors today's settings.json keys; the collection name is a placeholder and nothing here is final):
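# sketch only: today's settings.json keys translated to YAML
rag_config:
  rag_mode: true
  collection_name: my_collection
  chunk_size: 1000
  inputs:
    - paper.txt
    - agents.pdf
    - example.com
chat_config:
  primary_model: get_together_llama3
hyperparameters:
  max_tokens: 1000
  temperature: 0.1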

This last step would be very powerful. Setting up a YAML config with 1000+ websites or documents in the inputs field, and having it build a database with meaningful metadata or analysis, is very tricky to streamline, but immensely valuable!

TanGentleman commented 1 month ago

One solution is to define some dictionaries in constants.py and reference them for each LLM. A model would then be specified as (LLM) to use the defaults, or (LLM, HS2) for Hyperparameter Schema 2, making it easy to add new schemas and reuse them as components instead of changing settings.json manually each time.
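A rough sketch of what that could look like in constants.py (HS1/HS2 and MODEL_CONFIGS are illustrative names, not an existing API):

# constants.py (sketch): named hyperparameter schemas any LLM can reference
HS1 = {"max_tokens": 1000, "temperature": 0.1}                   # default schema
HS2 = {"max_tokens": 500, "temperature": 0.7, "stop": ["\n\n"]}  # alternate schema

# settings can then reference (LLM) or (LLM, HS2) pairs instead of raw values
MODEL_CONFIGS = {
    "get_together_llama3": HS2,     # (LLM, HS2)
    "get_ollama_local_model": HS1,  # (LLM) falls back to the default schema
}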