TransformerOptimus / SuperAGI

<⚡️> SuperAGI - A dev-first open source autonomous AI agent framework. Enabling developers to build, manage & run useful autonomous agents quickly and reliably.
https://superagi.com/
MIT License

LLM output stuck in a loop #542

Open atxcowboy opened 1 year ago

atxcowboy commented 1 year ago

Using the current codebase with a local LLM, it seems to get stuck in a loop. After the 4th iteration the console output looks like this:

Ensure the response can be parsed by Python json.loads.
The current time and date is Wed Jun 28 08:49:58 2023
### Human: Determine which next tool to use, and respond using the format specified above:
### Assistant: { "thoughts": {...}, "tool": {...} }
### Assistant: { "thoughts": {...}, "tool": {...} }
### Assistant: { "thoughts": {...}, "tool": {...} }
### Assistant: { "thoughts": {...}, "tool": {...} }
### Assistant: { "thoughts": {...}, "tool": {...} }
### Human: Determine which next tool to use, and respond using the format specified above:
### Assistant:
--------------------

Llama.generate: prefix-match hit

llama_print_timings:        load time =  1362.62 ms
llama_print_timings:      sample time =     3.34 ms /    17 runs   (    0.20 ms per token)
llama_print_timings: prompt eval time = 11490.38 ms /   167 tokens (   68.80 ms per token)
llama_print_timings:        eval time =  3738.57 ms /    16 runs   (  233.66 ms per token)
llama_print_timings:       total time = 15325.80 ms
Output generated in 15.90 seconds (1.01 tokens/s, 16 tokens, context 1271, seed 2146990787)
eriksonssilva commented 1 year ago

I'm also having a similar issue... However, depending on the model, it simply starts talking "gibberish" with itself. I asked it to search for 10 mind-boggling movies (just as a test), and it simply started "planning" how "they" should watch a movie... It was a fun script though!

neelayan7 commented 1 year ago

@sirajperson can you look into this?

sirajperson commented 1 year ago

Sure @neelayan7. Just to be on the same page, what model are we using here? I have been experimenting with 30B-Lazarus. They have recently updated it to include the SuperHOT LoRA.

sirajperson commented 1 year ago

Since the new SuperHOT has expanded the context window for Llama to 8-16k, I'm working on getting useful JSON from 30B-Lazarus. I'll be working on the repetitive prompts and the thinking tool issue today.

On my branch I've updated the Docker files to use the latest version of the TGWUI docker-specific repo. On my testing rig all of the containers are loading fine. I don't have the best computer and can't test GPTQ-based inference, so I've been sticking to models that I can inference directly in HF format or use with llama-cpp. I would really like to use MPT-30b-instruct, because when I inference that model with GGML on the command line it produces great code. Right now I'm downloading and testing 30B-Lazarus. They recently merged the SuperHOT LoRA into the repo, so I'm working with that one for now. I suspect the JSON problem is associated with the model being used. For example, with Llama-3b the application would loop and fail, because Llama-3b simply can't yet produce the kind of responses the agents need in order to process functions controlled by JSON. If 30B-Lazarus comes close, though, it may be worth either using an additional LoRA that already lets the model produce usable JSON, or simply training a LoRA to do the job for SuperAGI.

The big hurdle of dealing with the limited context window, though, is definitely solved thanks to SuperHOT. At this time I'm downloading the weights for 30B-Lazarus from Hugging Face. I selected this model because it has the highest perplexity score on the HF leaderboard. I reasoned that using a model that was not specifically fine-tuned for coding would give the agent better access to solving a broad range of problems. However, I'm not sure yet if that creates a trade-off in producing answers in the correct JSON format.

It may require not using the default prompt configurations that are in place when creating a new agent.

There have been discussions of this looping issue on other issues, namely #243, #411, and #386. It would be great if everyone who is testing or working on the full execution of various local models directed their comments to this issue so that everyone can keep track of the discussion easily.

sirajperson commented 1 year ago

So I've been burning the midnight oil the last several days trying to match the best open-source models to the task agent. So far the best one that has successfully started generating instructions and following a reasoning chain of thought is ausboss/llama-30b-supercot.

In order to keep it coherent it cannot be quantized too heavily. My testing and development machine has 64 GB of system memory and 48 GB of VRAM. I'm not able to use GPTQ models because my accelerator cards do not support half-precision floats. You may have more luck with GPTQ models because you can easily attach LoRAs to them. In any case, using open-source models is not as versatile as using ChatGPT. The instruct models each have a specific dataset they were trained on, and a large list of prompt styles can be found in the TGWUI characters folder: Text Generation Web UI Characters. What that boils down to is that the task agent has to speak in the style of the selected language model. In the case of llama-30b-supercot I had to create modified versions of the prompt templates in SuperAGI/superagi/agent/prompts/. So far these seem to give the best results, which is not to say they couldn't be much better:

superagi.txt:

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instructions:
As an AI assistant to solve complex problems, your decisions must always be made independently without seeking user assistance. Play to your strengths as an LLM and pursue simple strategies with no legal complications. If you have completed all your tasks or reached the end state, make sure to use the "finish" tool.
Select and use tools from the list provided in the 'Input' section as needed to effectively solve each step of the problem.

{instructions}

PERFORMANCE EVALUATION:
1. Continuously review and analyze your actions to ensure you are performing to the best of your abilities.
2. Use instruction to decide the flow of execution and decide the next steps for achieving the task.
3. Constructively self-criticize your big-picture behavior constantly.
4. Reflect on past decisions and strategies to refine your approach.
5. Every tool has a cost, so be smart and efficient.
6. Aim to complete tasks in the least number of steps.

I should only respond in JSON format as described below.
Response Format:
{response_format}

Ensure the response can be parsed by Python json.loads.

### Input:
GOALS:
{goals}

CONSTRAINTS:
{constraints}

TOOLS:
{tools}

### Response:

analyse_task.txt:

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instructions:
High-Level Goal:
{goals}

Additional Instructions:
{task_instructions}

Your Current Task:
{current_task}

Based on this information, your job is to understand the current task, pick out key parts, and think smartly and quickly. Explain why you are taking each action, create a plan, and mention any concerns you might have. Ensure the next action tool is picked from the provided tool list.

Your answer must be in JSON format that can be read by JSON.parse(), and nothing else.

RESPONSE FORMAT:
{
    "thoughts": {
        "reasoning": "reasoning"
    },
    "tool": {"name": "tool name", "args": {"arg name": "string value"}}
}

### Input:
GOALS:
{goals}

CURRENT TASK:
{current_task}

TOOLS:
{tools}

TASK HISTORY:
{task_history}

ADDITIONAL INSTRUCTIONS:
{task_instructions}

### Response:
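For reference, here is a minimal sketch of how a response in the format above could be checked with json.loads and handed to a tool. The helper and the `tools` mapping are illustrative placeholders, not SuperAGI's actual parser:

```python
import json

def dispatch_tool_response(raw: str, tools: dict):
    """Parse a tool-selection reply and run the chosen tool (illustrative only)."""
    try:
        response = json.loads(raw)
    except json.JSONDecodeError as err:
        # This is the failure mode discussed above: the model drifted out of the
        # JSON format, so the agent has nothing it can act on.
        raise ValueError(f"Model output is not valid JSON: {err}") from err

    tool_name = response["tool"]["name"]
    tool_args = response["tool"].get("args", {})
    if tool_name not in tools:
        raise ValueError(f"Model picked an unknown tool: {tool_name!r}")
    return tools[tool_name](**tool_args)
```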

create_task.txt:

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instructions:
You are an AI assistant tasked with creating tasks.

High-Level Goal:
{goals}

Additional Instructions:
{task_instructions}

You have the following incomplete tasks: `{pending_tasks}`. You have the following completed tasks: `{completed_tasks}`.

Based on this information, create a single task to be completed by your AI system ONLY IF REQUIRED to get closer to or fully achieve the high-level goal. Do not create any task if it is already covered in incomplete or completed tasks. Ensure your new task does not deviate from the goal.

Your answer should be an array of strings, suitable for utilization with JSON.parse(). Return an empty array if no new task is required.

### Input:
GOALS:
{goals}

PENDING TASKS:
{pending_tasks}

COMPLETED TASKS:
{completed_tasks}

ADDITIONAL INSTRUCTIONS:
{task_instructions}

### Response:

initialize_task.txt:

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instructions:
You are a task-generating AI. You are not a part of any system or device. Your role is to understand the goals presented to you, identify important components, go through the instructions provided by the user, and construct a thorough execution plan.

Construct a sequence of actions to achieve the goal, ensuring that the sequence does not exceed 3 steps.

Submit your response as a formatted array of strings, suitable for utilization with JSON.parse().

Example: ["{{TASK-1}}", "{{TASK-2}}"].

### Input:
GOALS:
{goals}

TASK INSTRUCTIONS:
{task_instructions}

### Response:

prioritize_task.txt:

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instructions:
You are a task prioritization AI assistant.

High-Level Goal:
{goals}

Additional Instructions:
{task_instructions}

You have the following incomplete tasks: `{pending_tasks}`. You have the following completed tasks: `{completed_tasks}`.

Based on this information, evaluate the incomplete tasks and sort them in the order of execution. The first task in the output will be executed first and so on. Remove any tasks that are unnecessary, duplicate, already completed, or do not contribute to achieving the main goal.

Your answer should be an array of strings in JSON format that can be used with JSON.parse() and NOTHING ELSE.

### Input:
GOALS:
{goals}

PENDING TASKS:
{pending_tasks}

COMPLETED TASKS:
{completed_tasks}

### Response:

It is important to note that there is a single newline after '### Response:', and that the file ends with that newline, not two or three. Again, with lower-parameter models and the use of quantization there really isn't much room for changing the prompt styles. As for the configuration of TGWUI, it's important to be sure that the OPENAI_API_BASE setting points to wherever you are running TGWUI. For debugging, I have found it much easier to just run TGWUI from my IDE. However, if you would like to use Docker, the image needs to be configured for the run environment of the container. If the target host machine is going to use GGML models and execute the inference on the CPU, then setup is straightforward. If, however, you want to use a GPU to either run the model outright or offload layers of the model to the GPU for faster GGML inferencing (which happens to be my case), then you'll need to install the requirements for Docker to access the host machine's GPUs. A full tutorial is well documented in TGWUI's Docker readme: TGWUI Docker ReadMe
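As a rough illustration of the OPENAI_API_BASE point: with TGWUI's OpenAI-compatible API extension enabled, the client just needs to be pointed at the local endpoint instead of api.openai.com. The URL below is an assumption; use whatever address and port your TGWUI instance actually serves:

```python
import os
import openai  # the 0.x openai client, as used at the time

# Point the client at the local TGWUI endpoint rather than api.openai.com.
# Host and port are placeholders for your own setup.
openai.api_base = os.environ.get("OPENAI_API_BASE", "http://localhost:5001/v1")
openai.api_key = "sk-local"  # most local endpoints ignore the key, but the client requires one
```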

Please be aware that getting local LLMs to execute agents is still a work in progress. The framework is there; while it is buggy, it is coming together. I have tried working with some of the new SuperHOT models, like Vicuna-33B, but that model doesn't respond to the queries at all. Some of the models that I have experimented with so far are:

- guanaco-65b-merged: Required significant quantization in my testing environment, which made it unsuitable.
- guanaco-30b-merged: Better results than the 65B model, since I only had to quantize it to Q8_0 in order to run it.
- Vicuna-13B-1-3-SuperHOT-8K-fp16: Even though I had high hopes for this model since it has been merged with the SuperHOT LoRA, it just didn't perform very well with chain-of-thought prompting. It seems that having a chain-of-thought model is important.
- Wizard-Vicuna-13B-Uncensored: I tried this model before the availability of SuperHOT. It would respond in non-JSON format, which was unusable for the agent controller.
- Stable-Vicuna-13B: Same situation as Wizard-Vicuna.
- 30B-Lazarus: Performed quite well, but stopped producing JSON responses.
- Vicuna-EvolInstruct-13B: This model was promising but seemed to hallucinate in its answers too frequently to be useful for multi-step problem solving.
- ausboss_llama-30b-supercot: So far the best-performing model that I have been able to test in my environment.

I will continue to work on creating prompts and researching models that allow SuperAGI to perform a wide variety of tasks. The next one that I'm downloading while writing this post is TheBloke_WizardLM-Uncensored-SuperCOT-StoryTelling-30B-SuperHOT-8K-fp16.

I'm still having trouble with several issues. Debugging SuperAGI has been difficult for me. If anyone who has been developing the agent is reading this post, I would love some insight into your debugging environment. I've been using PyCharm Pro. Running SuperAGI from the command line isn't working and seems to be broken, and debugging the Docker container isn't so straightforward. I'd love to be able to work on making more tools for SuperAGI once local LLM support becomes more stable.

So far, I have been able to get the LLM to do step-by-step reasoning. I have been able to get it to think that it has completed a task. I have not been able to get it to use tools. I'm not certain if this is an issue with the response that the LLM is providing or if there is something else going on with the agent. The agent had 4 goals:

  1. Check to see if a file named love.txt exists
  2. If love.txt exists delete it.
  3. Create a file named love.txt
  4. Save a 50-word poem about love to love.txt

Determine which next tool to use, and respond using the format specified above:

💡 thoughts = { "text": "I will first check if love.txt exists.", "reasoning": "To ensure I am not deleting a file that does not exist.", "plan": "- short bulleted\n- list that conveys\n- long-term plan", "criticism": "constructive self-criticism", "speak": "I will check if love.txt exists." }
tool = { "name": "List File", "description": "lists files in a directory recursively", "args": {} }

💡 thoughts = { "text": "I will first check if love.txt exists.", "reasoning": "To ensure I am not deleting a file that does not exist.", "plan": "- short bulleted\n- list that conveys\n- long-term plan", "criticism": "constructive self-criticism", "speak": "I will check if love.txt exists." }
tool = { "name": "Read File", "description": "reads the file content in a specified location", "args": {"file_name": {"title": "File Name", "description": "Path of the file to read", "type": "string"}} }

💡 thoughts = { "text": "I will first check if love.txt exists.", "reasoning": "To ensure I am not deleting a file that does not exist.", "plan": "- short bulleted\n- list that conveys\n- long-term plan", "criticism": "constructive self-criticism", "speak": "I will check if love.txt exists." }
tool = { "name": "Delete File", "description": "delete a file, args json schema: {"file_name": {"title": "File Name", "description": "Name of the file to delete", "type": "string"}}", "args": {} }

As you can see, the chain of thought is correct, but there is no response from the agent from actually using tools like List File or Write File. Several times the agent had a similar arrangement of tasks, thought it had completed all of them, and responded "finished", but the task agent kept querying it and didn't exit. Again, anyone who could help me stop at breakpoints in the container so that I can debug SuperAGI would have my deepest gratitude.
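One idea for the looping symptom, just a sketch and not something that exists in SuperAGI today: keep a short window of recent assistant replies and stop the run, or inject a corrective prompt, once the model starts repeating itself verbatim.

```python
from collections import deque

class LoopGuard:
    """Flags when the model keeps emitting the same reply verbatim (illustrative)."""

    def __init__(self, window: int = 3):
        self.recent = deque(maxlen=window)

    def is_looping(self, reply: str) -> bool:
        normalized = reply.strip()
        # Only trigger once `window` identical replies in a row have been seen.
        looping = (len(self.recent) == self.recent.maxlen
                   and all(prev == normalized for prev in self.recent))
        self.recent.append(normalized)
        return looping

# Usage sketch: break the agent loop once the same tool-selection
# response has been produced several times in a row.
guard = LoopGuard(window=3)
for reply in ['{"tool": {"name": "List File"}}'] * 6:
    if guard.is_looping(reply):
        print("Model appears stuck; stopping this agent run.")
        break
```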

The value in this research is that in my tests running an agent consumes about 7k tokens per minute... and that's on my slow machine. In actual usage with GPT, I could easily see that being more like 10 to 12k. In a nutshell, running an agent on the token-expense model would cost about 150 to 200 dollars per day per agent. Running an LLM on a good computer would simply cost the price of electricity, somewhere in the neighborhood of 60 to 70 dollars per month. There is also the issue of data privacy: an agent that runs in a doctor's or lawyer's office may not be able to send sensitive information to ChatGPT, as doing so may be illegal. Aside from monetary concerns or business practice, having a contained agent execution environment is extremely desirable in many circumstances. Not to mention that in the case of building automated machines, having a localized problem-solving solution would be fantastic for scientific research. It would be invaluable in space exploration of distant bodies, where the time it takes to communicate with Earth makes remote decision making simply not feasible. I could go on and on about how awesome these tools are, but I'm sure you already know... especially if you read this post to here! :'-D

SlistInc commented 11 months ago

I wanted to ask whether you have an update on which models you would currently recommend?

Given the issues you encountered, I would actually highly recommend that SuperAGI base its prompting on LMQL or Guidance. These tools would dramatically increase the quality of the outputs and ensure proper JSON structures. I personally use LMQL and I really love features that, for instance, constrain outputs to a certain set of options (e.g. for tool selection) or to a general structure.
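Not LMQL or Guidance themselves, but a minimal Python sketch of the effect being suggested here: reject and retry any generation whose tool choice falls outside an allowed set, which is roughly what a constrained-decoding library would guarantee up front rather than after the fact. The `generate` callable and the tool names are placeholders:

```python
import json

ALLOWED_TOOLS = {"List File", "Read File", "Write File", "Delete File", "finish"}

def constrained_tool_choice(generate, prompt: str, max_retries: int = 3) -> dict:
    """Keep sampling until the reply is valid JSON naming an allowed tool (illustrative)."""
    for _ in range(max_retries):
        raw = generate(prompt)
        try:
            response = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed JSON: sample again
        if response.get("tool", {}).get("name") in ALLOWED_TOOLS:
            return response
    raise RuntimeError("Model never produced a valid, in-set tool choice.")
```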

sirus20x6 commented 11 months ago

Would development here be faster if you had access to a machine with 512 GB of RAM? Maybe I could host something.

AFKler commented 4 months ago

oobabooga rewrites the prompt using Jinja2 templates stored in YAML files. Since it uses roles like system, user, or assistant to build a prompt matching the model's template, a failure is far less likely. https://github.com/oobabooga/text-generation-webui/tree/main/instruction-templates
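As a small illustration of that mechanism (the template string below is a generic Alpaca-style stand-in, not one of oobabooga's shipped instruction-templates):

```python
from jinja2 import Template

# Generic instruction-style template; TGWUI ships per-model templates,
# this one only demonstrates role-based rendering.
CHAT_TEMPLATE = Template(
    "{% for m in messages %}"
    "{% if m.role == 'system' %}{{ m.content }}\n\n"
    "{% elif m.role == 'user' %}### Instruction:\n{{ m.content }}\n\n"
    "{% elif m.role == 'assistant' %}### Response:\n{{ m.content }}\n\n"
    "{% endif %}{% endfor %}"
    "### Response:\n"
)

messages = [
    {"role": "system", "content": "You are an AI assistant that answers only in JSON."},
    {"role": "user", "content": "Determine which next tool to use."},
]
print(CHAT_TEMPLATE.render(messages=messages))
```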

The llama.cpp server has a similar feature: https://github.com/ggerganov/llama.cpp/blob/master/examples/server/README.md

--chat-template JINJA_TEMPLATE: Set custom jinja chat template. This parameter accepts a string, not a file name (default: template taken from model's metadata). We only support some pre-defined templates

I don't know how prompts are handled when using a local LLM with SuperAGI.