OpenBMB / AgentVerse

🤖 AgentVerse 🪐 is designed to facilitate the deployment of multiple LLM-based agents in various applications. It primarily provides two frameworks: task-solving and simulation.
Apache License 2.0

Local model error #100

Closed lzw-lzw closed 10 months ago

lzw-lzw commented 10 months ago

Hello, thank you for the excellent framework. When I tried to run the local model following the tutorial, I encountered the following problem: ValueError: llama-2-7b-chat-hf is not registered. Please register with the .register("llama-2-7b-chat-hf") method provided in LLMRegistry registry. What could be the reason for this? Thanks.
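For context on this error: it indicates that the model name must first be registered with AgentVerse's LLMRegistry before it can be used. A minimal sketch of what such a registration could look like, assuming the decorator-style .register API implied by the error message and a hypothetical import path (the actual AgentVerse internals may differ):

```python
# Hypothetical sketch inferred from the error message, not AgentVerse's source.
# Assumes llm_registry exposes the decorator-style .register(name) the error mentions.
from agentverse.llms import llm_registry  # assumed import path

@llm_registry.register("llama-2-7b-chat-hf")
class LocalLlamaChat:
    """Placeholder wrapper routing llama-2-7b-chat-hf requests to a local server."""

    def __init__(self, **kwargs):
        self.model = "llama-2-7b-chat-hf"
        self.kwargs = kwargs
```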

chenweize1998 commented 10 months ago

Are you using the latest code? Did you set llm_type: local for all the agents in your config?

lzw-lzw commented 10 months ago

I am using the latest code, and after setting llm_type to local, a new error appears: KeyError: 'Could not automatically map llama-2-7b-chat-hf to a tokeniser. Please use tiktoken.get_encoding to explicitly get the tokeniser you expect.'
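This KeyError comes from tiktoken, which only maps OpenAI model names to tokenizers; a custom name like llama-2-7b-chat-hf needs an explicit fallback. A minimal sketch of the usual workaround; the choice of cl100k_base here is an assumption for token counting, not necessarily AgentVerse's actual fix:

```python
import tiktoken

def get_tokenizer(model_name: str) -> tiktoken.Encoding:
    """Return a tiktoken encoding, with a fallback for non-OpenAI model names."""
    try:
        # Works only for model names tiktoken knows about (OpenAI models).
        return tiktoken.encoding_for_model(model_name)
    except KeyError:
        # Custom/local names such as "llama-2-7b-chat-hf" are not mapped;
        # fall back to an explicit encoding, as the error message suggests.
        # cl100k_base is an arbitrary choice here, used only for token counting.
        return tiktoken.get_encoding("cl100k_base")
```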

chenweize1998 commented 10 months ago

I think the problem should have been fixed in a previous commit. Are you using the latest code from the GitHub repo? And did you install AgentVerse with pip install -e .?

lzw-lzw commented 10 months ago

Thanks for your patient reply. I am using the latest code from the GitHub repo, and I installed AgentVerse with pip install -e . as well. After that, I changed MODEL_PATH and MODEL_NAME to the path of my llama-2-7b-chat-hf and "llama-2-7b-chat-hf", and ran run_local_model_server.sh. Then I created a directory under brainstorming containing a config.yaml with llm_type: local and model: llama-2-7b-chat-hf, and ran "python3 agentverse_command/main_tasksolving_cli.py --task tasksolving/brainstorming/llama-2-7b-chat-hf". That's when this error occurred.
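For reference, a hypothetical sketch of the relevant config.yaml fragment; only the llm_type and model keys come from this thread, and the surrounding structure is an assumption about the shape of AgentVerse task configs:

```yaml
# Hypothetical fragment; only llm_type and model are taken from the thread.
agents:
  - agent_type: ...            # keep whatever the brainstorming example uses
    llm:
      llm_type: local          # route requests to the locally served model
      model: llama-2-7b-chat-hf
```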

chenweize1998 commented 10 months ago

Just made some updates to the code. Please check if it's working correctly now. At the moment, I don't have access to a machine with a GPU, so I'm unable to fully run the process with a local LLM. If the issue persists, I'll try to find a GPU machine for further debugging.

lzw-lzw commented 10 months ago

I'm sorry, it still doesn't work properly. I'll wait for you to take a look. Thanks!

xymou commented 10 months ago

I can get it running locally, but the output seems incorrect: the first prompt keeps repeating, and no response is generated.

chenweize1998 commented 10 months ago

Pull the latest code and try again. It works fine on my GPU machine now. After launching the FastChat service, check whether it's running correctly by executing curl http://127.0.0.1:5000/v1/models. It should return something like:

{"object":"list","data":[{"id":"llama-2-7b-chat-hf","object":"model","created":1699856748,"owned_by":"fastchat","root":"llama-2-7b-chat-hf","parent":null,"permission":[{"id":"modelperm-7bcKCjaRGuVKoeajAXkSgP","object":"model_permission","created":1699856748,"allow_create_engine":false,"allow_sampling":true,"allow_logprobs":true,"allow_search_indices":true,"allow_view":true,"allow_fine_tuning":false,"organization":"*","group":null,"is_blocking":false}]}]}

After confirming the service is running, run the benchmark script with the following command: python agentverse_command/benchmark.py --task tasksolving/commongen/llama-2-7b-chat-hf --dataset_path data/commongen/commongen_hard.jsonl

This should execute successfully. However, please note that while the script should run, we cannot guarantee its performance, as open-source LLMs generally lag behind OpenAI's GPT models.

chenweize1998 commented 10 months ago

> I can get it running locally, but the output seems incorrect: the first prompt keeps repeating, and no response is generated.

@xymou The issue you're encountering might be due to the local model not adhering to the specific response format we've set. In the NLP classroom example, we enforce a strict format where the model's output should be structured as follows:

Action: [specific action]
Action Input: [related input]

OpenAI's GPT models usually comply with this format quite reliably. However, local LLMs may have difficulty consistently generating responses in this precise structure. Our system is designed to automatically retry when it doesn't detect the required pattern in the model's response. This automatic retry mechanism could explain why you're noticing the prompt being repeated.
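To make the retry behavior concrete, here is a minimal sketch of a format check with retries; the regex, retry count, and function names are illustrative assumptions, not AgentVerse's actual parser:

```python
import re

# Required response format described above: "Action: ...\nAction Input: ..."
ACTION_PATTERN = re.compile(r"Action:\s*(.+?)\nAction Input:\s*(.+)", re.DOTALL)

def query_with_retries(generate, prompt: str, max_retries: int = 3):
    """Re-issue the same prompt until the response matches the required format.

    `generate` is any callable mapping a prompt string to a model response.
    A local model that never conforms keeps seeing the same prompt, which is
    why the prompt appears to repeat without producing a usable response.
    """
    for _ in range(max_retries):
        response = generate(prompt)
        match = ACTION_PATTERN.search(response)
        if match:
            return match.group(1).strip(), match.group(2).strip()
    raise ValueError("Model never produced the required Action / Action Input format")
```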

xymou commented 10 months ago

Thank you for your reply! I've noticed that open-source LLMs don't follow the instructions to generate responses in the required structure. Do you have any suggestions for solving this, e.g., giving the models an in-context example? But I guess input length may be a constraint. 😰

chenweize1998 commented 10 months ago

A workaround may be to use constrained generation, e.g., with outlines, but we don't support it yet. You'd need to investigate and make some code changes yourself.
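To sketch what that could look like, the snippet below uses outlines' regex-constrained generation to force the Action / Action Input format. The API shown (outlines.models.transformers, outlines.generate.regex) reflects the library's documented interface and may have changed across versions; the model name and pattern are assumptions:

```python
# Sketch only: assumes outlines' regex-constrained generation interface;
# check the library's current docs, as its API has evolved across versions.
import outlines

# Load the model through outlines' transformers integration (assumed HF model id).
model = outlines.models.transformers("meta-llama/Llama-2-7b-chat-hf")

# Constrain decoding so the output must match the required format from above.
pattern = r"Action: [^\n]+\nAction Input: [^\n]+"
generator = outlines.generate.regex(model, pattern)

response = generator("Suggest the next step for the brainstorming task.")
print(response)  # matches the Action / Action Input format by construction
```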

lzw-lzw commented 10 months ago

It's working fine now, thanks for your patient reply.