INTERACT_FAILED Error: Session does not exist

glad4enkonm commented 8 months ago

I cannot start a KG task with vicuna-7b; it fails with INTERACT_FAILED Error: Session does not exist. What could be the reason?

To Reproduce

Steps to reproduce the behavior:

1. configs/start_task.yaml

definition:
  import: tasks/task_assembly.yaml

start:
  kg-std: 1


2. configs/assignments/default.yaml

import: definition.yaml

concurrency:
  task:
    kg-std: 1
  agent:
    vicuna-7b: 1

assignments: # List[Assignment] | Assignment
  - agent: # "task": List[str] | str , "agent": List[str] | str
      - vicuna-7b
    task:
      - kg-std

output: "outputs/{TIMESTAMP}"

3. Running Vicuna 7b on cpu 8bit, starting the 
4. python -m src.start_task -a
5. python -m src.assigner
6. outputs/2023-11-07-22-36-41/vicuna-7b/kg-std/error.jsonl

{"index": 149, "error": "INTERACT_FAILED", "info": "{\"detail\":\"Error: Session does not exist\"}", "output": {"index": null, "status": "running", "result": null, "history": null}, "time": {"timestamp": 1699397194471, "str": "2023-11-07 22:46:34"}}
{"index": 148, "error": "INTERACT_FAILED", "info": "{\"detail\":\"Error: Session does not exist\"}", "output": {"index": null, "status": "running", "result": null, "history": null}, "time": {"timestamp": 1699398002162, "str": "2023-11-07 23:00:02"}}
{"index": 147, "error": "INTERACT_FAILED", "info": "{\"detail\":\"Error: Session does not exist\"}", "output": {"index": null, "status": "running", "result": null, "history": null}, "time": {"timestamp": 1699398438856, "str": "2023-11-07 23:07:18"}}


outputs/2023-11-07-22-36-41/config.yaml

assignments:
- agent: vicuna-7b
  task: kg-std
concurrency:
  agent:
    vicuna-7b: 1
  task:
    kg-std: 1
definition:
  agent:
    vicuna-7b:
      module: src.client.agents.FastChatAgent
      parameters:
        controller_address: http://localhost:21001
        max_new_tokens: 512
        model_name: vicuna-7b-v1.5
        name: FastChat
        temperature: 0
  task:
    kg-std:
      module: src.client.TaskClient
      parameters:
        controller_address: http://localhost:5000/api
        data_file: data/knowledgegraph/std.json
        name: KnowledgeGraph-std
        round: 15
        sparql_url: http://164.107.116.56:3093/sparql
output: outputs/2023-11-07-22-36-41
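When error.jsonl grows to many records, a quick tally helps confirm whether every failure is the same one. The following is a minimal sketch, not part of AgentBench: summarize_errors is a hypothetical helper, and it assumes each record has the "error" and "info" fields seen in the records above, with "info" holding a JSON-encoded string.

```python
import json
from collections import Counter


def summarize_errors(lines):
    """Count (error, detail) pairs across the lines of an error.jsonl file."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        info = record.get("info")
        # "info" is itself a JSON string such as {"detail": "..."};
        # fall back to the raw value when it has a different shape.
        try:
            detail = json.loads(info)["detail"]
        except (TypeError, ValueError, KeyError):
            detail = info
        counts[(record.get("error"), detail)] += 1
    return counts
```

To use it, open the file and pass the file object, for example summarize_errors(open("outputs/2023-11-07-22-36-41/vicuna-7b/kg-std/error.jsonl")), then print counts.most_common().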


zhc7 commented 8 months ago

Hi, @glad4enkonm. A possible reason is that the interaction session was killed because it exceeded the time limit. I noticed that you are running inference on CPU. If the LLM doesn't respond within four minutes, the session might be killed. To loosen this limit, you can change the number here: https://github.com/THUDM/AgentBench/blob/adc728e073c7ba2934c5fbf05ca1eaa10cc2b21c/src/server/task_controller.py#L180
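To illustrate why a slow model can surface as "Session does not exist", here is a toy model of a controller that forgets sessions that stay idle past a limit. It is only a sketch of the general mechanism and is not the code in task_controller.py: the SessionTable class, its method names, and the injectable clock are all hypothetical; the 240-second figure in the usage note comes from the four minutes mentioned above.

```python
import time


class SessionTable:
    """Toy controller that drops sessions idle for longer than a time limit."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds
        self._clock = clock          # injectable so the behaviour can be simulated
        self._last_seen = {}         # session id -> time of last interaction

    def start(self, session_id):
        self._last_seen[session_id] = self._clock()

    def _expire(self):
        now = self._clock()
        for session_id, seen in list(self._last_seen.items()):
            if now - seen > self.timeout_seconds:
                del self._last_seen[session_id]

    def interact(self, session_id):
        """Record an interaction, or fail if the session has been dropped."""
        self._expire()
        if session_id not in self._last_seen:
            raise KeyError("Session does not exist")
        self._last_seen[session_id] = self._clock()
```

With SessionTable(240), an agent that answers after 100 seconds keeps its session, while one that needs more than 240 seconds finds the session gone on its next call. Raising the limit gives a CPU-bound model room to respond.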

glad4enkonm commented 8 months ago

Hi, @zhc7, thanks for your reply. The error message is gone, but the run is now stuck at 0/150. Is this the correct way to start the system?

source /home/myuser/anaconda3/bin/activate agent-bench
cd ~/AgentBench
python3 -m fastchat.serve.controller &
python3 -m fastchat.serve.model_worker --model-name vicuna-7b-v1.5 --device cpu --load-8bit &
python -m src.start_task -a &
sleep 60 && python -m src.assigner
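The fixed sleep 60 in the sequence above is a guess at how long the services need to come up. One alternative is to poll until the ports accept connections. The sketch below uses only the Python standard library; wait_for_port is a hypothetical helper, and the ports in the usage note (21001 and 5000) are taken from the controller_address values in config.yaml. An open port only shows that the process is listening; it does not prove that the model worker or task workers have finished registering.

```python
import socket
import time


def wait_for_port(host, port, timeout=300.0, interval=1.0):
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError while nothing is listening yet
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, calling wait_for_port("localhost", 21001) and wait_for_port("localhost", 5000) before launching python -m src.assigner would replace the fixed delay.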

glad4enkonm commented 8 months ago

Answering my own question: yes, that was the right way to start, although it is better to run each process in a separate console window than as background processes. The problem was that, for a KG task with the selected LLM, most cases finish with the status "task limit reached", and on a 24-CPU machine one test takes about 20 min. Running the same task on one instance with a GPU takes about 20-40 s per iteration.
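A rough back-of-the-envelope estimate shows why the progress counter appeared stuck. The sketch below assumes the run has 150 samples (from the 0/150 counter), that "one test" and "one iteration" both mean one sample, and that samples run one at a time as in the concurrency settings; total_hours is a hypothetical helper.

```python
def total_hours(samples, seconds_per_sample, concurrency=1):
    """Estimated wall-clock hours, assuming samples are spread evenly over workers."""
    return samples * seconds_per_sample / concurrency / 3600


SAMPLES = 150  # assumption: taken from the 0/150 progress counter

cpu_hours = total_hours(SAMPLES, 20 * 60)   # about 20 min per sample on CPU
gpu_hours_low = total_hours(SAMPLES, 20)    # about 20 s per sample on GPU
gpu_hours_high = total_hours(SAMPLES, 40)   # about 40 s per sample on GPU

print(f"CPU: {cpu_hours:.1f} h")
print(f"GPU: {gpu_hours_low:.2f} to {gpu_hours_high:.2f} h")
```

Under these assumptions the CPU run needs about 50 hours in total, against roughly 50 to 100 minutes on the GPU, so the first completed sample on CPU only shows up after about 20 minutes.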

THUDM / AgentBench

INTERACT_FAILED Error: Session does not exist #70