OpenBMB / ChatDev

Create Customized Software using Natural Language Idea (through LLM-powered Multi-Agent Collaboration)
https://arxiv.org/abs/2307.07924
Apache License 2.0
24.35k stars · 3.05k forks

Local Models integration #27

Open TheWhiteWord opened 9 months ago

TheWhiteWord commented 9 months ago

Hey Devs, let me start by saying that this programme is great. Well done on your work, and thanks for sharing it.

My question is: is there any plan to allow for the integration of local models? Even just a section in the documentation would be great.

Have a good day theWW

andraz commented 9 months ago

+1 to this question

It makes no sense to shovel money into some closed-source service while we have a powerful GPU that can run a 13B Llama model with no problem with some of the other open-source projects.

thedualspace commented 9 months ago

I'd also be very eager to use local models with ChatDev; Llama-based models show great promise.

j-loquat commented 9 months ago

Local model use, and perhaps as a more advanced feature, the ability to assign different models to different agents in the company - so you could use a local Python-optimized model for an engineer, a Llama 2 model for the CEO, etc.

TheWhiteWord commented 9 months ago

@j-loquat I love that idea. That is a thing I was considering more and more: AI becoming more and more like Greek gods, each with its own character and function, completing each other. It was the original vision of Altman too, kind of, but they lost their way.

andraz commented 9 months ago

No need to have one "god AGI" (which cannot be run locally as it demands crazy hardware) if we can have 20 agents with 20 different local narrow AI models that can be loaded one after another.

TheWhiteWord commented 9 months ago

Oh god, sorry Devs but this conversation is too interesting. You may need to turn notifications off XD

I was trained as an artist, and the first thing to know is that limitations are the generator of creativity. A big AI with all the knowledge of the world may just become the most boring thing to touch the planet. And this may be controversial, but I think that bad qualities are needed too... everything has its meaning and use in order to create balance. Just my opinion.

hemangjoshi37a commented 9 months ago

This has been referenced in #33

starkdmi commented 9 months ago
  1. Install LocalAI - OpenAI compatible server.
  2. Create new model config file named gpt-3.5-turbo-16k.yaml and set the model name to gpt-3.5-turbo-16k-0613.
  3. Start LocalAI server locally and run:
    OPENAI_API_BASE=http://127.0.0.1:8000/v1 OPENAI_API_KEY="dummy" python run.py --task "Snake game in pure html" --name "WebSnake"
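For reference, a minimal sketch of what such a config file might contain (the model file name and template are placeholders; a fuller, working example from starkdmi appears later in this thread):

name: gpt-3.5-turbo-16k-0613
parameters:
  model: your-model.Q5_K_M.gguf   # placeholder GGUF file placed in LocalAI's models/ directory
  temperature: 0.2
context_size: 16384
template:
  chat: vicuna                    # prompt template matching the model family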
andraz commented 9 months ago

The command above did not work in Anaconda Prompt, but this version did:

(chatdev_conda_env) C:\chatdev>set OPENAI_API_BASE=http://127.0.0.1:5001/v1

(chatdev_conda_env) C:\chatdev>set OPENAI_API_KEY=123456

(chatdev_conda_env) C:\chatdev>python run.py --task "Hello world in python" --name "HelloWorld"
**[Preprocessing]**

**ChatDev Starts** (20230913191808)

**Timestamp**: 20230913191808

**config_path**: C:\chatdev\CompanyConfig\Default\ChatChainConfig.json

**config_phase_path**: C:\chatdev\CompanyConfig\Default\PhaseConfig.json

**config_role_path**: C:\chatdev\CompanyConfig\Default\RoleConfig.json

**task_prompt**: Hello world in python

**project_name**: HelloWorld

**Log File**: C:\chatdev\WareHouse\HelloWorld_DefaultOrganization_20230913191808.log

**ChatDevConfig**:
 ChatEnvConfig.clear_structure: True
ChatEnvConfig.brainstorming: False

**ChatGPTConfig**:
 ChatGPTConfig(temperature=0.2, top_p=1.0, n=1, stream=False, stop=None, max_tokens=None, presence_penalty=0.0, frequency_penalty=0.0, logit_bias={}, user='')

I am having a problem using it with the local API:

It looks like all the API returns is 1 token:

Text-generation-webui side:

llm_load_print_meta: model size     = 13.02 B
llm_load_print_meta: general.name   = openassistant_llama2-13b-orca-8k-3319
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: LF token  = 13 '<0x0A>'
llm_load_tensors: ggml ctx size =    0.12 MB
llm_load_tensors: using CUDA for GPU acceleration
llm_load_tensors: mem required  =  128.35 MB (+ 1600.00 MB per state)
llm_load_tensors: offloading 40 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloading v cache to GPU
llm_load_tensors: offloading k cache to GPU
llm_load_tensors: offloaded 43/43 layers to GPU
llm_load_tensors: VRAM used: 11656 MB
...................................................................................................
llama_new_context_with_model: kv self size  = 1600.00 MB
llama_new_context_with_model: compute buffer total size =  191.47 MB
llama_new_context_with_model: VRAM scratch buffer: 190.00 MB
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 |
2023-09-13 19:11:27 INFO:Loaded the model in 7.52 seconds.

Warning: $This model maximum context length is 2048 tokens. However, your messages resulted in over 498 tokens and max_tokens is 15937.

llama_print_timings:        load time =   955.37 ms
llama_print_timings:      sample time =     0.23 ms /     1 runs   (    0.23 ms per token,  4424.78 tokens per second)
llama_print_timings: prompt eval time =   955.31 ms /   498 tokens (    1.92 ms per token,   521.30 tokens per second)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per token,      inf tokens per second)
llama_print_timings:       total time =   957.88 ms
Output generated in 1.41 seconds (0.00 tokens/s, 0 tokens, context 498, seed 1828391196)
127.0.0.1 - - [13/Sep/2023 19:11:42] "POST /v1/chat/completions HTTP/1.1" 200 -
Warning: $This model maximum context length is 2048 tokens. However, your messages resulted in over 551 tokens and max_tokens is 15885.
Llama.generate: prefix-match hit

llama_print_timings:        load time =   955.37 ms
llama_print_timings:      sample time =     0.16 ms /     1 runs   (    0.16 ms per token,  6410.26 tokens per second)
llama_print_timings: prompt eval time =   835.88 ms /   489 tokens (    1.71 ms per token,   585.01 tokens per second)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per token,      inf tokens per second)
llama_print_timings:       total time =   836.73 ms
Output generated in 1.24 seconds (0.00 tokens/s, 0 tokens, context 551, seed 192786861)
127.0.0.1 - - [13/Sep/2023 19:11:46] "POST /v1/chat/completions HTTP/1.1" 200 -
Warning: $This model maximum context length is 2048 tokens. However, your messages resulted in over 521 tokens and max_tokens is 15907.
Llama.generate: prefix-match hit

llama_print_timings:        load time =   955.37 ms
llama_print_timings:      sample time =     0.13 ms /     1 runs   (    0.13 ms per token,  7633.59 tokens per second)
llama_print_timings: prompt eval time =   884.39 ms /   459 tokens (    1.93 ms per token,   519.00 tokens per second)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per token,      inf tokens per second)
llama_print_timings:       total time =   885.63 ms
Output generated in 1.26 seconds (0.00 tokens/s, 0 tokens, context 521, seed 1288396660)
127.0.0.1 - - [13/Sep/2023 19:11:53] "POST /v1/chat/completions HTTP/1.1" 200 -
Warning: $This model maximum context length is 2048 tokens. However, your messages resulted in over 574 tokens and max_tokens is 15854.
Llama.generate: prefix-match hit

ChatDev side


Chief Executive Officer: **Chief Product Officer<->Chief Executive Officer on : DemandAnalysis, turn 0**

[ChatDev is a software company powered by multiple intelligent agents, such as chief executive officer, chief human resources officer, chief product officer, chief technology officer, etc, with a multi-agent organizational structure and the mission of "changing the digital world through programming".
You are Chief Product Officer. we are both working at ChatDev. We share a common interest in collaborating to successfully complete a task assigned by a new customer.
You are responsible for all product-related matters in ChatDev. Usually includes product design, product strategy, product vision, product innovation, project management and product marketing.
Here is a new customer's task: Hello world in python.
To complete the task, you must write a response that appropriately solves the requested instruction based on your expertise and customer's needs.]

**[OpenAI_Usage_Info Receive]**
prompt_tokens: 521
completion_tokens: 1
total_tokens: 522

**[OpenAI_Usage_Info Receive]**
prompt_tokens: 574
completion_tokens: 1
total_tokens: 575

Chief Product Officer: **Chief Product Officer<->Chief Executive Officer on : DemandAnalysis, turn 1**

[ChatDev is a software company powered by multiple intelligent agents, such as chief executive officer, chief human resources officer, chief product officer, chief technology officer, etc, with a multi-agent organizational structure and the mission of "changing the digital world through programming".
You are Chief Executive Officer. Now, we are both working at ChatDev and we share a common interest in collaborating to successfully complete a task assigned by a new customer.
Your main responsibilities include being an active decision-maker on users' demands and other key policy issues, leader, manager, and executor. Your decision-making role involves high-level decisions about policy and strategy; and your communicator role can involve speaking to the organization's management and employees.
Here is a new customer's task: Hello world in python.
To complete the task, I will give you one or more instructions, and you must help me to write a specific solution that appropriately solves the requested instruction based on your expertise and my needs.]

Chief Executive Officer: **Chief Product Officer<->Chief Executive Officer on : DemandAnalysis, turn 1**

[ChatDev is a software company powered by multiple intelligent agents, such as chief executive officer, chief human resources officer, chief product officer, chief technology officer, etc, with a multi-agent organizational structure and the mission of "changing the digital world through programming".
You are Chief Product Officer. we are both working at ChatDev. We share a common interest in collaborating to successfully complete a task assigned by a new customer.
You are responsible for all product-related matters in ChatDev. Usually includes product design, product strategy, product vision, product innovation, project management and product marketing.
Here is a new customer's task: Hello world in python.
To complete the task, you must write a response that appropriately solves the requested instruction based on your expertise and customer's needs.]

**[OpenAI_Usage_Info Receive]**
prompt_tokens: 544
completion_tokens: 1
total_tokens: 545
starkdmi commented 9 months ago

Yeah, the command above was for macOS, no troubles with conda environment here.

@andraz, why don't you increase the context to 4K or 8K tokens? Based on your model name, it supports context up to 8K tokens.

Warning: $This model maximum context length is 2048 tokens. However, your messages resulted in over 521 tokens and max_tokens is 15907.

As for the one-token response, I guess it's the streaming feature, so you don't need to wait for a full response.

xkaraman commented 9 months ago

Hello there, I am trying to use the llama-2-7B version as described above.

I created a new yaml file named gpt-3.5-turbo-16k.yaml and set the model name to gpt-3.5-turbo-16k-0613. Then, for the model itself, I downloaded and used one of the Hugging Face model library `llama-2*.bin` models.

I can successfully run it and receive answers to my questions as part of the returned object via curl, but it also says "usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}

When I then try to run ChatDev on a simple task, i.e. python run.py --task "Hello world in python" --name "HelloWorld", ChatDev prints the start-up prompts and then receives no objects from the local LLM, only continuous empty usage logs:

...
Note that we must ONLY discuss the product modality and do not discuss anything else! Once we all have expressed our opinion(s) and agree with the results of the discussion unanimously, any of us must actively terminate the discussion by replying with only one line, which starts with a single word <INFO>, followed by our final product modality without any other words, e.g., "<INFO> PowerPoint".

**[OpenAI_Usage_Info Receive]**
prompt_tokens: 0
completion_tokens: 0
total_tokens: 0

**[OpenAI_Usage_Info Receive]**
prompt_tokens: 0
completion_tokens: 0
total_tokens: 0

**[OpenAI_Usage_Info Receive]**
prompt_tokens: 0
completion_tokens: 0
total_tokens: 0

After 3 retries it crashes with the following KeyError.

Traceback (most recent call last):
  File "/home/**/miniconda3/envs/chatdev/lib/python3.9/site-packages/tenacity/__init__.py", line 382, in __call__
    result = fn(*args, **kwargs)
  File "/media/**/4TB_DATA/git/ChatDev/camel/utils.py", line 145, in wrapper
    return func(self, *args, **kwargs)
  File "/media/**/4TB_DATA/git/ChatDev/camel/agents/chat_agent.py", line 200, in step
    response["id"],
KeyError: 'id'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/media/**/4TB_DATA/git/ChatDev/run.py", line 111, in <module>
    chat_chain.execute_chain()
  File "/media/**/4TB_DATA/git/ChatDev/chatdev/chat_chain.py", line 160, in execute_chain
    self.execute_step(phase_item)
  File "/media/**/4TB_DATA/git/ChatDev/chatdev/chat_chain.py", line 130, in execute_step
    self.chat_env = self.phases[phase].execute(self.chat_env,
  File "/media/**/4TB_DATA/git/ChatDev/chatdev/phase.py", line 292, in execute
    self.chatting(chat_env=chat_env,
  File "/media/**/4TB_DATA/git/ChatDev/chatdev/utils.py", line 77, in wrapper
    return func(*args, **kwargs)
  File "/media/**/4TB_DATA/git/ChatDev/chatdev/phase.py", line 131, in chatting
    assistant_response, user_response = role_play_session.step(input_user_msg, chat_turn_limit == 1)
  File "/media/**/4TB_DATA/git/ChatDev/camel/agents/role_playing.py", line 242, in step
    assistant_response = self.assistant_agent.step(user_msg_rst)
  File "/home/**/miniconda3/envs/chatdev/lib/python3.9/site-packages/tenacity/__init__.py", line 289, in wrapped_f
    return self(f, *args, **kw)
  File "/home/**/miniconda3/envs/chatdev/lib/python3.9/site-packages/tenacity/__init__.py", line 379, in __call__
    do = self.iter(retry_state=retry_state)
  File "/home/**/miniconda3/envs/chatdev/lib/python3.9/site-packages/tenacity/__init__.py", line 326, in iter
    raise retry_exc from fut.exception()
tenacity.RetryError: RetryError[<Future at 0x7ff680181ac0 state=finished raised KeyError>]

I have already exported OPENAI_API_BASE and OPENAI_API_KEY pointing to localhost; otherwise it crashed.

What can I do to successfully use the local LLM?

Thanks for any help and sorry if this is the wrong place to ask it!

GitSimply commented 9 months ago

@starkdmi

  1. Create new model config file named gpt-3.5-turbo-16k.yaml and set the model name to gpt-3.5-turbo-16k-0613.

What model are you using?

jacktang commented 9 months ago
  1. Install LocalAI - OpenAI compatible server.
  2. Create new model config file named gpt-3.5-turbo-16k.yaml and set the model name to gpt-3.5-turbo-16k-0613.

Hello @starkdmi , can you share the file gpt-3.5-turbo-16k.yaml?

starkdmi commented 9 months ago

@jacktang, it depends on the model, but it looks like this, for example - gpt-3.5-turbo-16k.txt (rename to .yaml) - for Vicuna 1.5.

@GitSimply, these work with many of the GPT tools on my setup: WizardLM, WizardCoder, WizardCoderPy, Wizard-Vicuna, Vicuna, CodeLLaMa.

Egalitaristen commented 9 months ago

I'm just going to add a bit about how I got ChatDev running locally with the LM Studio server, for anyone searching. It would have been really easy if there had been clear instructions, but I had to read through all of the issues and try to find things in the code, to no avail.

Anyway. The basics:

  • Windows 10
  • Following the installation instructions from the readme for steps 1-3 (git clone, conda, cd, install requirements)

On step 4 do this instead:

set OPENAI_API_BASE=http://localhost:1234/v1

And that's it (you'll need to start the LM Studio server and load a model); now you can just run ChatDev like you normally would, but locally. A sketch of the full sequence is below.
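For completeness, a rough sketch of the full sequence on Windows (the port assumes LM Studio's default of 1234, and the key value is arbitrary since LM Studio ignores it):

set OPENAI_API_BASE=http://localhost:1234/v1
set OPENAI_API_KEY=dummy
python run.py --task "Hello world in python" --name "HelloWorld"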

sankalp-25 commented 9 months ago

Hey @starkdmi, while using LocalAI

git clone https://github.com/go-skynet/LocalAI
cd LocalAI
git checkout -b build
cp your-model.bin models/
docker compose up -d --pull always
curl http://localhost:8080/v1/models

After doing this in LocalAI, I am directly executing this in ChatDev:

OPENAI_API_BASE=http://127.0.0.1:8000/v1 OPENAI_API_KEY="dummy" python run.py --task "Snake game in pure html" --name "WebSnake"

and I am getting the following error:

Traceback (most recent call last):
  File "/root/anaconda3/lib/python3.11/site-packages/tenacity/__init__.py", line 382, in __call__
    result = fn(*args, **kwargs)
  File "/root/ChatDev/camel/utils.py", line 145, in wrapper
    return func(self, *args, **kwargs)
  File "/root/ChatDev/camel/agents/chat_agent.py", line 191, in step
    response = self.model_backend.run(messages=openai_messages)
  File "/root/ChatDev/camel/model_backend.py", line 69, in run
    response = openai.ChatCompletion.create(*args, **kwargs,
  File "/root/anaconda3/lib/python3.11/site-packages/openai/api_resources/chat_completion.py", line 25, in create
    return super().create(*args, **kwargs)
  File "/root/anaconda3/lib/python3.11/site-packages/openai/api_resources/abstract/engine_api_resource.py", line 153, in create
    response, _, api_key = requestor.request(
  File "/root/anaconda3/lib/python3.11/site-packages/openai/api_requestor.py", line 298, in request
    resp, got_stream = self._interpret_response(result, stream)
  File "/root/anaconda3/lib/python3.11/site-packages/openai/api_requestor.py", line 700, in _interpret_response
    self._interpret_response_line(
  File "/root/anaconda3/lib/python3.11/site-packages/openai/api_requestor.py", line 763, in _interpret_response_line
    raise self.handle_error_response(
openai.error.APIError: rpc error: code = Unknown desc = inference failed {"error":{"code":500,"message":"rpc error: code = Unknown desc = inference failed","type":""}} 500 {'error': {'code': 500, 'message': 'rpc error: code = Unknown desc = inference failed', 'type': ''}} {'Date': 'Tue, 26 Sep 2023 06:20:10 GMT', 'Content-Type': 'application/json', 'Content-Length': '94'}

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/root/ChatDev/run.py", line 111, in <module>
    chat_chain.execute_chain()
  File "/root/ChatDev/chatdev/chat_chain.py", line 160, in execute_chain
    self.execute_step(phase_item)
  File "/root/ChatDev/chatdev/chat_chain.py", line 130, in execute_step
    self.chat_env = self.phases[phase].execute(self.chat_env,
  File "/root/ChatDev/chatdev/phase.py", line 292, in execute
    self.chatting(chat_env=chat_env,
  File "/root/ChatDev/chatdev/utils.py", line 77, in wrapper
    return func(*args, **kwargs)
  File "/root/ChatDev/chatdev/phase.py", line 131, in chatting
    assistant_response, user_response = role_play_session.step(input_user_msg, chat_turn_limit == 1)
  File "/root/ChatDev/camel/agents/role_playing.py", line 242, in step
    assistant_response = self.assistant_agent.step(user_msg_rst)
  File "/root/anaconda3/lib/python3.11/site-packages/tenacity/__init__.py", line 289, in wrapped_f
    return self(f, *args, **kw)
  File "/root/anaconda3/lib/python3.11/site-packages/tenacity/__init__.py", line 379, in __call__
    do = self.iter(retry_state=retry_state)
  File "/root/anaconda3/lib/python3.11/site-packages/tenacity/__init__.py", line 326, in iter
    raise retry_exc from fut.exception()
tenacity.RetryError: RetryError[<Future at 0x7f272c7ca710 state=finished raised APIError>]

How can I fix this?

starkdmi commented 9 months ago

@sankalp-25, the problem is the local OpenAI-compatible server, which responds incorrectly. Do you have a config file for your model in the models/ directory next to the .bin file?

It should look like that one, so that it simulates the gpt-3.5 model instead of hosting your-model.

On startup, LocalAI will list the hosted models, and you should see the correct name (gpt-3.5/4).

sankalp-25 commented 9 months ago

Hey @starkdmi, I have renamed the .yaml file to gpt-3.5-turbo-16k.yaml and the model file to gpt-3.5-turbo-16k-0613, after which I am doing as follows. If I am not wrong, the config file is the .yaml one, which I renamed from docker-compose.yaml to gpt-3.5-turbo-16k.yaml. If I am wrong, please let me know what the mistake is.

Please check the log below:

$ docker compose -f gpt-3.5-turbo-16k.yaml up -d --pull always
[+] Running 1/1
 ✔ api Pulled                      2.9s
[+] Running 1/0
 ✔ Container localai-api-1  Running

$ curl http://localhost:8000/v1/models
{"object":"list","data":[{"id":"gpt-3.5-turbo-16k-0613","object":"model"}]}

After this I am trying to run the following in ChatDev:

$ OPENAI_API_BASE=http://127.0.0.1:8000/v1 OPENAI_API_KEY="dummy" python run.py --task "Snake game in pure html" --name "WebSnake"

The error I am getting was given in the previous comment.

Thank you

starkdmi commented 9 months ago

@sankalp-25, we can test that the model is working using this Python code:

import openai # https://github.com/openai/openai-python#installation

openai.api_key = "sk-dummy"
openai.api_base = "http://127.0.0.1:8000/v1"

chat_completion = openai.ChatCompletion.create(
  model="gpt-3.5-turbo-16k-0613",
  messages=[{"role": "user", "content": "Calculate 20 minus 5."}]
)

completion = chat_completion.choices[0].message.content
print(completion) # The result of 20 minus 5 is 15. 
sankalp-25 commented 9 months ago

@starkdmi, what do you mean by config file? If I am not wrong, the config file is the .yaml one, which I renamed from docker-compose.yaml to gpt-3.5-turbo-16k.yaml. I only have gpt-3.5-turbo-16k-0613 and gpt-3.5-turbo-16k-0613.tmpl in /models. When I run the code for checking the model, the following is the error:

Traceback (most recent call last):
  File "/root/FGPT/LocalAI/models/infer.py", line 6, in <module>
    chat_completion = openai.ChatCompletion.create(
  File "/root/anaconda3/lib/python3.11/site-packages/openai/api_resources/chat_completion.py", line 25, in create
    return super().create(*args, **kwargs)
  File "/root/anaconda3/lib/python3.11/site-packages/openai/api_resources/abstract/engine_api_resource.py", line 153, in create
    response, _, api_key = requestor.request(
  File "/root/anaconda3/lib/python3.11/site-packages/openai/api_requestor.py", line 298, in request
    resp, got_stream = self._interpret_response(result, stream)
  File "/root/anaconda3/lib/python3.11/site-packages/openai/api_requestor.py", line 700, in _interpret_response
    self._interpret_response_line(
  File "/root/anaconda3/lib/python3.11/site-packages/openai/api_requestor.py", line 763, in _interpret_response_line
    raise self.handle_error_response(
openai.error.APIError: rpc error: code = Unknown desc = unimplemented {"error":{"code":500,"message":"rpc error: code = Unknown desc = unimplemented","type":""}} 500 {'error': {'code': 500, 'message': 'rpc error: code = Unknown desc = unimplemented', 'type': ''}} {'Date': 'Thu, 28 Sep 2023 11:08:00 GMT', 'Content-Type': 'application/json', 'Content-Length': '91'}

starkdmi commented 9 months ago

@sankalp-25, wow, docker-compose.yaml is a completely different thing. The docs are here.

The correct content of the file named gpt-3.5-turbo-16k.yaml may look like:

name: gpt-3.5-turbo-16k # or gpt-3.5-turbo-16k-0613

parameters:
  model: vicuna-13b-v1.5-16k.Q5_K_M.gguf
  temperature: 0.2
  top_k: 80
  top_p: 0.7
  max_tokens: 2048
  f16: true

context_size: 16384

template:
  chat: vicuna

f16: true
gpu_layers: 32
mmap: true
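Once LocalAI is up with a config like that, a quick way to sanity-check it before launching ChatDev is a plain request against the OpenAI-compatible endpoint (port 8000 assumed here to match the commands above; adjust to whatever your server listens on):

curl http://127.0.0.1:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-3.5-turbo-16k", "messages": [{"role": "user", "content": "Say hello"}]}'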
asfandsaleem commented 9 months ago

> (quoting @Egalitaristen's LM Studio instructions above in full)

Correct. You also need one more step: set OPENAI_API_KEY="xyz"

danieleforte92 commented 8 months ago

> (quoting @xkaraman's earlier comment and KeyError traceback above in full)

I have the same problem and am trying to figure out how to solve it, without success at the moment.

I'm using LocalAI with this yaml config:

backend: llama
context_size: 2000
f16: true
gpu_layers: 4
name: gpt-3.5-turbo-16k-0613
parameters:
  model: luna-ai-llama2-uncensored.Q4_0.gguf
  temperature: 0.2
  top_k: 40
  top_p: 0.65
roles:
  assistant: 'ASSISTANT:'
  system: 'SYSTEM:'
  user: 'USER:'
template:
  chat: gpt-3.5-turbo-16k-0613-chat
  completion: gpt-3.5-turbo-16k-0613-completion
xkaraman commented 8 months ago

Check issue https://github.com/go-skynet/LocalAI/issues/1103.

It should be working now. The id field was missing from the response.
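For context, ChatDev's chat_agent.py reads response["id"] and the usage block, so a local OpenAI-compatible server needs to return at least the standard chat-completion fields. A rough sketch of the minimal response shape (values are illustrative):

{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "gpt-3.5-turbo-16k-0613",
  "choices": [
    {"index": 0, "message": {"role": "assistant", "content": "..."}, "finish_reason": "stop"}
  ],
  "usage": {"prompt_tokens": 521, "completion_tokens": 42, "total_tokens": 563}
}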

innocuous76 commented 8 months ago

Llama 2 comes as a .pth file. Do we need to convert it to .bin?

> (quoting @sankalp-25's LocalAI commands and error above in full)

starkdmi commented 8 months ago

@innocuous76, I use the GGUF format for faster GPU inference on Apple Silicon, using the llama.cpp backend which is built into LocalAI.

For example, there is LLaMA 13B. I suggest trying Mistral-7B instead.

Here is the list of LocalAI-compatible model formats.

innocuous76 commented 8 months ago

@starkdmi, could you guide me through the steps you used to get this working? I am pretty much a newbie.

starkdmi commented 8 months ago

@innocuous76, I use LocalAI for inference, but there is a docs link to the LiteLLM project, which has instructions on how to run ChatDev and other LLM-based tools locally.

platinaCoder commented 8 months ago

I use Ollama for inference with LiteLLM for the API. Works perfectly fine! Ollama is quick enough; the only downside is the context length for most local models.

TheGobbo commented 8 months ago

Hey there, I am running ChatDev locally with LM Studio and Mistral-7B on a Linux amd64 CPU, and it runs perfectly fine until ChatDev tries to do some sort of confirmation that the HTTP connection is still up. I tested the LM Studio server with some curl queries and it was still working. Can someone help out, please?

I followed the quickstart steps from 1 to 3 normally, and added the following env variables to my .bashrc (and ran source .bashrc):

export OPENAI_API_KEY="sk-dummy1234"
export OPENAI_API_BASE="http://localhost:1234/v1"

Here is the error I got. I've run ChatDev a few times now and it always returns the same error, seemingly at the same spot:

ChatDev terminal log
You will start with the "main" file, then go to the ones that are imported by that file, and so on. Please note that the code should be fully functional. Ensure to implement all functions. No placeholders (such as 'pass' in Python).

```log
Traceback (most recent call last):
  File "/home/my_user/miniconda3/envs/ChatDev_conda_env/lib/python3.9/site-packages/urllib3/connectionpool.py", line 537, in _make_request
    response = conn.getresponse()
  File "/home/my_user/miniconda3/envs/ChatDev_conda_env/lib/python3.9/site-packages/urllib3/connection.py", line 461, in getresponse
    httplib_response = super().getresponse()
  File "/home/my_user/miniconda3/envs/ChatDev_conda_env/lib/python3.9/http/client.py", line 1377, in getresponse
    response.begin()
  File "/home/my_user/miniconda3/envs/ChatDev_conda_env/lib/python3.9/http/client.py", line 320, in begin
    version, status, reason = self._read_status()
  File "/home/my_user/miniconda3/envs/ChatDev_conda_env/lib/python3.9/http/client.py", line 281, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/home/my_user/miniconda3/envs/ChatDev_conda_env/lib/python3.9/socket.py", line 704, in readinto
    return self._sock.recv_into(b)
socket.timeout: timed out

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/my_user/miniconda3/envs/ChatDev_conda_env/lib/python3.9/site-packages/requests/adapters.py", line 486, in send
    resp = conn.urlopen(
  File "/home/my_user/miniconda3/envs/ChatDev_conda_env/lib/python3.9/site-packages/urllib3/connectionpool.py", line 845, in urlopen
    retries = retries.increment(
  File "/home/my_user/miniconda3/envs/ChatDev_conda_env/lib/python3.9/site-packages/urllib3/util/retry.py", line 470, in increment
    raise reraise(type(error), error, _stacktrace)
  File "/home/my_user/miniconda3/envs/ChatDev_conda_env/lib/python3.9/site-packages/urllib3/util/util.py", line 39, in reraise
    raise value
  File "/home/my_user/miniconda3/envs/ChatDev_conda_env/lib/python3.9/site-packages/urllib3/connectionpool.py", line 791, in urlopen
    response = self._make_request(
  File "/home/my_user/miniconda3/envs/ChatDev_conda_env/lib/python3.9/site-packages/urllib3/connectionpool.py", line 539, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
  File "/home/my_user/miniconda3/envs/ChatDev_conda_env/lib/python3.9/site-packages/urllib3/connectionpool.py", line 371, in _raise_timeout
    raise ReadTimeoutError(
urllib3.exceptions.ReadTimeoutError: HTTPConnectionPool(host='localhost', port=1234): Read timed out. (read timeout=600)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/my_user/miniconda3/envs/ChatDev_conda_env/lib/python3.9/site-packages/openai/api_requestor.py", line 596, in request_raw
    result = _thread_context.session.request(
  File "/home/my_user/miniconda3/envs/ChatDev_conda_env/lib/python3.9/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/my_user/miniconda3/envs/ChatDev_conda_env/lib/python3.9/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "/home/my_user/miniconda3/envs/ChatDev_conda_env/lib/python3.9/site-packages/requests/adapters.py", line 532, in send
    raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPConnectionPool(host='localhost', port=1234): Read timed out. (read timeout=600)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/my_user/miniconda3/envs/ChatDev_conda_env/lib/python3.9/site-packages/tenacity/__init__.py", line 382, in __call__
    result = fn(*args, **kwargs)
  File "/home/my_user/ChatDev/camel/utils.py", line 145, in wrapper
    return func(self, *args, **kwargs)
  File "/home/my_user/ChatDev/camel/agents/chat_agent.py", line 191, in step
    response = self.model_backend.run(messages=openai_messages)
  File "/home/my_user/ChatDev/camel/model_backend.py", line 70, in run
    response = openai.ChatCompletion.create(*args, **kwargs,
  File "/home/my_user/miniconda3/envs/ChatDev_conda_env/lib/python3.9/site-packages/openai/api_resources/chat_completion.py", line 25, in create
    return super().create(*args, **kwargs)
  File "/home/my_user/miniconda3/envs/ChatDev_conda_env/lib/python3.9/site-packages/openai/api_resources/abstract/engine_api_resource.py", line 153, in create
    response, _, api_key = requestor.request(
  File "/home/my_user/miniconda3/envs/ChatDev_conda_env/lib/python3.9/site-packages/openai/api_requestor.py", line 288, in request
    result = self.request_raw(
  File "/home/my_user/miniconda3/envs/ChatDev_conda_env/lib/python3.9/site-packages/openai/api_requestor.py", line 607, in request_raw
    raise error.Timeout("Request timed out: {}".format(e)) from e
openai.error.Timeout: Request timed out: HTTPConnectionPool(host='localhost', port=1234): Read timed out. (read timeout=600)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/my_user/ChatDev/run.py", line 115, in <module>
    chat_chain.execute_chain()
  File "/home/my_user/ChatDev/chatdev/chat_chain.py", line 160, in execute_chain
    self.execute_step(phase_item)
  File "/home/my_user/ChatDev/chatdev/chat_chain.py", line 130, in execute_step
    self.chat_env = self.phases[phase].execute(self.chat_env,
  File "/home/my_user/ChatDev/chatdev/phase.py", line 291, in execute
    self.chatting(chat_env=chat_env,
  File "/home/my_user/ChatDev/chatdev/utils.py", line 77, in wrapper
    return func(*args, **kwargs)
  File "/home/my_user/ChatDev/chatdev/phase.py", line 130, in chatting
    assistant_response, user_response = role_play_session.step(input_user_msg, chat_turn_limit == 1)
  File "/home/my_user/ChatDev/camel/agents/role_playing.py", line 242, in step
    assistant_response = self.assistant_agent.step(user_msg_rst)
  File "/home/my_user/miniconda3/envs/ChatDev_conda_env/lib/python3.9/site-packages/tenacity/__init__.py", line 289, in wrapped_f
    return self(f, *args, **kw)
  File "/home/my_user/miniconda3/envs/ChatDev_conda_env/lib/python3.9/site-packages/tenacity/__init__.py", line 379, in __call__
    do = self.iter(retry_state=retry_state)
  File "/home/my_user/miniconda3/envs/ChatDev_conda_env/lib/python3.9/site-packages/tenacity/__init__.py", line 326, in iter
    raise retry_exc from fut.exception()
tenacity.RetryError: RetryError[]
```

EDIT:

The issue was that, running on CPU, the timeout was too low for the OpenAI API; configuring the request timeout to unlimited solved my problem:

https://community.openai.com/t/how-to-set-a-timeout-on-an-api-function-call-using-using-the-python-library/4213/4

Local ChatDev Fix
```diff
diff --git a/WareHouse/FAIR_ENOUGH_ModelBest1024_20231026000126/project_evaluator.py b/WareHouse/FAIR_ENOUGH_ModelBest1024_20231026000126/project_evaluator.py
index d115278..68716ae 100644
--- a/WareHouse/FAIR_ENOUGH_ModelBest1024_20231026000126/project_evaluator.py
+++ b/WareHouse/FAIR_ENOUGH_ModelBest1024_20231026000126/project_evaluator.py
@@ -47,7 +47,8 @@ class ProjectEvaluator:
             messages=[
                 {"role": "system", "content": self.prompt},
                 {"role": "user", "content": f"Project Name: {project_name}\nProject Description: {project_description}\n"}
-            ]
+            ],
+            request_timeout=0
         )
         print("response got", i)
         content = resp.choices[0]["message"]["content"]
diff --git a/camel/model_backend.py b/camel/model_backend.py
index d54eea4..909f8dc 100644
--- a/camel/model_backend.py
+++ b/camel/model_backend.py
@@ -69,6 +69,7 @@ class OpenAIModel(ModelBackend):
             self.model_config_dict['max_tokens'] = num_max_completion_tokens
         response = openai.ChatCompletion.create(*args, **kwargs,
                                                 model=self.model_type.value,
+                                                request_timeout=0,
                                                 **self.model_config_dict)
         cost = prompt_cost(
             self.model_type.value,
```
scenaristeur commented 7 months ago

Seems to work with the Horde decentralized LLM: https://github.com/scenaristeur/openai2horde

Then: OPENAI_API_BASE=http://127.0.0.1:5678/v1 OPENAI_API_KEY="dummy" python run.py --task "2048 game" --name "2048"

"build me a 10*10 2048 game" [screenshot]

opencoca commented 7 months ago

@scenaristeur What are you running on? I've been meaning to test on M1 Pro.

scenaristeur commented 7 months ago

@opencoca I personally don't have a GPU, but it runs on the Horde (https://stablehorde.net/), which is crowd-sourced: https://stablehorde.net/api/v2/workers?type=text

YaswanthDasamandam commented 7 months ago

Is it possible to use Ollama?

favouriter commented 7 months ago

Using the Langchain-Chatchat project, calling local port 2000:

# update model config
LLM_MODELS = ["gpt-3.5-turbo-16k-0613"]
MODEL_PATH['llm_model'].update({"gpt-3.5-turbo-16k-0613": MODEL_PATH['llm_model']['chatglm3-6b-32k']})
OPENAI_API_BASE=http://127.0.0.1:2000/v1 OPENAI_API_KEY="dummy" python run.py --task "2048 game" --name "2048"
travelhawk commented 7 months ago

I tried to use LM Studio as a local OpenAI substitute. It works well, using the environment variable setup suggested here.

OPENAI_API_BASE=http://127.0.0.1:1234/v1 OPENAI_API_KEY="xyz" python run.py --task "A drawing app" --name "Draw App"

However, it doesn't run through and terminates with an error that the max tokens are exceeded:

Traceback (most recent call last):
  File "C:\Users\falk\repos\AI\ChatDev\run.py", line 114, in <module>
    chat_chain.execute_chain()
  File "C:\Users\falk\repos\AI\ChatDev\chatdev\chat_chain.py", line 163, in execute_chain
    self.execute_step(phase_item)
  File "C:\Users\falk\repos\AI\ChatDev\chatdev\chat_chain.py", line 133, in execute_step
    self.chat_env = self.phases[phase].execute(self.chat_env,
  File "C:\Users\falk\repos\AI\ChatDev\chatdev\phase.py", line 291, in execute
    self.chatting(chat_env=chat_env,
  File "C:\Users\falk\repos\AI\ChatDev\chatdev\utils.py", line 77, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\falk\repos\AI\ChatDev\chatdev\phase.py", line 165, in chatting
    seminar_conclusion = "<INFO> " + self.self_reflection(task_prompt, role_play_session, phase_name,
  File "C:\Users\falk\repos\AI\ChatDev\chatdev\phase.py", line 219, in self_reflection
    self.chatting(chat_env=chat_env,
  File "C:\Users\falk\repos\AI\ChatDev\chatdev\utils.py", line 77, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\falk\repos\AI\ChatDev\chatdev\phase.py", line 136, in chatting
    if isinstance(assistant_response.msg, ChatMessage):
  File "C:\Users\falk\repos\AI\ChatDev\camel\agents\chat_agent.py", line 53, in msg
    raise RuntimeError("error in ChatAgentResponse, info:{}".format(str(self.info)))
RuntimeError: error in ChatAgentResponse, info:{'id': None, 'usage': None, 'termination_reasons': ['max_tokens_exceeded_by_camel'], 'num_tokens': 17171}

For inference I'm using the zephyr-7B-beta. Does anyone know how to fix this or what to do?

jamiemoller commented 7 months ago

> (quoting @travelhawk's LM Studio error above in full)

My first thought is that this is a max-token problem.

travelhawk commented 7 months ago

Obviously it is. Is it because of the model? How do I raise the max tokens?

sammcj commented 7 months ago

Looks like there's an open PR to add this - https://github.com/OpenBMB/ChatDev/pull/53

acbp commented 7 months ago

Is it possible to use ollama

Yes, using the LiteLLM OpenAI proxy, like this:

litellm --api_base http://localhost:11434 --add_key OPENAI_API_KEY=dummy --drop_params --model ollama/orca2:7b

An OpenAI-compatible proxy server will run and redirect to Ollama; then run your OpenAI-compatible AI app, like ChatDev:

OPENAI_API_BASE=http://localhost:8000/v1 OPENAI_API_KEY=dummy python3 run.py --task "<task>" --name "<title>"

Docs: LiteLLM proxy

BackMountainDevil commented 7 months ago

@xkaraman

tenacity.RetryError: RetryError

Have you fixed the error? I met the same error when I used chatglm3-6b as the LLM server, and the server printed some red error logs: "POST /send_message HTTP/1.1" 404 Not Found. So I think the code errors because the LLM server did not respond to /send_message correctly, and the code retries until max_time.

davidxll commented 7 months ago
  1. Install LocalAI - OpenAI compatible server.
  2. Create new model config file named gpt-3.5-turbo-16k.yaml and set the model name to gpt-3.5-turbo-16k-0613.
  3. Start LocalAI server locally and run:
OPENAI_API_BASE=http://127.0.0.1:8000/v1 OPENAI_API_KEY="dummy" python run.py --task "Snake game in pure html" --name "WebSnake"

This should be added to the wiki or documented somewhere

godshades commented 7 months ago

Can someone guide me on how to run a full Docker stack, i.e. one container for the local models and one container for ChatDev?

tecno14 commented 6 months ago

To save the base URL and/or key in the conda environment, use this before activating it (or deactivate and then reactivate):

conda env config vars set OPENAI_API_BASE=http://localhost:1234/v1 --name ChatDev_conda_env
conda env config vars set OPENAI_API_KEY=any --name ChatDev_conda_env
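You can then confirm the variables were stored with the standard conda subcommand:

conda env config vars list --name ChatDev_conda_env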
BackMountainDevil commented 6 months ago

Using the Langchain-Chatchat project, calling local port 2000

I tried your suggestion. The port should most likely be 20000, not 2000 (probably a typo). I am also using chatglm3-6b-32k, with BAAI/bge-large-zh as the knowledge-base model. It runs, but strangely the responses are very slow - not absent, just delayed for quite a while - even though the GPU's 80 GB of memory is sufficient. In the end it took 93 minutes to produce a "Snake game in pure html" that doesn't run.

mroxso commented 6 months ago

For me it wasn't OPENAI_API_BASE, but BASE_URL. After setting this, everything works fine with LiteLLM + Ollama
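A rough sketch of that combination, assuming Ollama on its default port 11434, the LiteLLM proxy on port 8000 (as in the earlier example), and a placeholder model name; whether BASE_URL needs the /v1 suffix may depend on your ChatDev version:

litellm --api_base http://localhost:11434 --model ollama/mistral
BASE_URL=http://localhost:8000/v1 OPENAI_API_KEY=dummy python run.py --task "Hello world in python" --name "HelloWorld"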

evmond1 commented 4 months ago

FYI - it's not OPENAI_API_BASE. If using Anaconda on Windows, you do SET BASE_URL="http://localhost:1234/v1" and then SET OPENAI_API_KEY="not needed". This is if you're using LM Studio. All working on my end using Mistral Instruct 7B.


hemangjoshi37a commented 4 months ago

If anyone has Ollama integrated with this, please let me know. Thanks a lot. Happy coding.

akhil3417 commented 4 months ago

How do I go about using other services, like Together.ai, that offer an OpenAI-compatible API? How do I set the host?

opencoca commented 4 months ago

If the API is OpenAI-compatible, you can point at the API endpoint using --api_base, as with local models.
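For instance, a hosted OpenAI-compatible service can be targeted the same way as the local servers above by overriding the base URL; a rough sketch (the exact endpoint URL and key handling are assumptions - check the provider's documentation):

OPENAI_API_BASE=https://api.together.xyz/v1 OPENAI_API_KEY=<your_together_key> python run.py --task "Hello world in python" --name "HelloWorld"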