Open · TheWhiteWord opened this issue 1 year ago
+1 to this question
It makes no sense to shovel money into some closed-source API when we have powerful GPUs that can run a 13B Llama without any problem using some of the other open-source projects.
I'd also be very eager to use local models with ChatDev; Llama-based models show great promise.
Local model support, and perhaps, as a more advanced feature, the ability to assign different models to different agents in the company, so you could use a local Python-optimized model for an engineer, a Llama 2 model for the CEO, and so on.
@j-loquat I love that idea. It's something I've been considering more and more: AI becoming more and more like the Greek gods, each with its own character and function, complementing each other. It was kind of the original vision of Altman too, but they lost their way.
No need to have one "god AGI" (which cannot be run locally, as it demands crazy hardware) if we can have 20 agents with 20 different local narrow AI models that can be loaded one after another.
Oh god, sorry devs, but this conversation is too interesting. You may need to turn notifications off XD
I was trained as an artist, and the first thing to know is that limitations are the generator of creativity. A big AI with all the knowledge of the world may just become the most boring thing to touch the planet. And this may be controversial, but I think that bad qualities are needed too... everything has its meaning and use in order to create balance. Just my opinion.
This has been referenced in #33
Create new model config file named gpt-3.5-turbo-16k.yaml and set the model name to gpt-3.5-turbo-16k-0613. Then start the LocalAI server and run:
OPENAI_API_BASE=http://127.0.0.1:8000/v1 OPENAI_API_KEY="dummy" python run.py --task "Snake game in pure html" --name "WebSnake"
The command above did not work in Anaconda Prompt, but this version did:
(chatdev_conda_env) C:\chatdev>set OPENAI_API_BASE=http://127.0.0.1:5001/v1
(chatdev_conda_env) C:\chatdev>set OPENAI_API_KEY=123456
(chatdev_conda_env) C:\chatdev>python run.py --task "Hello world in python" --name "HelloWorld"
**[Preprocessing]**
**ChatDev Starts** (20230913191808)
**Timestamp**: 20230913191808
**config_path**: C:\chatdev\CompanyConfig\Default\ChatChainConfig.json
**config_phase_path**: C:\chatdev\CompanyConfig\Default\PhaseConfig.json
**config_role_path**: C:\chatdev\CompanyConfig\Default\RoleConfig.json
**task_prompt**: Hello world in python
**project_name**: HelloWorld
**Log File**: C:\chatdev\WareHouse\HelloWorld_DefaultOrganization_20230913191808.log
**ChatDevConfig**:
ChatEnvConfig.clear_structure: True
ChatEnvConfig.brainstorming: False
**ChatGPTConfig**:
ChatGPTConfig(temperature=0.2, top_p=1.0, n=1, stream=False, stop=None, max_tokens=None, presence_penalty=0.0, frequency_penalty=0.0, logit_bias={}, user='')
I am having a problem using it with the local API.
It looks like all the API returns is a single token:
Text-generation-webui side:
llm_load_print_meta: model size = 13.02 B
llm_load_print_meta: general.name = openassistant_llama2-13b-orca-8k-3319
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: LF token = 13 '<0x0A>'
llm_load_tensors: ggml ctx size = 0.12 MB
llm_load_tensors: using CUDA for GPU acceleration
llm_load_tensors: mem required = 128.35 MB (+ 1600.00 MB per state)
llm_load_tensors: offloading 40 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloading v cache to GPU
llm_load_tensors: offloading k cache to GPU
llm_load_tensors: offloaded 43/43 layers to GPU
llm_load_tensors: VRAM used: 11656 MB
...................................................................................................
llama_new_context_with_model: kv self size = 1600.00 MB
llama_new_context_with_model: compute buffer total size = 191.47 MB
llama_new_context_with_model: VRAM scratch buffer: 190.00 MB
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 |
2023-09-13 19:11:27 INFO:Loaded the model in 7.52 seconds.
Warning: $This model maximum context length is 2048 tokens. However, your messages resulted in over 498 tokens and max_tokens is 15937.
llama_print_timings: load time = 955.37 ms
llama_print_timings: sample time = 0.23 ms / 1 runs ( 0.23 ms per token, 4424.78 tokens per second)
llama_print_timings: prompt eval time = 955.31 ms / 498 tokens ( 1.92 ms per token, 521.30 tokens per second)
llama_print_timings: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
llama_print_timings: total time = 957.88 ms
Output generated in 1.41 seconds (0.00 tokens/s, 0 tokens, context 498, seed 1828391196)
127.0.0.1 - - [13/Sep/2023 19:11:42] "POST /v1/chat/completions HTTP/1.1" 200 -
Warning: $This model maximum context length is 2048 tokens. However, your messages resulted in over 551 tokens and max_tokens is 15885.
Llama.generate: prefix-match hit
llama_print_timings: load time = 955.37 ms
llama_print_timings: sample time = 0.16 ms / 1 runs ( 0.16 ms per token, 6410.26 tokens per second)
llama_print_timings: prompt eval time = 835.88 ms / 489 tokens ( 1.71 ms per token, 585.01 tokens per second)
llama_print_timings: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
llama_print_timings: total time = 836.73 ms
Output generated in 1.24 seconds (0.00 tokens/s, 0 tokens, context 551, seed 192786861)
127.0.0.1 - - [13/Sep/2023 19:11:46] "POST /v1/chat/completions HTTP/1.1" 200 -
Warning: $This model maximum context length is 2048 tokens. However, your messages resulted in over 521 tokens and max_tokens is 15907.
Llama.generate: prefix-match hit
llama_print_timings: load time = 955.37 ms
llama_print_timings: sample time = 0.13 ms / 1 runs ( 0.13 ms per token, 7633.59 tokens per second)
llama_print_timings: prompt eval time = 884.39 ms / 459 tokens ( 1.93 ms per token, 519.00 tokens per second)
llama_print_timings: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
llama_print_timings: total time = 885.63 ms
Output generated in 1.26 seconds (0.00 tokens/s, 0 tokens, context 521, seed 1288396660)
127.0.0.1 - - [13/Sep/2023 19:11:53] "POST /v1/chat/completions HTTP/1.1" 200 -
Warning: $This model maximum context length is 2048 tokens. However, your messages resulted in over 574 tokens and max_tokens is 15854.
Llama.generate: prefix-match hit
ChatDev side:
Chief Executive Officer: **Chief Product Officer<->Chief Executive Officer on : DemandAnalysis, turn 0**
[ChatDev is a software company powered by multiple intelligent agents, such as chief executive officer, chief human resources officer, chief product officer, chief technology officer, etc, with a multi-agent organizational structure and the mission of "changing the digital world through programming".
You are Chief Product Officer. we are both working at ChatDev. We share a common interest in collaborating to successfully complete a task assigned by a new customer.
You are responsible for all product-related matters in ChatDev. Usually includes product design, product strategy, product vision, product innovation, project management and product marketing.
Here is a new customer's task: Hello world in python.
To complete the task, you must write a response that appropriately solves the requested instruction based on your expertise and customer's needs.]
**[OpenAI_Usage_Info Receive]**
prompt_tokens: 521
completion_tokens: 1
total_tokens: 522
**[OpenAI_Usage_Info Receive]**
prompt_tokens: 574
completion_tokens: 1
total_tokens: 575
Chief Product Officer: **Chief Product Officer<->Chief Executive Officer on : DemandAnalysis, turn 1**
[ChatDev is a software company powered by multiple intelligent agents, such as chief executive officer, chief human resources officer, chief product officer, chief technology officer, etc, with a multi-agent organizational structure and the mission of "changing the digital world through programming".
You are Chief Executive Officer. Now, we are both working at ChatDev and we share a common interest in collaborating to successfully complete a task assigned by a new customer.
Your main responsibilities include being an active decision-maker on users' demands and other key policy issues, leader, manager, and executor. Your decision-making role involves high-level decisions about policy and strategy; and your communicator role can involve speaking to the organization's management and employees.
Here is a new customer's task: Hello world in python.
To complete the task, I will give you one or more instructions, and you must help me to write a specific solution that appropriately solves the requested instruction based on your expertise and my needs.]
Chief Executive Officer: **Chief Product Officer<->Chief Executive Officer on : DemandAnalysis, turn 1**
[ChatDev is a software company powered by multiple intelligent agents, such as chief executive officer, chief human resources officer, chief product officer, chief technology officer, etc, with a multi-agent organizational structure and the mission of "changing the digital world through programming".
You are Chief Product Officer. we are both working at ChatDev. We share a common interest in collaborating to successfully complete a task assigned by a new customer.
You are responsible for all product-related matters in ChatDev. Usually includes product design, product strategy, product vision, product innovation, project management and product marketing.
Here is a new customer's task: Hello world in python.
To complete the task, you must write a response that appropriately solves the requested instruction based on your expertise and customer's needs.]
**[OpenAI_Usage_Info Receive]**
prompt_tokens: 544
completion_tokens: 1
total_tokens: 545
Yeah, the command above was for macOS; no troubles with the conda environment here.
@andraz, why don't you increase the context to 4K or 8K tokens? Based on your model name, it supports a context of up to 8K tokens.
Warning: $This model maximum context length is 2048 tokens. However, your messages resulted in over 521 tokens and max_tokens is 15907.
As for the one-token response, I guess it's the streaming feature, so you don't have to wait for the full response.
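For what it's worth, a quick way to check whether streaming or truncation is the culprit is to hit the endpoint directly with streaming off and a small, explicit max_tokens. This is only a debugging sketch; it assumes the OpenAI-compatible server from the Anaconda commands above is listening on 127.0.0.1:5001 and accepts any model name.
import json
import requests

resp = requests.post(
    "http://127.0.0.1:5001/v1/chat/completions",
    json={
        "model": "gpt-3.5-turbo-16k-0613",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}],
        "max_tokens": 256,   # keep well below the 2048-token context window
        "stream": False,     # rule out streaming behaviour entirely
    },
    timeout=120,
)
data = resp.json()
print(json.dumps(data.get("usage", {}), indent=2))  # completion_tokens should be > 1
print(data["choices"][0]["message"]["content"])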
Hello there, I am trying to use a llama-2-7B model as described above.
I created a new yaml file named gpt-3.5-turbo-16k.yaml and set the model name to gpt-3.5-turbo-16k-0613. As the model itself, I downloaded and used one of the `llama-2*.bin` models from the Hugging Face model library.
I can successfully run it and receive answers to my questions as part of the returned object via curl, but it also says:
"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}
When I then try to run ChatDev on a simple task, i.e. python run.py --task "Hello world in python" --name "HelloWorld", ChatDev prints the startup prompts and then receives no objects from the local LLM, only continuous empty usage logs:
...
Note that we must ONLY discuss the product modality and do not discuss anything else! Once we all have expressed our opinion(s) and agree with the results of the discussion unanimously, any of us must actively terminate the discussion by replying with only one line, which starts with a single word <INFO>, followed by our final product modality without any other words, e.g., "<INFO> PowerPoint".
**[OpenAI_Usage_Info Receive]**
prompt_tokens: 0
completion_tokens: 0
total_tokens: 0
**[OpenAI_Usage_Info Receive]**
prompt_tokens: 0
completion_tokens: 0
total_tokens: 0
**[OpenAI_Usage_Info Receive]**
prompt_tokens: 0
completion_tokens: 0
total_tokens: 0
After 3 retries it crashes with the following KeyError.
Traceback (most recent call last):
File "/home/**/miniconda3/envs/chatdev/lib/python3.9/site-packages/tenacity/__init__.py", line 382, in __call__
result = fn(*args, **kwargs)
File "/media/**/4TB_DATA/git/ChatDev/camel/utils.py", line 145, in wrapper
return func(self, *args, **kwargs)
File "/media/**/4TB_DATA/git/ChatDev/camel/agents/chat_agent.py", line 200, in step
response["id"],
KeyError: 'id'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/media/**/4TB_DATA/git/ChatDev/run.py", line 111, in <module>
chat_chain.execute_chain()
File "/media/**/4TB_DATA/git/ChatDev/chatdev/chat_chain.py", line 160, in execute_chain
self.execute_step(phase_item)
File "/media/**/4TB_DATA/git/ChatDev/chatdev/chat_chain.py", line 130, in execute_step
self.chat_env = self.phases[phase].execute(self.chat_env,
File "/media/**/4TB_DATA/git/ChatDev/chatdev/phase.py", line 292, in execute
self.chatting(chat_env=chat_env,
File "/media/**/4TB_DATA/git/ChatDev/chatdev/utils.py", line 77, in wrapper
return func(*args, **kwargs)
File "/media/**/4TB_DATA/git/ChatDev/chatdev/phase.py", line 131, in chatting
assistant_response, user_response = role_play_session.step(input_user_msg, chat_turn_limit == 1)
File "/media/**/4TB_DATA/git/ChatDev/camel/agents/role_playing.py", line 242, in step
assistant_response = self.assistant_agent.step(user_msg_rst)
File "/home/**/miniconda3/envs/chatdev/lib/python3.9/site-packages/tenacity/__init__.py", line 289, in wrapped_f
return self(f, *args, **kw)
File "/home/**/miniconda3/envs/chatdev/lib/python3.9/site-packages/tenacity/__init__.py", line 379, in __call__
do = self.iter(retry_state=retry_state)
File "/home/**/miniconda3/envs/chatdev/lib/python3.9/site-packages/tenacity/__init__.py", line 326, in iter
raise retry_exc from fut.exception()
tenacity.RetryError: RetryError[<Future at 0x7ff680181ac0 state=finished raised KeyError>]
I have already exported OPENAI_API_BASE and OPENAI_API_KEY to point to localhost, otherwise it crashed.
What can I do to successfully use the local LLM?
Thanks for any help and sorry if this is the wrong place to ask it!
@starkdmi
- Create new model config file named gpt-3.5-turbo-16k.yaml and set the model name to gpt-3.5-turbo-16k-0613.
What model are you using?
- Install LocalAI - OpenAI compatible server.
- Create new model config file named gpt-3.5-turbo-16k.yaml and set the model name to gpt-3.5-turbo-16k-0613.
Hello @starkdmi, can you share the file gpt-3.5-turbo-16k.yaml?
@jacktang, it depends on the model, but for example it looks like gpt-3.5-turbo-16k.txt (rename to .yaml) for Vicuna 1.5.
@GitSimply, these models work with many of the GPT tools on my setup: WizardLM, WizardCoder, WizardCoderPy, Wizard-Vicuna, Vicuna, CodeLLaMa.
I'm just going to add a bit about how I got ChatDev running locally with the LM Studio server, for anyone searching. It would have been really easy if there had been clear instructions, but I had to read through all of the issues and try to find things in the code, to no avail.
Anyway. The basics:
On step 4 do this instead:
set OPENAI_API_BASE=http://localhost:1234/v1
And that's it (you'll need to start the LM Studio server and load a model); now you can just run ChatDev like you normally would, but locally.
Hey @starkdmi, while using LocalAI:
git clone https://github.com/go-skynet/LocalAI
cd LocalAI
git checkout -b build
cp your-model.bin models/
docker compose up -d --pull always
curl http://localhost:8080/v1/models
After doing this in LocalAI, I am directly executing this in ChatDev:
OPENAI_API_BASE=http://127.0.0.1:8000/v1 OPENAI_API_KEY="dummy" python run.py --task "Snake game in pure html" --name "WebSnake"
and I am getting the following error:
Traceback (most recent call last):
  File "/root/anaconda3/lib/python3.11/site-packages/tenacity/__init__.py", line 382, in __call__
    result = fn(*args, **kwargs)
  File "/root/ChatDev/camel/utils.py", line 145, in wrapper
    return func(self, *args, **kwargs)
  File "/root/ChatDev/camel/agents/chat_agent.py", line 191, in step
    response = self.model_backend.run(messages=openai_messages)
  File "/root/ChatDev/camel/model_backend.py", line 69, in run
    response = openai.ChatCompletion.create(*args, **kwargs,
  File "/root/anaconda3/lib/python3.11/site-packages/openai/api_resources/chat_completion.py", line 25, in create
    return super().create(*args, **kwargs)
  File "/root/anaconda3/lib/python3.11/site-packages/openai/api_resources/abstract/engine_api_resource.py", line 153, in create
    response, _, api_key = requestor.request(
  File "/root/anaconda3/lib/python3.11/site-packages/openai/api_requestor.py", line 298, in request
    resp, got_stream = self._interpret_response(result, stream)
  File "/root/anaconda3/lib/python3.11/site-packages/openai/api_requestor.py", line 700, in _interpret_response
    self._interpret_response_line(
  File "/root/anaconda3/lib/python3.11/site-packages/openai/api_requestor.py", line 763, in _interpret_response_line
    raise self.handle_error_response(
openai.error.APIError: rpc error: code = Unknown desc = inference failed {"error":{"code":500,"message":"rpc error: code = Unknown desc = inference failed","type":""}} 500 {'error': {'code': 500, 'message': 'rpc error: code = Unknown desc = inference failed', 'type': ''}} {'Date': 'Tue, 26 Sep 2023 06:20:10 GMT', 'Content-Type': 'application/json', 'Content-Length': '94'}
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/root/ChatDev/run.py", line 111, in <module>
    chat_chain.execute_chain()
  File "/root/ChatDev/chatdev/chat_chain.py", line 160, in execute_chain
    self.execute_step(phase_item)
  File "/root/ChatDev/chatdev/chat_chain.py", line 130, in execute_step
    self.chat_env = self.phases[phase].execute(self.chat_env,
  File "/root/ChatDev/chatdev/phase.py", line 292, in execute
    self.chatting(chat_env=chat_env,
  File "/root/ChatDev/chatdev/utils.py", line 77, in wrapper
    return func(*args, **kwargs)
  File "/root/ChatDev/chatdev/phase.py", line 131, in chatting
    assistant_response, user_response = role_play_session.step(input_user_msg, chat_turn_limit == 1)
  File "/root/ChatDev/camel/agents/role_playing.py", line 242, in step
    assistant_response = self.assistant_agent.step(user_msg_rst)
  File "/root/anaconda3/lib/python3.11/site-packages/tenacity/__init__.py", line 289, in wrapped_f
    return self(f, *args, **kw)
  File "/root/anaconda3/lib/python3.11/site-packages/tenacity/__init__.py", line 379, in __call__
    do = self.iter(retry_state=retry_state)
  File "/root/anaconda3/lib/python3.11/site-packages/tenacity/__init__.py", line 326, in iter
    raise retry_exc from fut.exception()
tenacity.RetryError: RetryError[<Future at 0x7f272c7ca710 state=finished raised APIError>]
how can I fix this?
@sankalp-25, the problem is the local OpenAI-compatible server, which responds incorrectly. Do you have a config file for your model in the models/ directory next to the .bin file?
It should look like that one, so it simulates a gpt-3.5 model instead of hosting your-model.
On startup, LocalAI will list the hosted models and you should see the correct name (gpt-3.5/4).
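A quick way to confirm that before running ChatDev is to ask the server for its model list with the same pre-1.0 openai client used elsewhere in this thread. A sketch; the base URL and dummy key are assumptions matching the commands in this thread.
import openai

openai.api_key = "sk-dummy"
openai.api_base = "http://127.0.0.1:8000/v1"

# List the models the local server advertises and confirm the gpt-3.5 alias is there.
models = openai.Model.list()
print([m["id"] for m in models["data"]])
# Expect something like ['gpt-3.5-turbo-16k-0613'] if the YAML alias is picked up.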
Hey @starkdmi, I have renamed the .yaml file to gpt-3.5-turbo-16k.yaml and the model file to gpt-3.5-turbo-16k-0613, and then I do the following. If I am not wrong, the config file is the .yaml one, which I renamed from docker-compose.yaml to gpt-3.5-turbo-16k.yaml. If I am wrong, please let me know what the mistake is.
Please check the log below:
$ docker compose -f gpt-3.5-turbo-16k.yaml up -d --pull always
[+] Running 1/1
 ✔ api Pulled  2.9s
[+] Running 1/0
 ✔ Container localai-api-1  Running
$ curl http://localhost:8000/v1/models
{"object":"list","data":[{"id":"gpt-3.5-turbo-16k-0613","object":"model"}]}
After this I am trying to run the following in ChatDev:
$ OPENAI_API_BASE=http://127.0.0.1:8000/v1 OPENAI_API_KEY="dummy" python run.py --task "Snake game in pure html" --name "WebSnake"
The error I am getting was given in the previous comment.
Thank you
@sankalp-25, we can test whether the model is working using this Python code:
import openai # https://github.com/openai/openai-python#installation
openai.api_key = "sk-dummy"
openai.api_base = "http://127.0.0.1:8000/v1"
chat_completion = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-16k-0613",
    messages=[{"role": "user", "content": "Calculate 20 minus 5."}]
)
completion = chat_completion.choices[0].message.content
print(completion) # The result of 20 minus 5 is 15.
@starkdmi, what do you mean by config file? If I am not wrong, the config file is the .yaml one, which I renamed from docker-compose.yaml to gpt-3.5-turbo-16k.yaml. I only have gpt-3.5-turbo-16k-0613 and gpt-3.5-turbo-16k-0613.tmpl in /models. When I run the code for checking the model, the following is the error:
Traceback (most recent call last):
File "/root/FGPT/LocalAI/models/infer.py", line 6, in
@sankalp-25, wow, the docker-compose.yaml is a completely different thing. The docs are here.
The correct content of the file named gpt-3.5-turbo-16k.yaml may look like:
name: gpt-3.5-turbo-16k # or gpt-3.5-turbo-16k-0613
parameters:
  model: vicuna-13b-v1.5-16k.Q5_K_M.gguf
  temperature: 0.2
  top_k: 80
  top_p: 0.7
  max_tokens: 2048
f16: true
context_size: 16384
template:
  chat: vicuna
gpu_layers: 32
mmap: true
I'm just going to add a bit about how I got ChatDev running locally with the LM Studio server, for anyone searching. It would have been really easy if there had been clear instructions, but I had to read through all of the issues and try to find things in the code, to no avail.
Anyway. The basics:
- Windows 10
- Following the installation instructions from the readme for steps 1-3 (git clone, conda, cd, install requirements)
On step 4 do this instead:
set OPENAI_API_BASE=http://localhost:1234/v1
And that's it (you'll need to start the LM Studio server and load a model); now you can just run ChatDev like you normally would, but locally.
Correct. You also need one more step: set OPENAI_API_KEY="xyz".
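For anyone who prefers to set everything from Python rather than the shell, here is a minimal sketch of the same two-step setup; the port 1234 is LM Studio's default and the key value is arbitrary.
import os
import subprocess

env = os.environ.copy()
env["OPENAI_API_BASE"] = "http://localhost:1234/v1"  # LM Studio server
env["OPENAI_API_KEY"] = "xyz"                        # any non-empty string

# Launch ChatDev with the variables guaranteed to be set for its process.
subprocess.run(
    ["python", "run.py", "--task", "Hello world in python", "--name", "HelloWorld"],
    env=env,
    check=True,
)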
I have the same problem as @xkaraman above, and I am trying to figure out how to solve it, without success at the moment.
I'm using LocalAI with this yaml config:
backend: llama
context_size: 2000
f16: true
gpu_layers: 4
name: gpt-3.5-turbo-16k-0613
parameters:
  model: luna-ai-llama2-uncensored.Q4_0.gguf
  temperature: 0.2
  top_k: 40
  top_p: 0.65
roles:
  assistant: 'ASSISTANT:'
  system: 'SYSTEM:'
  user: 'USER:'
template:
  chat: gpt-3.5-turbo-16k-0613-chat
  completion: gpt-3.5-turbo-16k-0613-completion
Check issue https://github.com/go-skynet/LocalAI/issues/1103.
It should be working now; the id field was missing from the response.
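To verify the fix, it may help to probe the raw endpoint and check that the fields ChatDev reads (response["id"], per the KeyError above, plus usage) are present. A sketch, assuming LocalAI is on port 8000 as earlier in the thread.
import requests

resp = requests.post(
    "http://127.0.0.1:8000/v1/chat/completions",
    json={
        "model": "gpt-3.5-turbo-16k-0613",
        "messages": [{"role": "user", "content": "ping"}],
    },
    timeout=120,
)
data = resp.json()
print("id:", data.get("id"))        # must not be None, or ChatDev raises KeyError: 'id'
print("usage:", data.get("usage"))  # prompt/completion token counts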
Llama 2 ships as a .pth file. Do we need to convert it to .bin?
@starkdmi, could you guide me on what steps you used to get this working? I am pretty much a newbie.
I use Ollama for inference with LiteLLM for the API. Works perfectly fine! Ollama is quick enough; the only downside is the context length of most local models.
Hey there, I am running ChatDev locally with LM Studio and mistral-7b on a Linux amd64 CPU here, and it runs perfectly fine until ChatDev tries to do some sort of check that the HTTPConnection is still up. I tested the LM Studio server with some curl queries and it was still working. Can someone help out, please?
I followed the quickstart steps 1 to 3 normally, and added the following env variables to my .bashrc (and ran source .bashrc):
export OPENAI_API_KEY="sk-dummy1234"
export OPENAI_API_BASE="http://localhost:1234/v1"
Here is the error I got; I've run ChatDev a few times now and it always returns the same error, seemingly at the same spot:
EDIT:
The issue was that, running on CPU, the timeout was too low for the openai API; configuring the request timeout to be unlimited solved my problem.
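For reference, one way to do that with the pre-1.0 openai client (a sketch, not necessarily the exact change I made) is to pass request_timeout to the ChatCompletion call, e.g. where camel/model_backend.py calls openai.ChatCompletion.create per the tracebacks above.
import openai

openai.api_key = "sk-dummy1234"
openai.api_base = "http://localhost:1234/v1"

# A very large request_timeout keeps the client from giving up on slow CPU inference.
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-16k-0613",
    messages=[{"role": "user", "content": "Hello"}],
    request_timeout=6000,  # seconds; effectively unlimited for this purpose
)
print(response["choices"][0]["message"]["content"])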
Seems to work with the Horde decentralized LLM: https://github.com/scenaristeur/openai2horde
then
OPENAI_API_BASE=http://127.0.0.1:5678/v1 OPENAI_API_KEY="dummy" python run.py --task "2048 game" --name "2048"
build me a 10*10 2048 game
@scenaristeur What are you running on? I've been meaning to test on M1 Pro.
@opencoca I personally don't have a GPU, but it runs on the Horde https://stablehorde.net/, which is crowd-sourced: https://stablehorde.net/api/v2/workers?type=text
Is it possible to use ollama?
Use the Langchain-Chatchat project, calling the local port 2000:
# update model config
LLM_MODELS = ["gpt-3.5-turbo-16k-0613"]
MODEL_PATH['llm_model'].update({"gpt-3.5-turbo-16k-0613": MODEL_PATH['llm_model']['chatglm3-6b-32k']})
OPENAI_API_BASE=http://127.0.0.1:2000/v1 OPENAI_API_KEY="dummy" python run.py --task "2048 game" --name "2048"
I tried to use LM Studio as a local OpenAI substitute. It works well, using the environment variable setup suggested here.
OPENAI_API_BASE=http://127.0.0.1:1234/v1 OPENAI_API_KEY="xyz" python run.py --task "A drawing app" --name "Draw App"
However, it doesn't run through and terminates with an error saying the max tokens are exceeded:
Traceback (most recent call last):
File "C:\Users\falk\repos\AI\ChatDev\run.py", line 114, in <module>
chat_chain.execute_chain()
File "C:\Users\falk\repos\AI\ChatDev\chatdev\chat_chain.py", line 163, in execute_chain
self.execute_step(phase_item)
File "C:\Users\falk\repos\AI\ChatDev\chatdev\chat_chain.py", line 133, in execute_step
self.chat_env = self.phases[phase].execute(self.chat_env,
File "C:\Users\falk\repos\AI\ChatDev\chatdev\phase.py", line 291, in execute
self.chatting(chat_env=chat_env,
File "C:\Users\falk\repos\AI\ChatDev\chatdev\utils.py", line 77, in wrapper
return func(*args, **kwargs)
File "C:\Users\falk\repos\AI\ChatDev\chatdev\phase.py", line 165, in chatting
seminar_conclusion = "<INFO> " + self.self_reflection(task_prompt, role_play_session, phase_name,
File "C:\Users\falk\repos\AI\ChatDev\chatdev\phase.py", line 219, in self_reflection
self.chatting(chat_env=chat_env,
File "C:\Users\falk\repos\AI\ChatDev\chatdev\utils.py", line 77, in wrapper
return func(*args, **kwargs)
File "C:\Users\falk\repos\AI\ChatDev\chatdev\phase.py", line 136, in chatting
if isinstance(assistant_response.msg, ChatMessage):
File "C:\Users\falk\repos\AI\ChatDev\camel\agents\chat_agent.py", line 53, in msg
raise RuntimeError("error in ChatAgentResponse, info:{}".format(str(self.info)))
RuntimeError: error in ChatAgentResponse, info:{'id': None, 'usage': None, 'termination_reasons': ['max_tokens_exceeded_by_camel'], 'num_tokens': 17171}
For inference I'm using the zephyr-7B-beta. Does anyone know how to fix this or what to do?
My first thought is that this is a max-token problem.
Obviously it is. Is it because of the model? How do I raise the max tokens?
Looks like there's an open PR to add this - https://github.com/OpenBMB/ChatDev/pull/53
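Until something like that is merged, a rough way to see how close a run is to the model's window is to count the tokens in the accumulated messages. A sketch using tiktoken; cl100k_base is an assumption (it matches OpenAI chat models rather than zephyr's own tokenizer), so treat the count as an estimate.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def count_tokens(messages):
    # messages: list of {"role": ..., "content": ...} dicts, as sent to the API
    return sum(len(enc.encode(m["content"])) for m in messages)

conversation = [
    {"role": "system", "content": "You are Chief Product Officer at ChatDev..."},
    {"role": "user", "content": "Hello world in python."},
]
print(count_tokens(conversation))  # compare against the model's context size (e.g. 16k)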
Is it possible to use ollama?
Yes, using the LiteLLM OpenAI proxy, like this:
litellm --api_base http://localhost:11434 --add_key OPENAI_API_KEY=dummy --drop_params --model ollama/orca2:7b
An OpenAI-compatible proxy server will run and redirect to Ollama; then run your OpenAI-compatible app, like ChatDev:
OPENAI_API_BASE=http://localhost:8000/v1 OPENAI_API_KEY=dummy python3 run.py --task "<task>" --name "<title>"
@xkaraman
tenacity.RetryError: RetryError
Have you fixed the error? I get the same error when I use chatglm3-6b as the LLM server. The server also prints some red logs at "POST /send_message HTTP/1.1" 404 Not Found. So I think the code errors out because the LLM server does not respond to /send_message correctly, and the code retries until max_time.
- Install LocalAI - OpenAI compatible server.
- Create new model config file named gpt-3.5-turbo-16k.yaml and set the model name to gpt-3.5-turbo-16k-0613.
- Start LocalAI server locally and run:
OPENAI_API_BASE=http://127.0.0.1:8000/v1 OPENAI_API_KEY="dummy" python run.py --task "Snake game in pure html" --name "WebSnake"
This should be added to the wiki or documented somewhere
Can someone guide me on how to run a full Docker stack, e.g. one container for local models and one container for ChatDev?
To save the base and/or key in the conda environment, use this before activating it (or deactivate, then reactivate):
conda env config vars set OPENAI_API_BASE=http://localhost:1234/v1 --name ChatDev_conda_env
conda env config vars set OPENAI_API_KEY=any --name ChatDev_conda_env
Use the Langchain-Chatchat project, calling the local port 2000
I tried your suggestion. The port turns out to be 20000, not 2000 (probably a typo). I'm also using chatglm3-6b-32k, with BAAI/bge-large-zh for the knowledge base. It runs, but strangely the responses are very slow; it does respond, just only after quite a while. The GPU's 80 GB of memory is enough. In the end it took 93 minutes to finish a "Snake game in pure html" that cannot run.
For me it wasn't OPENAI_API_BASE, but BASE_URL. After setting this, everything works fine with LiteLLM + Ollama
FYI - it's not OPENAI_API_BASE. If using Anaconda on Windows, you do SET BASE_URL="http://localhost:1234/v1" and then SET OPENAI_API_KEY="not needed". This is if you're using LM Studio. All working on my end using Mistral Instruct 7B.
If anyone has Ollama integrated with this, then please let me know. Thanks a lot. Happy coding.
How do I go about using other services, like Together.ai, that offer an OpenAI-compatible API? How do I set the host?
If the API is OpenAI-compatible, you can point at the API endpoint using --api_base, just as with local models.
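For example, with the pre-1.0 openai client you can also point the client at a hosted endpoint directly. The base URL and model id below are assumptions about Together.ai's OpenAI-compatible API, so check their docs before relying on them.
import openai

openai.api_key = "your-together-api-key"          # from the provider's dashboard
openai.api_base = "https://api.together.xyz/v1"   # assumed OpenAI-compatible endpoint

response = openai.ChatCompletion.create(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",  # example model id from their catalogue
    messages=[{"role": "user", "content": "Hello"}],
)
print(response["choices"][0]["message"]["content"])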
Hey Devs, let me start by saying that this programme is great. Well done on your work, and thanks for sharing it.
My question is: is there any plan to allow for the integration of local models? Even just a section in the documentation would be great.
Have a good day, theWW