OpenGVLab / InternVL

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model approaching GPT-4o performance
https://internvl.readthedocs.io/en/latest/
MIT License

[Feature] Error when running inference with streamlit run #470

Open lckj2009 opened 1 month ago

lckj2009 commented 1 month ago

Motivation

When I run streamlit run /root/InternVL/streamlit_demo/app.py --server.port $WEB_SERVER_PORT -- --controller_url $CONTROLLER_URL --sd_worker_url $SD_WORKER_URL, the following appears:

You can now view your Streamlit app in your browser.

Local URL: http://localhost:10003 Network URL: http://<internal IP>:10003 External URL: http://<external IP>:10003

On the surface it looks like I should be able to open http://<external IP>:10003 in the browser on my own computer, but when I open that address the connection fails. I don't know what the cause is or how to fix it.

The error is shown in the screenshot below:

Related resources

telnet <external IP> 10003 connects normally

Additional context

(screenshot: connection error)

lckj2009 commented 1 month ago

In fact, before I even had a chance to open the browser, the command line already reported an error, as follows:

You can now view your Streamlit app in your browser.

Local URL: http://localhost:10003 Network URL: http://<internal IP>:10003 External URL: http://<external IP>:10003

args: Namespace(controller_url='http://0.0.0.0:40000', sd_worker_url='http://0.0.0.0:39999', max_image_limit=4)
2024-08-09 10:09:22.198 Uncaught app exception
Traceback (most recent call last):
  File "/root/anaconda3/envs/InternVL/lib/python3.10/site-packages/urllib3/connection.py", line 196, in _new_conn
    sock = connection.create_connection(
  File "/root/anaconda3/envs/InternVL/lib/python3.10/site-packages/urllib3/util/connection.py", line 85, in create_connection
    raise err
  File "/root/anaconda3/envs/InternVL/lib/python3.10/site-packages/urllib3/util/connection.py", line 73, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/root/anaconda3/envs/InternVL/lib/python3.10/site-packages/urllib3/connectionpool.py", line 789, in urlopen
    response = self._make_request(
  File "/root/anaconda3/envs/InternVL/lib/python3.10/site-packages/urllib3/connectionpool.py", line 495, in _make_request
    conn.request(
  File "/root/anaconda3/envs/InternVL/lib/python3.10/site-packages/urllib3/connection.py", line 398, in request
    self.endheaders()
  File "/root/anaconda3/envs/InternVL/lib/python3.10/http/client.py", line 1278, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/root/anaconda3/envs/InternVL/lib/python3.10/http/client.py", line 1038, in _send_output
    self.send(msg)
  File "/root/anaconda3/envs/InternVL/lib/python3.10/http/client.py", line 976, in send
    self.connect()
  File "/root/anaconda3/envs/InternVL/lib/python3.10/site-packages/urllib3/connection.py", line 236, in connect
    self.sock = self._new_conn()
  File "/root/anaconda3/envs/InternVL/lib/python3.10/site-packages/urllib3/connection.py", line 211, in _new_conn
    raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7f879ca63fa0>: Failed to establish a new connection: [Errno 111] Connection refused

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/root/anaconda3/envs/InternVL/lib/python3.10/site-packages/requests/adapters.py", line 667, in send
    resp = conn.urlopen(
  File "/root/anaconda3/envs/InternVL/lib/python3.10/site-packages/urllib3/connectionpool.py", line 843, in urlopen
    retries = retries.increment(
  File "/root/anaconda3/envs/InternVL/lib/python3.10/site-packages/urllib3/util/retry.py", line 519, in increment
    raise MaxRetryError(_pool, url, reason) from reason  # type: ignore[arg-type]
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='0.0.0.0', port=40000): Max retries exceeded with url: /refresh_all_workers (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f879ca63fa0>: Failed to establish a new connection: [Errno 111] Connection refused'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/anaconda3/envs/InternVL/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/exec_code.py", line 85, in exec_func_with_error_handling
    result = func()
  File "/root/anaconda3/envs/InternVL/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 576, in code_to_exec
    exec(code, module.__dict__)
  File "/root/InternVL/streamlit_demo/app.py", line 270, in <module>
    model_list = get_model_list()
  File "/root/InternVL/streamlit_demo/app.py", line 48, in get_model_list
    ret = requests.post(controller_url + '/refresh_all_workers')
  File "/root/anaconda3/envs/InternVL/lib/python3.10/site-packages/requests/api.py", line 115, in post
    return request("post", url, data=data, json=json, **kwargs)
  File "/root/anaconda3/envs/InternVL/lib/python3.10/site-packages/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
  File "/root/anaconda3/envs/InternVL/lib/python3.10/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "/root/anaconda3/envs/InternVL/lib/python3.10/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "/root/anaconda3/envs/InternVL/lib/python3.10/site-packages/requests/adapters.py", line 700, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='0.0.0.0', port=40000): Max retries exceeded with url: /refresh_all_workers (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f879ca63fa0>: Failed to establish a new connection: [Errno 111] Connection refused'))
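
For reference, the failing call is get_model_list() in streamlit_demo/app.py, which posts to the controller's /refresh_all_workers endpoint (both visible in the traceback above). A minimal standalone check, not part of the demo code, that reproduces just this request against the controller URL from the args line:

import requests

# Controller address as printed in the args Namespace above.
controller_url = 'http://0.0.0.0:40000'

try:
    # The same endpoint app.py's get_model_list() hits on startup.
    ret = requests.post(controller_url + '/refresh_all_workers', timeout=5)
    print('controller reachable, HTTP status:', ret.status_code)
except requests.exceptions.ConnectionError as err:
    print('controller not reachable:', err)

A ConnectionError here means nothing is listening on port 40000 yet, which is exactly the Connection refused seen in the traceback.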

lckj2009 commented 1 month ago

I hit this error at Step 2: Start the Streamlit Web Server. Could you please help me figure out where the problem is?

czczup commented 1 month ago

This error means the controller cannot be reached; please check it.

lckj2009 commented 1 month ago

This error means the controller cannot be reached; please check it.

OK, it works now. The documentation should point this out: the actual startup order should be the workers first, then the controller, and finally the web server.
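
Building on that ordering, here is a small helper, purely a sketch that assumes the default ports 40000 (controller) and 39999 (SD worker) from the args line earlier, which waits until the backend ports accept TCP connections before the web server is launched:

import socket
import time

def wait_for_port(host: str, port: int, timeout_s: float = 120.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=2):
                return True
        except OSError:
            time.sleep(2)
    return False

# Ports taken from the demo configuration shown above; adjust if yours differ.
for name, port in [('controller', 40000), ('sd_worker', 39999)]:
    up = wait_for_port('127.0.0.1', port)
    print(f'{name} on port {port}:', 'up' if up else 'still down')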

However, I get an error when starting the worker. It happens while loading models--stabilityai--stable-diffusion-3-medium-diffusers, and it seems to be a problem with fast_tokenizer, probably a version mismatch, but I don't know how to fix it. Could you advise? See the screenshot below:

(screenshot: worker startup error)

lckj2009 commented 1 month ago

The key part is this:

  File "/root/anaconda3/envs/InternVL/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 111, in __init__
    fast_tokenizer = TokenizerFast.from_file(fast_tokenizer_file)
Exception: data did not match any variant of untagged enum PyPreTokenizerTypeWrapper at line 960 column 3

That is the error I get.

Part of the pip list output: tqdm 4.66.4 transformers 4.37.2 transformers-stream-generator 0.0.5 triton 2.2.0 tritonclient 2.48.0 typeguard 2.13.3 typer 0.12.3 typing_extensions 4.12.2 tzdata 2024.1 tzlocal 5.2 uc-micro-py 1.0.3 urllib3 2.2.2 uvicorn 0.30.4 uvloop 0.19.0 validators 0.33.0 watchdog 4.0.1 watchfiles 0.22.0 wavedrom 2.0.3.post3 wcwidth 0.2.13 webdataset 0.2.86 websockets 12.0 Werkzeug 3.0.3 wheel 0.43.0 wrapt 1.14.1 xxhash 3.4.1 yacs 0.1.8 yapf 0.40.1 yarl 1.9.4 zipp 3.19.2

Further up it also shows: tiktoken 0.7.0, timm 0.9.12, tokenizers 0.15.1. Are these versions wrong?
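
The "data did not match any variant of untagged enum" message comes from the Rust side of the tokenizers package failing to parse a tokenizer.json written in a newer format than it understands. One way to isolate it, a sketch in which the tokenizer.json path is only a placeholder for wherever the stable-diffusion-3-medium-diffusers files were downloaded, is to load the file directly with the installed tokenizers:

import tokenizers
from tokenizers import Tokenizer

print('installed tokenizers version:', tokenizers.__version__)

# Placeholder path: point this at a tokenizer.json inside the downloaded
# stable-diffusion-3-medium-diffusers snapshot.
tokenizer_json = '/path/to/stable-diffusion-3-medium-diffusers/tokenizer/tokenizer.json'

try:
    Tokenizer.from_file(tokenizer_json)
    print('tokenizer.json parsed successfully')
except Exception as err:
    # Reproducing the same "untagged enum" error here points to a tokenizers
    # version that is too old for this tokenizer.json.
    print('failed to parse:', err)

If the same error reproduces, a newer tokenizers release is usually needed, though whatever version you choose still has to remain compatible with the installed transformers 4.37.2.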

czczup commented 1 month ago

You can skip the SD worker for now; it is optional.

lckj2009 commented 4 weeks ago

You can skip the SD worker for now; it is optional.

If I skip the SD worker, the InternVL2 workers are still required, right? Can the InternVL2 worker run the Mini-InternVL-Chat-4B-V1-5 model? I get an error when serving Mini-InternVL-Chat-4B-V1-5 with the InternVL2 worker.

xlxxcc commented 4 weeks ago

This error is usually caused by a problem when Hugging Face's transformers library loads a pretrained model or tokenizer via the from_pretrained method. Specifically, it is likely related to a mismatch in the pretrained model's configuration or weight files.

Here are some possible solutions:

1. Check the integrity of the model files:

Make sure all model files were downloaded correctly, especially model_index.json, pytorch_model.bin, and the other related files. You can verify file integrity with SHA256 checksums.

2. Update the related libraries:

Since transformers and diffusers are updated frequently, your current versions may have compatibility issues. Try upgrading transformers and diffusers to the latest versions:

pip install --upgrade transformers diffusers
pip install protobuf

3. Use the cached model:

If you downloaded the model manually, you can place the files in Hugging Face's cache directory and then load the locally cached model with from_pretrained. For example, put the model files under ~/.cache/huggingface/transformers/models--stabilityai--stable-diffusion-3-medium-diffusers.

4. Verify the model versions:

Confirm that the transformers and diffusers versions you are using are compatible with each other and match what the model expects (see the sketch after this list). If none of the above resolves the issue, make sure the model files are not corrupted, or try re-downloading the model and running the code in a different environment.
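
To make points 3 and 4 concrete, here is a minimal sketch; it assumes the snapshot is already in the local Hugging Face cache and that a diffusers release new enough to provide StableDiffusion3Pipeline is installed (torch_dtype and local_files_only are standard from_pretrained arguments):

import importlib.metadata as md

import torch
from diffusers import StableDiffusion3Pipeline

# Point 4: print the installed versions that need to be mutually compatible.
for pkg in ('transformers', 'diffusers', 'tokenizers'):
    print(pkg, md.version(pkg))

# Point 3: load the manually downloaded / cached snapshot without re-downloading.
pipe = StableDiffusion3Pipeline.from_pretrained(
    'stabilityai/stable-diffusion-3-medium-diffusers',
    torch_dtype=torch.float16,
    local_files_only=True,  # fail fast if the cached files are incomplete
)
print('pipeline loaded from local cache')

If local_files_only=True fails with missing files, that points back to point 1 (an incomplete download) rather than a version mismatch.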

lckj2009 commented 4 weeks ago

This error is usually caused by a problem when Hugging Face's transformers library loads a pretrained model or tokenizer via the from_pretrained method. ...

OK, thanks, I will give it a try.