run worker error - Githubissues

chenzhu005774 commented 2 months ago

root@ubuntu-Z690:/mnt/workspace/.cache/modelscope/BAAI/Bunny-v1___0-3B# python -m bunny.serve.model_worker --host 0.0.0.0 --controller http://localhost:10000 --port 40000 --worker http://localhost:40000 --model-path /mnt/workspace/.cache/modelscope/BAAI/

2024-04-26 11:49:04.450276: I tensorflow/core/util/port.cc:111] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0. 2024-04-26 11:49:04.451693: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used. 2024-04-26 11:49:04.469047: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered 2024-04-26 11:49:04.469064: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered 2024-04-26 11:49:04.469078: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered 2024-04-26 11:49:04.472914: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used. 2024-04-26 11:49:04.473033: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags. 2024-04-26 11:49:04.905731: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT Traceback (most recent call last): File "/opt/conda/lib/python3.10/runpy.py", line 196, in _run_module_as_main return _run_code(code, main_globals, None, File "/opt/conda/lib/python3.10/runpy.py", line 86, in _run_code exec(code, run_globals) File "/Bunny/bunny/serve/model_worker.py", line 20, in from bunny.model.builder import load_pretrained_model File "/Bunny/bunny/model/init.py", line 1, in from .language_model.bunny_phi import BunnyPhiForCausalLM, BunnyPhiConfig File "/Bunny/bunny/model/language_model/bunny_phi.py", line 11, in from ..bunny_arch import BunnyMetaModel, BunnyMetaForCausalLM File "/Bunny/bunny/model/bunny_arch.py", line 6, in from .multimodal_projector.builder import build_vision_projector File "/Bunny/bunny/model/multimodal_projector/builder.py", line 5, in from timm.layers.norm_act import LayerNormAct2d ModuleNotFoundError: No module named 'timm.layers'

Isaachhh commented 2 months ago

Have you done Installation?

chenzhu005774 commented 2 months ago

Have you done Installation?

Yes, installation is complete。I started it in the modelscope container, And I start 'python -m bunny.serve.gradio_web_server --controller http://localhost:10000 --model-list-mode reload --share 'and' python -m bunny.serve.controller --host 0.0.0.0 --port 10000 'are started normally. My cuda environment is also normal.

Thanks!!

chenzhu005774 commented 2 months ago

Have you done Installation?

Yes, installation is complete。I started it in the modelscope container, And I start 'python -m bunny.serve.gradio_web_server --controller http://localhost:10000 --model-list-mode reload --share 'and' python -m bunny.serve.controller --host 0.0.0.0 --port 10000 'are started normally. My cuda environment is also normal.

Thanks!!

and this is my docker images info:docker run --name bunny --gpus all -it -p 10000:10000 7860:7860 --network host registry.cn-beijing.aliyuncs.com/modelscope-repo/modelscope:ubuntu22.04-cuda12.1.0-py310-torch2.1.2-tf2.14.0-1.12.0 bash

Isaachhh commented 2 months ago

may be the timm version? We use 0.9.16.

chenzhu005774 commented 2 months ago

may be the timm version? We use 0.9.16.

It didn't work, I use the registry.cn-beijing.aliyuncs.com/modelscope-repo/modelscope:ubuntu22.04-cuda12.1.0-py310-torch2.1.2-tf2.14.0-1.12.0 'The mirror image should be fine

chenzhu005774 commented 2 months ago

may be the timm version? We use 0.9.16.

Or can I ship a docker image of a full environment

Isaachhh commented 2 months ago

Well, I notice that there are some tensorflow errors, which is strange:

2024-04-26 11:49:04.469047: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered 2024-04-26 11:49:04.469064: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered 2024-04-26 11:49:04.469078: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered

BTW, it shoule be python -m bunny.serve.model_worker --host 0.0.0.0 --controller http://localhost:10000/ --port 40000 --worker http://localhost:40000/ --model-path /mnt/workspace/.cache/modelscope/BAAI/Bunny-v1___0-3B --model-type phi-2

chenzhu005774 commented 2 months ago

Well, I notice that there are some tensorflow errors, which is strange:

2024-04-26 11:49:04.469047: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered 2024-04-26 11:49:04.469064: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered 2024-04-26 11:49:04.469078: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered

BTW, it shoule be python -m bunny.serve.model_worker --host 0.0.0.0 --controller http://localhost:10000/ --port 40000 --worker http://localhost:40000/ --model-path /mnt/workspace/.cache/modelscope/BAAI/Bunny-v1___0-3B --model-type phi-2

So can you release a docker image of a complete environment? Thank you

RussRobin commented 2 months ago

Hi @chenzhu005774 , thank you for your interest in our work.

Please try our docker: docker pull russellrobin/bunny:latest. You may also want to refer to Installation and update local codes in our docker.

Regards Russell BAAI

chenzhu005774 commented 2 months ago

docker pull russellrobin/bunny:latest

Thank you. I'll try.

chenzhu005774 commented 2 months ago

Hi @chenzhu005774 , thank you for your interest in our work.

Please try our docker: docker pull russellrobin/bunny:latest. You may also want to refer to Installation and update local codes in our docker.

Regards Russell BAAI

Thank you the environment should be usable. But I couldn't download the model online due to network problems. So I put I manually download good model in the ". / root/cache/huggingface/" directory, Then run the python -m bunny.serve.model_worker --host 0.0.0.0 --controller http://:10001 --port 40000 --worker http://localhost:40000 --model-path BAAI/Bunny-v1_0-3B --model-type phi-2 "Let me download it. So I would like to ask how to load the locally downloaded model

chenzhu005774 commented 2 months ago

Hi @chenzhu005774 , thank you for your interest in our work. Please try our docker: docker pull russellrobin/bunny:latest. You may also want to refer to Installation and update local codes in our docker. Regards Russell BAAI

Thank you the environment should be usable. But I couldn't download the model online due to network problems. So I put I manually download good model in the ". / root/cache/huggingface/" directory, Then run the python -m bunny.serve.model_worker --host 0.0.0.0 --controller http://:10001 --port 40000 --worker http://localhost:40000 --model-path BAAI/Bunny-v1_0-3B --model-type phi-2 "Let me download it. So I would like to ask how to load the locally downloaded model