Closed duhaly closed 11 months ago
Hi, judging from your configuration and the log output, the GPU is probably running out of memory.
In some quick tests on our side, vicuna-7b-v1.5 needs roughly 12 GB of GPU memory with 8-bit quantization enabled. Besides the LLM, the embedding model also runs on the GPU and takes up some memory as well.
We recommend trying one of the following two workarounds:
Enable 4-bit quantization by editing the .env file:
QUANTIZE_8bit=False
QUANTIZE_4bit=True
Run the embedding model on the CPU by editing the .env file:
text2vec_device=cpu
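As a rough sanity check on those numbers, the weight footprint of a quantized model can be estimated from its parameter count. This is only a back-of-the-envelope sketch: CUDA context, activations, and the KV cache come on top of it, which is why observed usage is noticeably higher.

```python
def weight_memory_gib(n_params: float, bits_per_param: int) -> float:
    """Approximate memory taken by the model weights alone, in GiB.

    Runtime overhead (CUDA context, activations, KV cache) is NOT included.
    """
    return n_params * bits_per_param / 8 / 2**30

# vicuna-7b-v1.5 has about 7 billion parameters
print(f"8-bit: {weight_memory_gib(7e9, 8):.1f} GiB")  # ~6.5 GiB of weights
print(f"4-bit: {weight_memory_gib(7e9, 4):.1f} GiB")  # ~3.3 GiB of weights
```

Halving the bits roughly halves the weight footprint, which is why dropping from 8-bit to 4-bit can be enough to fit a 12 GB card together with the embedding model.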
I followed the steps you suggested, but the problem persists. Is there anything else I can try?
(env) kas@DESKTOP-M7N9V23:~/app/DB-GPT$ python pilot/server/dbgpt_server.py
=========================== WebWerverParameters ===========================
host: 0.0.0.0
port: 5000
daemon: False
share: False
remote_embedding: False
log_level: INFO
light: False
======================================================================
/home/kas/app/DB-GPT/pilot
2023-09-20 18:38:38 | INFO | pilot.component | Register component with name dbgpt_model_controller and instance: <pilot.model.cluster.controller.controller.ModelControllerAdapter object at 0x7f15f7052d40>
2023-09-20 18:38:38 | INFO | pilot.server.component_configs | Register local LocalEmbeddingFactory
2023-09-20 18:38:38 | INFO | pilot.server.component_configs |
=========================== EmbeddingModelParameters ===========================
model_name: text2vec
model_path: /home/kas/app/DB-GPT/models/text2vec-large-chinese
device: cpu
normalize_embeddings: None
======================================================================
2023-09-20 18:38:39 | INFO | sentence_transformers.SentenceTransformer | Load pretrained SentenceTransformer: /home/kas/app/DB-GPT/models/text2vec-large-chinese
2023-09-20 18:38:39 | WARNING | sentence_transformers.SentenceTransformer | No sentence-transformers model found with name /home/kas/app/DB-GPT/models/text2vec-large-chinese. Creating a new one with MEAN pooling.
2023-09-20 18:38:39 | INFO | torch.distributed.nn.jit.instantiator | Created a temporary directory at /tmp/tmpwd6uq9ad
2023-09-20 18:38:39 | INFO | torch.distributed.nn.jit.instantiator | Writing /tmp/tmpwd6uq9ad/_remote_module_non_scriptable.py
2023-09-20 18:38:43 | INFO | pilot.component | Register component with name embedding_factory and instance: <pilot.server.component_configs.LocalEmbeddingFactory object at 0x7f15ca438a00>
Model Unified Deployment Mode!
2023-09-20 18:38:43 | INFO | model_worker | Worker params:
=========================== ModelWorkerParameters ===========================
model_name: vicuna-7b-v1.5
model_path: /home/kas/app/DB-GPT/models/vicuna-7b-v1.5
worker_type: None
worker_class: None
host: 0.0.0.0
port: 5000
daemon: False
limit_model_concurrency: 5
standalone: True
register: True
worker_register_host: None
controller_addr: None
send_heartbeat: True
heartbeat_interval: 20
======================================================================
2023-09-20 18:38:43 | INFO | model_worker | Run WorkerManager with standalone mode, controller_addr: http://127.0.0.1:5000
Found llm model adapter with model name: vicuna-7b-v1.5, <pilot.model.adapter.VicunaLLMAdapater object at 0x7f1690b7a0b0>
2023-09-20 18:38:43 | INFO | LOGGER | Found llm model adapter with model name: vicuna-7b-v1.5, <pilot.model.adapter.VicunaLLMAdapater object at 0x7f1690b7a0b0>
2023-09-20 18:38:43 | INFO | model_worker | model_name: vicuna-7b-v1.5, model_path: /home/kas/app/DB-GPT/models/vicuna-7b-v1.5, model_param_class: <class 'pilot.model.parameter.ModelParameters'>
Get model chat adapter with model name vicuna-7b-v1.5, <pilot.server.chat_adapter.VicunaChatAdapter object at 0x7f1690b00bb0>
2023-09-20 18:38:43 | INFO | model_worker | [DefaultModelWorker] Parameters of device is None, use cuda
2023-09-20 18:38:43 | INFO | model_worker | Init empty instances list for vicuna-7b-v1.5@llm
2023-09-20 18:38:43 | INFO | pilot.component | Register component with name dbgpt_worker_manager_factory and instance: <pilot.model.cluster.worker.manager._DefaultWorkerManagerFactory object at 0x7f15c6aae8c0>
INFO: Started server process [586]
INFO: Waiting for application startup.
2023-09-20 18:38:43 | INFO | model_worker | Begin start all worker, apply_req: None
2023-09-20 18:38:43 | INFO | model_worker | Apply req: None, apply_func: <function LocalWorkerManager._start_all_worker.
=========================== ModelParameters ===========================
model_name: vicuna-7b-v1.5
model_path: /home/kas/app/DB-GPT/models/vicuna-7b-v1.5
device: cuda
model_type: huggingface
prompt_template: None
max_context_size: 4096
num_gpus: None
max_gpu_memory: None
cpu_offloading: False
load_8bit: False
load_4bit: True
quant_type: nf4
use_double_quant: True
compute_dtype: None
trust_remote_code: True
verbose: False
======================================================================
INFO: Uvicorn running on http://0.0.0.0:5000 (Press CTRL+C to quit)
max_memory: {0: '10GiB'}
2023-09-20 18:38:43 | DEBUG | LOGGER | max_memory: {0: '10GiB'}
Using the following 4-bit params: {'load_in_4bit': True, 'bnb_4bit_compute_dtype': None, 'bnb_4bit_quant_type': 'nf4', 'bnb_4bit_use_double_quant': True}
2023-09-20 18:38:43 | WARNING | LOGGER | Using the following 4-bit params: {'load_in_4bit': True, 'bnb_4bit_compute_dtype': None, 'bnb_4bit_quant_type': 'nf4', 'bnb_4bit_use_double_quant': True}
params: {'low_cpu_mem_usage': True, 'device_map': 'auto', 'quantization_config': BitsAndBytesConfig { "bnb_4bit_quant_type": "nf4", "bnb_4bit_use_double_quant": true, "load_in_4bit": true } , 'torch_dtype': torch.float16, 'max_memory': {0: '10GiB'}, 'trust_remote_code': True}
2023-09-20 18:38:43 | INFO | LOGGER | params: {'low_cpu_mem_usage': True, 'device_map': 'auto', 'quantization_config': BitsAndBytesConfig { "bnb_4bit_quant_type": "nf4", "bnb_4bit_use_double_quant": true, "load_in_4bit": true } , 'torch_dtype': torch.float16, 'max_memory': {0: '10GiB'}, 'trust_remote_code': True}
Loading checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s]
Killed
(env) kas@DESKTOP-M7N9V23:~/app/DB-GPT$
Your environment info says Linux, but the screenshot shows the Windows Task Manager. Are you running inside a virtual machine or some other virtualized environment? Please check whether that environment has actually been allocated enough RAM and GPU memory.
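If the environment is WSL2, one likely culprit (an assumption here, not confirmed in the thread) is that the process died with "Killed", which usually means the Linux OOM killer ran out of system RAM while loading the checkpoint shards, and WSL2 caps guest RAM by default (typically at half of the host's). As a sketch, the cap can be raised in %UserProfile%\.wslconfig on the Windows side; the exact values to set depend on the host machine:

```ini
; %UserProfile%\.wslconfig (values below are examples, not recommendations)
[wsl2]
memory=12GB   ; RAM available to the WSL2 VM
swap=16GB     ; swap gives headroom while checkpoint shards load
```

Run wsl --shutdown from PowerShell afterwards so the new limits take effect on the next start.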
I'm using WSL2.
I haven't verified the WSL2 environment yet. I'd suggest pulling the latest main branch code in Windows PowerShell, running pip install -e ".[default]", and deploying directly on Windows to see if that helps.
OK, I'll test this tomorrow as you suggested. Thanks a lot; I'll get back to you with the results tomorrow.
A question: when I run dbgpt/app/dbgpt_server.py, I can't tell where the output below comes from. Changing the following two variables in .env has no effect either:
LLM_MODEL: tongyi_proxyllm
MODEL_SERVER: http://127.0.0.1:8000
=========================== EmbeddingModelParameters ===========================
model_name: bge-large-zh model_path: /media/data/cwb/DB-GPT/models/bge-large-zh device: cuda normalize_embeddings: None rerank: False max_length: None
======================================================================
2024-06-07 15:36:24 gptai sentence_transformers.SentenceTransformer[3303850] INFO Load pretrained SentenceTransformer: /media/data/cwb/DB-GPT/models/bge-large-zh
Traceback (most recent call last):
File "/media/data/xgp/repo/DB-GPT/dbgpt/app/dbgpt_server.py", line 281, in
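One common reason .env edits appear to be ignored (an assumption here, since the thread does not confirm the root cause): variables already exported in the shell shadow values read from .env, because dotenv-style loaders do not override existing environment variables by default (python-dotenv's load_dotenv has override=False). A minimal stdlib-only sketch of that precedence rule:

```python
import os

def effective_value(name: str, dotenv_values: dict, default=None):
    """Shell environment wins over .env when the loader's override flag
    is off (python-dotenv's default behavior)."""
    return os.environ.get(name) or dotenv_values.get(name, default)

dotenv = {"LLM_MODEL": "tongyi_proxyllm"}   # what the edited .env says
os.environ["LLM_MODEL"] = "vicuna-7b-v1.5"  # e.g. exported earlier in the shell
print(effective_value("LLM_MODEL", dotenv))  # prints: vicuna-7b-v1.5
```

So it is worth running echo $LLM_MODEL (or env | grep MODEL) before starting the server, and also confirming that the server is reading the .env file in the directory you launch it from.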
Search before asking
Operating system information
Linux
Python version information
3.10
DB-GPT version
main
Related scenes
Installation Information
[ ] Installation From Source
[ ] Docker Installation
[ ] Docker Compose Installation
[ ] Cluster Installation
[ ] AutoDL Image
[ ] Other
Device information
Device: CPU and GPU
GPU: 1
GPU Memory: 12G
GPU type: RTX 3060
Models information
LLM: vicuna-7b-v1.5
Embedding: text2vec-large-chinese
What happened
kas@DESKTOP-M7N9V23:~/app/DB-GPT$ python pilot/server/dbgpt_server.py
=========================== WebWerverParameters ===========================
host: 0.0.0.0
port: 5000
daemon: False
share: False
remote_embedding: False
log_level: INFO
light: False
======================================================================
/home/kas/app/DB-GPT/pilot
2023-09-19 17:07:47 | INFO | pilot.component | Register component with name dbgpt_model_controller and instance: <pilot.model.cluster.controller.controller.ModelControllerAdapter object at 0x7f208bf6ea70>
2023-09-19 17:07:47 | INFO | pilot.server.component_configs | Register local LocalEmbeddingFactory
2023-09-19 17:07:47 | INFO | pilot.model.cluster.worker.embedding_worker | [EmbeddingsModelWorker] Parameters of device is None, use cuda
2023-09-19 17:07:47 | INFO | pilot.server.component_configs |
=========================== EmbeddingModelParameters ===========================
model_name: text2vec
model_path: /home/kas/app/DB-GPT/models/text2vec-large-chinese
device: cuda
normalize_embeddings: None
======================================================================
2023-09-19 17:07:48 | INFO | sentence_transformers.SentenceTransformer | Load pretrained SentenceTransformer: /home/kas/app/DB-GPT/models/text2vec-large-chinese
2023-09-19 17:07:48 | WARNING | sentence_transformers.SentenceTransformer | No sentence-transformers model found with name /home/kas/app/DB-GPT/models/text2vec-large-chinese. Creating a new one with MEAN pooling.
2023-09-19 17:07:48 | INFO | torch.distributed.nn.jit.instantiator | Created a temporary directory at /tmp/tmpq32b_nlf
2023-09-19 17:07:48 | INFO | torch.distributed.nn.jit.instantiator | Writing /tmp/tmpq32b_nlf/_remote_module_non_scriptable.py
2023-09-19 17:07:49 | INFO | pilot.component | Register component with name embedding_factory and instance: <pilot.server.component_configs.LocalEmbeddingFactory object at 0x7f205f324730>
Model Unified Deployment Mode!
2023-09-19 17:07:49 | INFO | model_worker | Worker params:
=========================== ModelWorkerParameters ===========================
model_name: vicuna-7b-v1.5
model_path: /home/kas/app/DB-GPT/models/vicuna-7b-v1.5
worker_type: None
worker_class: None
host: 0.0.0.0
port: 5000
daemon: False
limit_model_concurrency: 5
standalone: True
register: True
worker_register_host: None
controller_addr: None
send_heartbeat: True
heartbeat_interval: 20
======================================================================
2023-09-19 17:07:49 | INFO | model_worker | Run WorkerManager with standalone mode, controller_addr: http://127.0.0.1:5000
Found llm model adapter with model name: vicuna-7b-v1.5, <pilot.model.adapter.VicunaLLMAdapater object at 0x7f2125995db0>
2023-09-19 17:07:49 | INFO | LOGGER | Found llm model adapter with model name: vicuna-7b-v1.5, <pilot.model.adapter.VicunaLLMAdapater object at 0x7f2125995db0>
2023-09-19 17:07:49 | INFO | model_worker | model_name: vicuna-7b-v1.5, model_path: /home/kas/app/DB-GPT/models/vicuna-7b-v1.5, model_param_class: <class 'pilot.model.parameter.ModelParameters'>
Get model chat adapter with model name vicuna-7b-v1.5, <pilot.server.chat_adapter.VicunaChatAdapter object at 0x7f21259188b0>
2023-09-19 17:07:49 | INFO | model_worker | [DefaultModelWorker] Parameters of device is None, use cuda
2023-09-19 17:07:49 | INFO | model_worker | Init empty instances list for vicuna-7b-v1.5@llm
2023-09-19 17:07:49 | INFO | pilot.component | Register component with name dbgpt_worker_manager_factory and instance: <pilot.model.cluster.worker.manager._DefaultWorkerManagerFactory object at 0x7f2053a9e5f0>
INFO: Started server process [2463]
INFO: Waiting for application startup.
2023-09-19 17:07:49 | INFO | model_worker | Begin start all worker, apply_req: None
2023-09-19 17:07:49 | INFO | model_worker | Apply req: None, apply_func: <function LocalWorkerManager._start_all_worker.._start_worker at 0x7f205389b520>
2023-09-19 17:07:49 | INFO | model_worker | Apply to all workers
INFO: Application startup complete.
2023-09-19 17:07:49 | INFO | model_worker | Begin load model, model params:
=========================== ModelParameters ===========================
model_name: vicuna-7b-v1.5
model_path: /home/kas/app/DB-GPT/models/vicuna-7b-v1.5
device: cuda
model_type: huggingface
prompt_template: None
max_context_size: 4096
num_gpus: None
max_gpu_memory: None
cpu_offloading: False
load_8bit: True
load_4bit: False
quant_type: nf4
use_double_quant: True
compute_dtype: None
trust_remote_code: True
verbose: False
======================================================================
max_memory: {0: '10GiB'}
INFO: Uvicorn running on http://0.0.0.0:5000 (Press CTRL+C to quit)
2023-09-19 17:07:49 | DEBUG | LOGGER | max_memory: {0: '10GiB'}
Loading checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s]
INFO: 127.0.0.1:54396 - "GET / HTTP/1.1" 200 OK
INFO: 127.0.0.1:54396 - "GET /css/319e16dd59ffd1d7.css HTTP/1.1" 200 OK
INFO: 127.0.0.1:54408 - "GET /chunks/webpack-e39fb0ddb24a46cf.js HTTP/1.1" 200 OK
INFO: 127.0.0.1:54396 - "GET /chunks/main-106e14a4d176f289.js HTTP/1.1" 200 OK
INFO: 127.0.0.1:54408 - "GET /chunks/framework-0274f228b2a17278.js HTTP/1.1" 200 OK
INFO: 127.0.0.1:54396 - "GET /chunks/pages/_app-edeed3caf45d578a.js HTTP/1.1" 200 OK
INFO: 127.0.0.1:54408 - "GET /chunks/913-b5bc9815149e2ad5.js HTTP/1.1" 200 OK
INFO: 127.0.0.1:54408 - "GET /chunks/66-791bb03098dc9265.js HTTP/1.1" 200 OK
INFO: 127.0.0.1:54396 - "GET /chunks/707-109d4fec9e26030d.js HTTP/1.1" 200 OK
INFO: 127.0.0.1:54396 - "GET /_BY-cQzLf2lL8o4uTsVNy/_buildManifest.js HTTP/1.1" 200 OK
INFO: 127.0.0.1:54408 - "GET /chunks/pages/index-d5aba6bbbc1d8aaa.js HTTP/1.1" 200 OK
INFO: 127.0.0.1:54396 - "GET /_BY-cQzLf2lL8o4uTsVNy/_ssgManifest.js HTTP/1.1" 200 OK
INFO: 127.0.0.1:54396 - "GET /LOGO_1.png HTTP/1.1" 200 OK
INFO: 127.0.0.1:54408 - "GET /LOGO.png HTTP/1.1" 200 OK
INFO: 127.0.0.1:54408 - "GET /api/v1/chat/dialogue/list HTTP/1.1" 200 OK
INFO: 127.0.0.1:54396 - "POST /api/v1/chat/dialogue/scenes HTTP/1.1" 200 OK
2023-09-19 17:07:55 | INFO | pilot.openapi.api_v1.api_v1 | /controller/model/types
2023-09-19 17:07:55 | INFO | root | Get all instances with None, healthy_only: True
defaultdict(<class 'list'>, {})
INFO: 127.0.0.1:54408 - "GET /api/v1/model/types HTTP/1.1" 200 OK
INFO: 127.0.0.1:54408 - "GET /chunks/566-31b5bf29f3e84615.js HTTP/1.1" 200 OK
INFO: 127.0.0.1:54396 - "GET /chunks/902-c56acea399c45e57.js HTTP/1.1" 200 OK
INFO: 127.0.0.1:54408 - "GET /chunks/625-63aa85328eed0b3e.js HTTP/1.1" 200 OK
INFO: 127.0.0.1:54396 - "GET /chunks/455-5c8f2c8bda9b4b83.js HTTP/1.1" 200 OK
INFO: 127.0.0.1:54408 - "GET /chunks/46-2a716444a56f6f08.js HTTP/1.1" 200 OK
INFO: 127.0.0.1:54396 - "GET /chunks/847-4335b5938375e331.js HTTP/1.1" 200 OK
INFO: 127.0.0.1:54408 - "GET /chunks/pages/database-ddf0a72485646c52.js HTTP/1.1" 200 OK
INFO: 127.0.0.1:54396 - "GET /chunks/29107295-90b90cb30c825230.js HTTP/1.1" 200 OK
INFO: 127.0.0.1:54396 - "GET /chunks/939-126a01b0d827f3b4.js HTTP/1.1" 200 OK
INFO: 127.0.0.1:54408 - "GET /chunks/556-26ffce13383f774a.js HTTP/1.1" 200 OK
INFO: 127.0.0.1:54396 - "GET /chunks/589-8dfb35868cafc00b.js HTTP/1.1" 200 OK
INFO: 127.0.0.1:54408 - "GET /chunks/241-4117dd68a591b7fa.js HTTP/1.1" 200 OK
INFO: 127.0.0.1:54396 - "GET /chunks/pages/datastores-4fb48131988df037.js HTTP/1.1" 200 OK
Killed
(env) kas@DESKTOP-M7N9V23:~/app/DB-GPT$
What you expected to happen
How to reproduce
python3 pilot/server/dbgpt_server.py
Additional context
No response
Are you willing to submit PR?