@BalloonWorkshop Thank you for your patient exploration.
In addition, we can run the dbgpt trace chat --hide_conv command to view more information; we will see output like:
+------------------------+--------------------------+-----------------------------+-------------------------------------------------------+
| Config Key (Webserver) | Config Value (Webserver) | Config Key (EmbeddingModel) | Config Value (EmbeddingModel)                         |
+------------------------+--------------------------+-----------------------------+-------------------------------------------------------+
| host                   | 0.0.0.0                  | model_name                  | text2vec                                              |
| port                   | 5000                     | model_path                  | /root/autodl-tmp/DB-GPT/models/text2vec-large-chinese |
| daemon                 | False                    | device                      | cuda                                                  |
| share                  | False                    | normalize_embeddings        | None                                                  |
| remote_embedding       | False                    |                             |                                                       |
| log_level              | None                     |                             |                                                       |
| light                  | False                    |                             |                                                       |
+------------------------+--------------------------+-----------------------------+-------------------------------------------------------+
+--------------------------+-----------------------------------------------+----------------------------+-----------------------------------------------+
| Config Key (ModelWorker) | Config Value (ModelWorker)                    | Config Key (WorkerManager) | Config Value (WorkerManager)                  |
+--------------------------+-----------------------------------------------+----------------------------+-----------------------------------------------+
| model_name               | vicuna-7b-v1.5                                | model_name                 | vicuna-7b-v1.5                                |
| model_path               | /root/autodl-tmp/DB-GPT/models/vicuna-7b-v1.5 | model_path                 | /root/autodl-tmp/DB-GPT/models/vicuna-7b-v1.5 |
| device                   | cuda                                          | worker_type                | None                                          |
| model_type               | huggingface                                   | worker_class               | None                                          |
| prompt_template          | None                                          | model_type                 | huggingface                                   |
| max_context_size         | 4096                                          | host                       | 0.0.0.0                                       |
| num_gpus                 | None                                          | port                       | 5000                                          |
| max_gpu_memory           | None                                          | daemon                     | False                                         |
| cpu_offloading           | False                                         | limit_model_concurrency    | 5                                             |
| load_8bit                | True                                          | standalone                 | True                                          |
| load_4bit                | False                                         | register                   | True                                          |
| quant_type               | nf4                                           | worker_register_host       | None                                          |
| use_double_quant         | True                                          | controller_addr            | http://127.0.0.1:5000                         |
| compute_dtype            | None                                          | send_heartbeat             | True                                          |
| trust_remote_code        | True                                          | heartbeat_interval         | 20                                            |
| verbose                  | False                                         | log_level                  | None                                          |
+--------------------------+-----------------------------------------------+----------------------------+-----------------------------------------------+
+----------------------------------------------------------------------------------------------------+
|                                   ModelWorker System information                                    |
+-------------------+--------------------------------------------------------------------------------+
| System Config Key | System Config Value                                                            |
+-------------------+--------------------------------------------------------------------------------+
| platform          | linux                                                                          |
| distribution      | Ubuntu 22.04                                                                   |
| python_version    | 3.10.8                                                                         |
| cpu               | Intel(R) Xeon(R) Platinum 8352V CPU @ 2.10GHz                                  |
| cpu_avx           | AVX512                                                                         |
| memory            | 1056451812 kB                                                                  |
| torch_version     | 2.0.1+cu117                                                                    |
| device            | cuda                                                                           |
| device_version    | 11.7                                                                           |
| device_count      | 1                                                                              |
| device_other      | name, driver_version, memory.total [MiB], memory.free [MiB], memory.used [MiB] |
|                   | NVIDIA GeForce RTX 4090, 535.104.05, 24564 MiB, 24211 MiB, 5 MiB               |
|                   |                                                                                |
+-------------------+--------------------------------------------------------------------------------+
You can refer to https://pytorch.org/get-started/locally/ to install the correct version of PyTorch.
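As a quick sanity check, here is a minimal sketch (assuming only that PyTorch is importable) to verify that the installed build was compiled against CUDA:

import torch

# A CPU-only wheel reports a plain version such as "2.0.1"; a CUDA build
# carries a suffix such as "2.0.1+cu117".
print(torch.__version__)
print(torch.version.cuda)         # CUDA version the wheel targets; None on CPU-only builds
print(torch.cuda.is_available())  # must be True for the model to run on the GPU

If torch.version.cuda is None or the last line prints False, reinstall a CUDA build of PyTorch whose CUDA version matches what nvidia-smi reports.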
This issue has been marked as stale because it has been over 30 days without any activity.
This issue has been closed because it has been marked as stale and there has been no activity for over 7 days.
Search before asking
Operating system information
Windows
Python version information
3.10
DB-GPT version
latest release
Related scenes
Installation Information
[X] Installation From Source
[ ] Docker Installation
[ ] Docker Compose Installation
[ ] Cluster Installation
[ ] AutoDL Image
[ ] Other
Device information
Device: GPU, NVIDIA RTX 3090
Models information
LLM: ChatGLM2-6B
What happened
After the installation completed, responses were quite slow. In the logfile I saw: model_name: chatglm2-6b, model_path: d:\db-gpt\models\chatglm2-6b, device: cpu, model_type: huggingface, prompt_template: None, max_context_size: 4096. The device: cpu entry shows the program was running on CPU resources rather than the GPU.
Check: in Python, import torch and call torch.cuda.is_available(); it returned False, which confirmed a conflict between the installed CUDA and PyTorch versions.
Solution: run nvidia-smi to get the highest CUDA version the machine's driver supports, run nvcc -V to get the currently installed CUDA toolkit version, and check the CUDA/PyTorch version mapping on the official PyTorch site. Then cleanly remove the existing CUDA and PyTorch installations and reinstall matching versions; a verification sketch follows below.
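After reinstalling, a minimal sanity check (a sketch, not DB-GPT code) is to allocate a tensor on the GPU before restarting the server:

import torch

# Fails fast if the CUDA/PyTorch pairing is still broken.
assert torch.cuda.is_available(), "still a CPU-only setup; recheck the CUDA/PyTorch versions"
x = torch.ones(1).to("cuda")          # raises if the CUDA runtime cannot be used
print(x.device)                       # expected: cuda:0
print(torch.cuda.get_device_name(0))  # expected: NVIDIA GeForce RTX 3090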
What you expected to happen
Logfile excerpt when running on CPU:
=========================== ModelParameters ===========================
model_name: chatglm2-6b
model_path: d:\db-gpt\models\chatglm2-6b
device: cpu
model_type: huggingface
prompt_template: None
max_context_size: 4096
num_gpus: None
max_gpu_memory: None
cpu_offloading: False
load_8bit: True
load_4bit: False
quant_type: nf4
use_double_quant: True
compute_dtype: None
trust_remote_code: True
verbose: False
===========================================================
How to reproduce
Not a big problem; just a reminder that if responses seem slow, you should check whether the GPU is actually being used.
Additional context
No response
Are you willing to submit PR?