eosphoros-ai / DB-GPT

AI Native Data App Development framework with AWEL(Agentic Workflow Expression Language) and Agents
http://docs.dbgpt.cn
MIT License

DB-GPT falling back to CPU at runtime, likely due to a driver conflict or unsupported driver: how to check and fix GPU usage #672

Closed BalloonWorkshop closed 8 months ago

BalloonWorkshop commented 1 year ago

Search before asking

Operating system information

Windows

Python version information

3.10

DB-GPT version

latest release

Related scenes

Installation Information

Device information

Device :GPU Nvidia : 3090

Models information

LLM :chatGLM2-6B

What happened

After installation, responses were slow. The logfile showed: model_name: chatglm2-6b model_path: d:\db-gpt\models\chatglm2-6b device: cpu model_type: huggingface prompt_template: None max_context_size: 4096. The device: cpu entry indicates the program was running on CPU resources rather than the GPU.

To check: in a Python shell, run import torch followed by torch.cuda.is_available(). It returned False, which confirmed a CUDA/PyTorch compatibility problem.
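The check above can be wrapped in a small diagnostic; this is a sketch, and the handling of a missing torch install is my addition:

```python
def cuda_diagnostic():
    """Return a small report of the local torch/CUDA state (or note torch is missing)."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}
    return {
        "torch_installed": True,
        "torch_version": torch.__version__,        # e.g. 2.0.1+cu117 (the cu suffix is the CUDA build)
        "cuda_available": torch.cuda.is_available(),
        "cuda_build": torch.version.cuda,          # None for a CPU-only wheel
    }

print(cuda_diagnostic())
```

If cuda_available is False while a GPU is present, the usual cause is a CPU-only wheel (cuda_build is None) or a wheel built for a CUDA version newer than the driver supports.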

Solution: run nvidia-smi to find the highest CUDA version the machine's driver supports, run nvcc -V to see the currently installed CUDA toolkit version, and check the CUDA/PyTorch version mapping on the PyTorch website. Then completely remove the existing CUDA and PyTorch installations and reinstall matching versions.
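The version-matching step can be sketched as a small helper that picks a PyTorch wheel index compatible with the driver. The version table below is a partial, illustrative snapshot, not an authoritative list — always confirm against the selector at https://pytorch.org/get-started/locally/:

```python
# Illustrative snapshot of CUDA-build wheel indexes; confirm on pytorch.org.
CUDA_WHEEL_INDEX = {
    "11.7": "https://download.pytorch.org/whl/cu117",
    "11.8": "https://download.pytorch.org/whl/cu118",
    "12.1": "https://download.pytorch.org/whl/cu121",
}

def pip_install_command(driver_cuda: str) -> str:
    """Build a reinstall command for the newest wheel the driver can run.

    driver_cuda is the 'CUDA Version' reported by nvidia-smi, i.e. the highest
    version the installed driver supports (older CUDA builds also work).
    """
    supported = tuple(int(x) for x in driver_cuda.split(".")[:2])
    # Keep only wheels whose CUDA version does not exceed the driver's.
    candidates = [
        v for v in CUDA_WHEEL_INDEX
        if tuple(int(x) for x in v.split(".")) <= supported
    ]
    if not candidates:
        return "pip install torch  # CPU-only fallback"
    best = max(candidates, key=lambda v: tuple(int(x) for x in v.split(".")))
    return f"pip install torch --index-url {CUDA_WHEEL_INDEX[best]}"

print(pip_install_command("11.7"))
# → pip install torch --index-url https://download.pytorch.org/whl/cu117
```

The key point the helper encodes: the driver's reported CUDA version is an upper bound, so you install the newest PyTorch CUDA build at or below it.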

What you expected to happen

Logfile excerpt from a run on CPU:

=========================== ModelParameters ===========================

model_name: chatglm2-6b model_path: d:\db-gpt\models\chatglm2-6b device: cpu model_type: huggingface prompt_template: None max_context_size: 4096 num_gpus: None max_gpu_memory: None cpu_offloading: False load_8bit: True load_4bit: False quant_type: nf4 use_double_quant: True compute_dtype: None trust_remote_code: True verbose: False

===========================================================

How to reproduce

This is not a major issue; it is just a reminder that if inference feels slow, you should check whether the GPU is actually being used.

Additional context

No response

Are you willing to submit PR?

fangyinc commented 1 year ago

@BalloonWorkshop Thank you for your patient exploration.

In addition, we can run the dbgpt trace chat --hide_conv command to view more information; we will see output like:

+------------------------+--------------------------+-----------------------------+-------------------------------------------------------+
| Config Key (Webserver) | Config Value (Webserver) | Config Key (EmbeddingModel) |             Config Value (EmbeddingModel)             |
+------------------------+--------------------------+-----------------------------+-------------------------------------------------------+
|          host          |         0.0.0.0          |          model_name         |                        text2vec                       |
|          port          |           5000           |          model_path         | /root/autodl-tmp/DB-GPT/models/text2vec-large-chinese |
|         daemon         |          False           |            device           |                          cuda                         |
|         share          |          False           |     normalize_embeddings    |                          None                         |
|    remote_embedding    |          False           |                             |                                                       |
|       log_level        |           None           |                             |                                                       |
|         light          |          False           |                             |                                                       |
+------------------------+--------------------------+-----------------------------+-------------------------------------------------------+
+--------------------------+-----------------------------------------------+----------------------------+-----------------------------------------------+
| Config Key (ModelWorker) |           Config Value (ModelWorker)          | Config Key (WorkerManager) |          Config Value (WorkerManager)         |
+--------------------------+-----------------------------------------------+----------------------------+-----------------------------------------------+
|        model_name        |                 vicuna-7b-v1.5                |         model_name         |                 vicuna-7b-v1.5                |
|        model_path        | /root/autodl-tmp/DB-GPT/models/vicuna-7b-v1.5 |         model_path         | /root/autodl-tmp/DB-GPT/models/vicuna-7b-v1.5 |
|          device          |                      cuda                     |        worker_type         |                      None                     |
|        model_type        |                  huggingface                  |        worker_class        |                      None                     |
|     prompt_template      |                      None                     |         model_type         |                  huggingface                  |
|     max_context_size     |                      4096                     |            host            |                    0.0.0.0                    |
|         num_gpus         |                      None                     |            port            |                      5000                     |
|      max_gpu_memory      |                      None                     |           daemon           |                     False                     |
|      cpu_offloading      |                     False                     |  limit_model_concurrency   |                       5                       |
|        load_8bit         |                      True                     |         standalone         |                      True                     |
|        load_4bit         |                     False                     |          register          |                      True                     |
|        quant_type        |                      nf4                      |    worker_register_host    |                      None                     |
|     use_double_quant     |                      True                     |      controller_addr       |             http://127.0.0.1:5000             |
|      compute_dtype       |                      None                     |       send_heartbeat       |                      True                     |
|    trust_remote_code     |                      True                     |     heartbeat_interval     |                       20                      |
|         verbose          |                     False                     |         log_level          |                      None                     |
+--------------------------+-----------------------------------------------+----------------------------+-----------------------------------------------+
+----------------------------------------------------------------------------------------------------+
|                                   ModelWorker System information                                   |
+-------------------+--------------------------------------------------------------------------------+
| System Config Key |                              System Config Value                               |
+-------------------+--------------------------------------------------------------------------------+
|      platform     |                                     linux                                      |
|    distribution   |                                  Ubuntu 22.04                                  |
|   python_version  |                                     3.10.8                                     |
|        cpu        |                 Intel(R) Xeon(R) Platinum 8352V CPU @ 2.10GHz                  |
|      cpu_avx      |                                     AVX512                                     |
|       memory      |                                 1056451812 kB                                  |
|   torch_version   |                                  2.0.1+cu117                                   |
|       device      |                                      cuda                                      |
|   device_version  |                                      11.7                                      |
|    device_count   |                                       1                                        |
|    device_other   | name, driver_version, memory.total [MiB], memory.free [MiB], memory.used [MiB] |
|                   |        NVIDIA GeForce RTX 4090, 535.104.05, 24564 MiB, 24211 MiB, 5 MiB        |
|                   |                                                                                |
+-------------------+--------------------------------------------------------------------------------+
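The device_other rows above are CSV output from an nvidia-smi GPU query (nvidia-smi --query-gpu=name,driver_version,memory.total,memory.free,memory.used --format=csv). A minimal stdlib parser for that format, using the sample line from the table above, might look like:

```python
import csv
import io

def parse_gpu_csv(text: str):
    """Parse nvidia-smi --query-gpu ... --format=csv output into dicts keyed by header."""
    reader = csv.reader(io.StringIO(text.strip()))
    header = [h.strip() for h in next(reader)]
    return [dict(zip(header, (v.strip() for v in row))) for row in reader]

# Sample taken verbatim from the ModelWorker system information table above.
sample = (
    "name, driver_version, memory.total [MiB], memory.free [MiB], memory.used [MiB]\n"
    "NVIDIA GeForce RTX 4090, 535.104.05, 24564 MiB, 24211 MiB, 5 MiB\n"
)
gpus = parse_gpu_csv(sample)
print(gpus[0]["name"], gpus[0]["memory.free [MiB]"])
# → NVIDIA GeForce RTX 4090 24211 MiB
```

A mostly-free memory.free figure like the one above, combined with device: cuda in the worker config, is a quick confirmation that the model worker can see the GPU.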
peter-wangxu commented 1 year ago

You can refer to https://pytorch.org/get-started/locally/ to install the correct version of PyTorch.

github-actions[bot] commented 8 months ago

This issue has been marked as stale, because it has been over 30 days without any activity.

github-actions[bot] commented 8 months ago

This issue has been closed, because it has been marked as stale and there has been no activity for over 7 days.