PKU-YuanGroup / ChatLaw

ChatLaw: A Powerful LLM Tailored for the Chinese Legal Domain (a Chinese legal large language model)
https://chatlaw.cloud/
GNU Affero General Public License v3.0
6.94k stars, 543 forks

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! #64

Open nuaabuaa07 opened 1 year ago

nuaabuaa07 commented 1 year ago

Step 3 (merge the ChatLaw weights and run inference) fails with: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument index in method wrapper_CUDA__index_select). Is running inference on a multi-GPU machine not supported?
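One way to narrow down an error like this is to list which devices the model's weights actually ended up on. The following is only a minimal sketch: the helper `devices_in_use` is illustrative and not part of ChatLaw, and the parameter names in the demo are made-up stand-ins so the snippet runs without a GPU. With a real PyTorch model you would pass `model.named_parameters()` instead.

```python
from collections import defaultdict
from types import SimpleNamespace


def devices_in_use(named_tensors):
    """Group tensor names by the device they live on.

    `named_tensors` is any iterable of (name, tensor) pairs whose tensors
    expose a `.device` attribute, e.g. `model.named_parameters()` on a
    torch.nn.Module.
    """
    by_device = defaultdict(list)
    for name, tensor in named_tensors:
        by_device[str(tensor.device)].append(name)
    return dict(by_device)


# Stand-in objects with hypothetical names, used only so this runs without torch.
fake_parameters = [
    ("embed_tokens.weight", SimpleNamespace(device="cuda:0")),
    ("layers.0.weight", SimpleNamespace(device="cuda:0")),
    ("lm_head.weight", SimpleNamespace(device="cuda:1")),
]

report = devices_in_use(fake_parameters)
for device, names in report.items():
    print(device, len(names), "tensors, first:", names[0])
```

If more than one device shows up, the weights are split across cards, and the input tensors probably need to sit on the same device as the layer that first consumes them. The `index_select` in the traceback typically points at an embedding lookup, but that is an assumption about this model rather than something the thread confirms.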

nuaabuaa07 commented 1 year ago

Does this mean the inference service can only be deployed on a single-GPU machine?

nuaabuaa07 commented 1 year ago

On a single GPU it runs out of memory instead: torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 50.00 MiB (GPU 0; 22.20 GiB total capacity; 21.53 GiB already allocated; 48.12 MiB free; 21.55 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
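The figures in the message above can be checked against the hint PyTorch prints about fragmentation. This is a rough sketch, not a diagnostic tool: the helper names `parse_oom` and `fragmentation_gap` are made up for illustration, and the regular expressions assume the message format quoted in this comment.

```python
import re

# Patterns for the GiB figures in the OOM message quoted above.
GIB_FIELDS = {
    "total": r"([\d.]+) GiB total capacity",
    "allocated": r"([\d.]+) GiB already allocated",
    "reserved": r"([\d.]+) GiB reserved",
}


def parse_oom(message):
    """Extract the GiB figures from a CUDA out-of-memory message."""
    stats = {}
    for key, pattern in GIB_FIELDS.items():
        match = re.search(pattern, message)
        if match:
            stats[key] = float(match.group(1))
    return stats


def fragmentation_gap(stats):
    """Reserved minus allocated memory in GiB; a large gap hints at fragmentation."""
    return stats["reserved"] - stats["allocated"]


message = (
    "CUDA out of memory. Tried to allocate 50.00 MiB (GPU 0; 22.20 GiB total "
    "capacity; 21.53 GiB already allocated; 48.12 MiB free; 21.55 GiB reserved "
    "in total by PyTorch)"
)

stats = parse_oom(message)
gap = fragmentation_gap(stats)
print(stats, round(gap, 2))
```

Here reserved memory exceeds allocated memory by only about 0.02 GiB, so setting `max_split_size_mb` through `PYTORCH_CUDA_ALLOC_CONF` seems unlikely to help. The more plausible reading is that the model simply does not fit on one 22 GiB card.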

niceyida commented 11 months ago

I ran into the same problem, but on a single-GPU machine, where the error is RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! Is there a fix, or any suggestions for how to debug it?

niceyida commented 11 months ago

Found the fix: the error was caused by the installed transforms version being too new. After rolling back to 4.29.0, the problem went away.

lichenyigit commented 11 months ago

I can't find that version. How did you install it?

`(base) ➜ pip install transforms==4.29.0`
`ERROR: Could not find a version that satisfies the requirement transforms==4.29.0 (from versions: 0.1, 0.2.0, 0.2.1)`
`ERROR: No matching distribution found for transforms==4.29.0`

niceyida commented 11 months ago

Sorry, the package name above was misspelled. It should be transformers. See https://pypi.org/project/transformers/#history
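With the corrected package name, the rollback would be `pip install transformers==4.29.0`. A small sketch for confirming which version is active in the current environment follows; the helper names `installed_version` and `matches_pin` are illustrative, and the pin is just the version reported to work in this thread.

```python
from importlib import metadata

PINNED = "4.29.0"  # version reported to work in this thread


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches_pin(version, pinned=PINNED):
    """True only when a version is installed and equals the pinned one."""
    return version is not None and version == pinned


found = installed_version("transformers")
if matches_pin(found):
    print("transformers", found, "matches the pinned version")
else:
    print("transformers is", found, "but the thread suggests", PINNED)
```

The comparison is an exact string match on purpose, since the thread names only one version as known to work and says nothing about neighbouring releases.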