THUDM / CodeGeeX2

CodeGeeX2: A More Powerful Multilingual Code Generation Model
https://codegeex.cn
Apache License 2.0
7.62k stars · 532 forks

Fails on 12 GB of VRAM: ran python ./demo/run_demo.py directly, changed nothing #20

Open ayun110 opened 1 year ago

ayun110 commented 1 year ago

Loading checkpoint shards: 100%|██████████| 7/7 [00:13<00:00, 1.98s/it]

Traceback (most recent call last):
  File "X:\post\InvokeAI-installer-v2.3.5.post2\pythonProject\code1\demo\run_demo.py", line 11, in <module>
    model = AutoModel.from_pretrained("THUDM/codegeex2-6b", trust_remote_code=True).to('cuda')
  File "C:\Python311\Lib\site-packages\transformers\modeling_utils.py", line 1902, in to
    return super().to(*args, **kwargs)
  File "C:\Python311\Lib\site-packages\torch\nn\modules\module.py", line 1152, in to
    return self._apply(convert)
  File "C:\Python311\Lib\site-packages\torch\nn\modules\module.py", line 802, in _apply
    module._apply(fn)
  File "C:\Python311\Lib\site-packages\torch\nn\modules\module.py", line 802, in _apply
    module._apply(fn)
  File "C:\Python311\Lib\site-packages\torch\nn\modules\module.py", line 825, in _apply
    param_applied = fn(param)
  File "C:\Python311\Lib\site-packages\torch\nn\modules\module.py", line 1150, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
OutOfMemoryError: CUDA out of memory. Tried to allocate 508.00 MiB. GPU 0 has a total capacty of 12.00 GiB of which 0 bytes is free. Of the allocated memory 11.16 GiB is allocated by PyTorch, and 1.31 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
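The allocator hint at the end of the message can be tried by setting PYTORCH_CUDA_ALLOC_CONF before PyTorch touches the GPU; a minimal sketch, with 128 MiB as an arbitrary example value rather than a recommendation (as the replies below show, the real problem here is that the FP16 model simply does not fit in 12 GB):

    import os

    # Must be set before the first CUDA allocation in the process,
    # so set it before torch is imported or used.
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

    import torch  # imported only after the environment variable is in place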

ayun110 commented 1 year ago

Task Manager shows 0.8/12.0 GB in use, so about 11.2 GB should be free.
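Task Manager numbers do not always match what the CUDA driver reports to the process; a quick way to check from Python itself, as a minimal sketch using torch.cuda.mem_get_info, is:

    import torch

    # Returns (free, total) device memory in bytes, as reported by the CUDA driver.
    free, total = torch.cuda.mem_get_info()
    print(f"free: {free / 1024**3:.2f} GiB of {total / 1024**3:.2f} GiB")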

wikeeyang commented 1 year ago

I have 6 GB of VRAM, and loading the regular codegeex2-6b model with INT4 quantization works fine for me. Give it a try with the following:

model = AutoModel.from_pretrained(model_path, trust_remote_code=True).quantize(4).cuda()

You can also try:

model = AutoModel.from_pretrained(model_path, trust_remote_code=True).cuda()
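Spelled out as a self-contained snippet (a sketch: quantize() is provided by the model's trust_remote_code implementation, as on other THUDM/ChatGLM-style checkpoints, and the prompt is just an example):

    from transformers import AutoTokenizer, AutoModel

    model_path = "THUDM/codegeex2-6b"  # or a local directory containing the weights

    tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
    # quantize(4) converts the FP16 weights to INT4 in memory before .cuda()
    # moves them to the GPU, cutting VRAM use to roughly 6 GB.
    model = AutoModel.from_pretrained(model_path, trust_remote_code=True).quantize(4).cuda()
    model = model.eval()

    prompt = "# language: Python\n# write a bubble sort function\n"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_length=256)
    print(tokenizer.decode(outputs[0]))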

wikeeyang commented 1 year ago

To add: my environment is Windows 11 x64, Python 3.11, and Torch 2.0 with CUDA 11.8. I keep my base Python environments, models, and applications in separate directories, which is why in the command above model_path = "D:\AITest\Model\codegeex2-6b".
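One detail worth noting with a literal Windows path like that: in Python source, backslash sequences such as \t are interpreted as escape characters, so the path should be written as a raw string (or with forward slashes):

    # Raw string: backslashes are taken literally instead of as escapes.
    model_path = r"D:\AITest\Model\codegeex2-6b"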

Stanislas0 commented 1 year ago

> OutOfMemoryError: CUDA out of memory. Tried to allocate 508.00 MiB. GPU 0 has a total capacty of 12.00 GiB of which 0 bytes is free.

With 12 GB of VRAM the full-precision model does not fit, so you need quantization. You can download this version of the weights: https://huggingface.co/THUDM/codegeex2-6b-int4
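Loading those pre-quantized weights is the same one-liner as in the demo, just pointed at the int4 repository; a minimal sketch, assuming the checkpoint exposes the usual interface:

    from transformers import AutoTokenizer, AutoModel

    # Pre-quantized INT4 checkpoint: no quantize() call is needed,
    # and it fits comfortably within 12 GB of VRAM.
    tokenizer = AutoTokenizer.from_pretrained("THUDM/codegeex2-6b-int4", trust_remote_code=True)
    model = AutoModel.from_pretrained("THUDM/codegeex2-6b-int4", trust_remote_code=True).cuda()
    model = model.eval()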

Stanislas0 commented 1 year ago

See the inference tutorial, which explains several different quantization approaches.
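For quick reference, the loading variants discussed in this thread side by side (a sketch: the memory figures are rough, and quantize() comes from the model's remote code):

    from transformers import AutoModel

    path = "THUDM/codegeex2-6b"

    # Pick one, depending on available VRAM:
    # FP16 (~13 GB): AutoModel.from_pretrained(path, trust_remote_code=True).cuda()
    # INT8 (~8 GB):  AutoModel.from_pretrained(path, trust_remote_code=True).quantize(8).cuda()
    model = AutoModel.from_pretrained(path, trust_remote_code=True).quantize(4).cuda()  # INT4, ~6 GB
    model = model.eval()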