Cannot load safetensors OSError: No such device (os error 19)

complexfilter commented 1 day ago

System Info

H100, CUDA 12.4

Information

[rank0]:   File "/opt/venv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4226, in from_pretrained
[rank0]:     ) = cls._load_pretrained_model(
[rank0]:   File "/opt/venv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4707, in _load_pretrained_model
[rank0]:     state_dict = load_state_dict(
[rank0]:   File "/opt/venv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 556, in load_state_dict
[rank0]:     with safe_open(checkpoint_file, framework="pt") as f:
[rank0]: OSError: No such device (os error 19)
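
Errno 19 is ENODEV. On Linux, mmap(2) returns ENODEV when the underlying filesystem does not support memory mapping, and as far as I know safe_open memory-maps the checkpoint file. So one possibility, not confirmed anywhere in this thread, is that the model directory sits on such a filesystem. Below is a minimal sketch for testing that with only the standard library; check_mmap is a hypothetical helper name, not part of safetensors or transformers.

```python
import errno
import mmap


def check_mmap(path):
    """Try to memory-map `path` read-only and return (ok, message)."""
    try:
        with open(path, "rb") as f:
            # Map the whole file read-only, roughly what a memory-mapped
            # loader does before it reads any tensor data.
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
                m[:1]  # touch the first byte to force an actual read
        return True, "mmap works for this file"
    except OSError as e:
        # ENODEV (19) here would match the error in the traceback above.
        name = errno.errorcode.get(e.errno, "unknown")
        return False, f"mmap failed: {name} ({e.errno}): {e.strerror}"
    except ValueError as e:
        # Raised by mmap for an empty file.
        return False, f"mmap failed: {e}"
```

Calling check_mmap("t5-v1_1-xxl/model-00001-of-00002.safetensors") from the CogVideoX-5b-sat directory would show whether plain mmap fails in the same way. If it does, copying the weights to a local disk might be worth trying, though that is an assumption rather than a verified fix.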

[X] The official example scripts

Reproduction

Steps to reproduce: follow the SAT inference tutorial exactly: https://zhipu-ai.feishu.cn/wiki/Bpc3wLhPRieJ53kdGiocDzAPnRf

mkdir CogVideoX-5b-sat
cd CogVideoX-5b-sat
wget https://cloud.tsinghua.edu.cn/f/fdba7608a49c463ba754/?dl=1
mv 'index.html?dl=1' vae.zip
unzip vae.zip
wget https://cloud.tsinghua.edu.cn/f/556a3e1329e74f1bac45/?dl=1
mv 'index.html?dl=1' transformer.zip
unzip transformer.zip
git clone https://huggingface.co/THUDM/CogVideoX-5b.git
mkdir t5-v1_1-xxl
mv CogVideoX-5b/text_encoder/* CogVideoX-5b/tokenizer/* t5-v1_1-xxl
wget https://modelscope.cn/models/ZhipuAI/CogVideoX-2b/resolve/master/text_encoder/model-00001-of-00002.safetensors
wget https://modelscope.cn/models/ZhipuAI/CogVideoX-2b/resolve/master/text_encoder/model-00002-of-00002.safetensors
cd sat
bash inference.sh

Expected behavior

I expect t5-v1_1-xxl to load without errors.

zRzRzRzRzRzRzR commented 22 hours ago

"Can you send a list of the files in the T5 folder after you have extracted it?"

complexfilter commented 21 hours ago

"Can you send a list of the files in the T5 folder after you have extracted it?"

total 9093057
-rw-r--r-- 1 colligo users       2593 Oct 31 07:19 added_tokens.json
-rw-r--r-- 1 colligo users        809 Oct 31 07:19 config.json
-rw-r--r-- 1 colligo users 4994582224 Oct 31 07:23 model-00001-of-00002.safetensors
-rw-r--r-- 1 colligo users 4530066360 Oct 31 07:22 model-00002-of-00002.safetensors
-rw-r--r-- 1 colligo users      19885 Oct 31 07:19 model.safetensors.index.json
-rw-r--r-- 1 colligo users       2543 Oct 31 07:19 special_tokens_map.json
-rw-r--r-- 1 colligo users     791656 Oct 31 07:19 spiece.model
-rw-r--r-- 1 colligo users      20617 Oct 31 07:19 tokenizer_config.json
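
A listing like the one above can also be cross-checked against model.safetensors.index.json, whose "weight_map" entry names the shard file for every tensor. The sketch below assumes that standard index layout; missing_shards is a hypothetical helper name introduced only for illustration.

```python
import json
import os


def missing_shards(model_dir, index_name="model.safetensors.index.json"):
    """Return shard files named in the index that are absent or empty."""
    index_path = os.path.join(model_dir, index_name)
    with open(index_path, "r", encoding="utf-8") as f:
        index = json.load(f)
    # "weight_map" maps each tensor name to the shard file that stores it.
    shards = set(index["weight_map"].values())
    problems = []
    for shard in sorted(shards):
        path = os.path.join(model_dir, shard)
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            problems.append(shard)
    return problems
```

An empty list from missing_shards("t5-v1_1-xxl") would mean every shard referenced by the index is present and non-empty. It does not verify the file contents.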

zRzRzRzRzRzRzR commented 1 hour ago

├── added_tokens.json
├── config.json
├── model-00001-of-00002.safetensors
├── model-00002-of-00002.safetensors
├── model.safetensors.index.json
├── special_tokens_map.json
├── spiece.model
└── tokenizer_config.json

Indeed, no required files appear to be missing. Can you check whether the error still occurs with transformers==4.45?
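
When reporting back, it may help to include the exact package versions in use. This is a small sketch using only the standard library; installed_versions is a hypothetical helper name, and the default package list is an assumption about what is relevant here.

```python
from importlib import metadata


def installed_versions(packages=("transformers", "safetensors", "torch")):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```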

THUDM / CogVideo

Cannot load safetensors OSError: No such device (os error 19) #455
