+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100-PCI...  On   | 00000000:00:07.0 Off |                  Off |
| N/A   32C    P0    33W / 250W |      0MiB / 40960MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
Linux autodl-container-a3d5118ffa-751dc0f2 5.4.0-99-generic #112-Ubuntu SMP Thu Feb 3 13:50:55 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Error info:
global rank 0 is loading checkpoint /sharefs/cogview-new/cogvideo-stage1/27000/mp_rank_00_model_states.pt
Traceback (most recent call last):
  File "cogvideo_pipeline.py", line 793, in <module>
    main(args)
  File "cogvideo_pipeline.py", line 426, in main
    model_stage1, args = InferenceModel_Sequential.from_pretrained(args, 'cogvideo-stage1')
  File "/root/miniconda3/lib/python3.8/site-packages/SwissArmyTransformer/model/base_model.py", line 155, in from_pretrained
    load_checkpoint(model, args, load_path=model_path)
  File "/root/miniconda3/lib/python3.8/site-packages/SwissArmyTransformer/training/model_io.py", line 162, in load_checkpoint
    sd = torch.load(checkpoint_name, map_location='cpu')
  File "/root/miniconda3/lib/python3.8/site-packages/torch/serialization.py", line 777, in load
    with _open_zipfile_reader(opened_file) as opened_zipfile:
  File "/root/miniconda3/lib/python3.8/site-packages/torch/serialization.py", line 282, in __init__
    super(_open_zipfile_reader, self).__init__(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory
root@autodl-container-a3d5118ffa-751dc0f2:~/autodl-tmp/CogVideo-main#
Device info: GPU type: A100, 40 GB memory; Python 3.8.10 (default, Jun 4 2021, 15:09:15)
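A note on the error itself: checkpoints written by `torch.save` in PyTorch 1.6 and later are zip archives, and `torch.load` appears to take the `_open_zipfile_reader` path seen in the traceback only when the file begins with a zip signature. "failed finding central directory" therefore usually means the start of `mp_rank_00_model_states.pt` is present but the end-of-central-directory record at the end of the file is missing, which is what a truncated download or an interrupted copy looks like. The following is a minimal standard-library sketch that checks for exactly that, assuming truncation is the cause; `check_checkpoint` and the script name `check_ckpt.py` are hypothetical and not part of CogVideo or SwissArmyTransformer.

```python
import os
import sys
import zipfile

ZIP_LOCAL_HEADER = b"PK\x03\x04"  # magic bytes at the start of a zip archive


def check_checkpoint(path):
    """Return (ok, message) for a zip-format checkpoint file.

    Hypothetical diagnostic helper, not part of CogVideo or SwissArmyTransformer.
    Assumes the checkpoint was written with torch.save's zip-based format.
    """
    if not os.path.isfile(path):
        return False, f"{path}: not found or not a regular file"
    size = os.path.getsize(path)
    if size == 0:
        return False, f"{path}: file is empty (0 bytes)"
    with open(path, "rb") as f:
        header = f.read(4)
    # is_zipfile() searches the tail of the file for the end-of-central-directory
    # record, the structure PyTorch reports as missing.
    if zipfile.is_zipfile(path):
        try:
            with zipfile.ZipFile(path) as zf:
                n_members = len(zf.infolist())
        except zipfile.BadZipFile as exc:
            return False, f"{path}: {size} bytes, damaged zip archive ({exc})"
        return True, f"{path}: {size} bytes, zip archive with {n_members} member(s)"
    if header == ZIP_LOCAL_HEADER:
        # Starts like a zip but the end record is gone: the typical truncation case.
        return False, (f"{path}: {size} bytes, starts like a zip archive but has no "
                       "central directory; likely truncated")
    return False, f"{path}: {size} bytes, not a zip archive"


if __name__ == "__main__":
    # Usage: python check_ckpt.py PATH [PATH ...]
    for arg in sys.argv[1:]:
        ok, message = check_checkpoint(arg)
        print(("OK   " if ok else "BAD  ") + message)
```

For example: `python check_ckpt.py /sharefs/cogview-new/cogvideo-stage1/27000/mp_rank_00_model_states.pt`. If it reports a likely truncated file, re-downloading or re-copying the checkpoint is the probable fix; a passing result only rules out this one failure mode.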