PKU-YuanGroup / Video-LLaVA

【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
https://arxiv.org/pdf/2311.10122.pdf
Apache License 2.0
3.04k stars, 220 forks

Error during training: AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam' #144

Open Qinger27 opened 7 months ago

Qinger27 commented 7 months ago

Below is the error message. Could you help me take a look?

```
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "/dockerdata/graceqwang/videollava/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1893, in _run_ninja_build
    subprocess.run(
  File "/dockerdata/graceqwang/videollava/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/dockerdata/graceqwang/my_code/Video-LLaVA/videollava/train/train_mem.py", line 12, in <module>
    train()
  File "/dockerdata/graceqwang/my_code/Video-LLaVA/videollava/train/train.py", line 1074, in train
    trainer.train()
  File "/dockerdata/graceqwang/videollava/lib/python3.10/site-packages/transformers/trainer.py", line 1539, in train
    return inner_training_loop(
  File "/dockerdata/graceqwang/videollava/lib/python3.10/site-packages/transformers/trainer.py", line 1656, in _inner_training_loop
    model, self.optimizer = self.accelerator.prepare(self.model, self.optimizer)
  File "/dockerdata/graceqwang/videollava/lib/python3.10/site-packages/accelerate/accelerator.py", line 1198, in prepare
    result = self._prepare_deepspeed(*args)
  File "/dockerdata/graceqwang/videollava/lib/python3.10/site-packages/accelerate/accelerator.py", line 1531, in _prepare_deepspeed
    optimizer = DeepSpeedCPUAdam(optimizer.param_groups, **defaults)
  File "/dockerdata/graceqwang/videollava/lib/python3.10/site-packages/deepspeed/ops/adam/cpu_adam.py", line 94, in __init__
    self.ds_opt_adam = CPUAdamBuilder().load()
  File "/dockerdata/graceqwang/videollava/lib/python3.10/site-packages/deepspeed/ops/op_builder/builder.py", line 454, in load
    return self.jit_load(verbose)
  File "/dockerdata/graceqwang/videollava/lib/python3.10/site-packages/deepspeed/ops/op_builder/builder.py", line 497, in jit_load
    op_module = load(name=self.name,
  File "/dockerdata/graceqwang/videollava/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1284, in load
    return _jit_compile(
  File "/dockerdata/graceqwang/videollava/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1509, in _jit_compile
    _write_ninja_file_and_build_library(
  File "/dockerdata/graceqwang/videollava/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1624, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "/dockerdata/graceqwang/videollava/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1909, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'cpu_adam'
Exception ignored in: <function DeepSpeedCPUAdam.__del__ at 0x7f0e48eba4d0>
Traceback (most recent call last):
  File "/dockerdata/graceqwang/videollava/lib/python3.10/site-packages/deepspeed/ops/adam/cpu_adam.py", line 102, in __del__
AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'
```
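Reading the traceback bottom-up, the `AttributeError` in the title is only a follow-on error: `DeepSpeedCPUAdam.__init__` fails while JIT-building the `cpu_adam` extension, so `self.ds_opt_adam` is never assigned, and `__del__` then trips over the missing attribute. The real cause is the failed `ninja` build, whose compiler error is not shown in the log above. A minimal sketch for reproducing just that build outside the training script, assuming the installed DeepSpeed exposes `CPUAdamBuilder` from `deepspeed.ops.op_builder` (as `cpu_adam.py` in the traceback uses it) and that `load(verbose=True)` prints the build commands and compiler output; `rebuild_cpu_adam` is a hypothetical helper name:

```python
import traceback


def rebuild_cpu_adam() -> bool:
    """Try to JIT-build DeepSpeed's cpu_adam op on its own; return True on success."""
    try:
        # Same import that deepspeed/ops/adam/cpu_adam.py relies on (assumed available).
        from deepspeed.ops.op_builder import CPUAdamBuilder
    except ImportError as exc:
        print(f"DeepSpeed is not importable here: {exc}")
        return False
    try:
        # verbose=True should make the build echo the ninja/compiler commands and errors.
        CPUAdamBuilder().load(verbose=True)
    except Exception:
        # Print the full failure instead of letting a later __del__ error hide it.
        traceback.print_exc()
        return False
    return True


if __name__ == "__main__":
    ok = rebuild_cpu_adam()
    print("cpu_adam built and loaded" if ok else "cpu_adam build failed; see output above")
```

Running this in the same environment as training isolates the compiler error (for example, an unsupported gcc or a library the linker cannot find) from the Trainer and Accelerate layers.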

dwsmart32 commented 7 months ago

I have the same issue. Have you solved it?

Wuyingwen commented 5 months ago

Maybe you need to upgrade your gcc version; that could solve this problem.

cs19469 commented 4 months ago

Have you solved it?

Wuyingwen commented 4 months ago

Solved it by upgrading gcc.
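Before and after upgrading, it can help to confirm which compiler version the extension build actually sees. A minimal sketch, assuming (as in recent PyTorch releases) that `torch.utils.cpp_extension` invokes the compiler named by the `CXX` environment variable and falls back to `c++`; the minimum version below is a hypothetical placeholder, since nobody in this thread says which gcc version was required:

```python
import os
import re
import shutil
import subprocess
from typing import Optional, Tuple

Version = Tuple[int, int, int]

# Hypothetical threshold: the thread only says "upgrade gcc", not to which version.
HYPOTHETICAL_MINIMUM: Version = (9, 0, 0)


def parse_version(text: str) -> Version:
    """Turn compiler output such as '9.4.0' or '11' into a 3-part tuple, padding with zeros."""
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    parts = [int(p) for p in match.group(0).split(".")][:3]
    parts += [0] * (3 - len(parts))
    return (parts[0], parts[1], parts[2])


def compiler_version(compiler: str) -> Optional[Version]:
    """Ask the compiler for its version; return None if it is missing or the output is unparsable."""
    path = shutil.which(compiler)
    if path is None:
        return None
    result = subprocess.run(
        [path, "-dumpfullversion", "-dumpversion"],
        capture_output=True,
        text=True,
    )
    try:
        return parse_version(result.stdout)
    except ValueError:
        return None


def is_at_least(found: Version, minimum: Version) -> bool:
    # Tuples compare element by element, so (11, 0, 0) >= (9, 0, 0).
    return found >= minimum


if __name__ == "__main__":
    # Assumption: the JIT build uses $CXX if set, otherwise `c++`.
    cxx = os.environ.get("CXX", "c++")
    found = compiler_version(cxx)
    if found is None:
        print(f"could not determine the version of {cxx!r}")
    else:
        print(f"{cxx}: {'.'.join(map(str, found))}")
        if not is_at_least(found, HYPOTHETICAL_MINIMUM):
            print("older than the chosen minimum; consider upgrading before rebuilding cpu_adam")
```

Checking `$CXX`/`c++` rather than `gcc` matters because a newly installed gcc may sit beside the old one; if `c++` still resolves to the old compiler, the `cpu_adam` build will keep failing.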

xin-li-67 commented 3 months ago

Hi,

I solved this problem by re-specifying the torch versions in ~/.bashrc, using `export LIBRARY_PATH=${your cuda lib64 path}:$LIBRARY_PATH` instead of `$LD_LIBRARY_PATH`. A bit weird.
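A likely explanation for why this works: at build time gcc searches `LIBRARY_PATH` for libraries passed with `-l`, whereas `LD_LIBRARY_PATH` is only consulted by the dynamic loader when a program or shared library is run. If the `cpu_adam` link step needs a CUDA library that lives only in the CUDA `lib64` directory, exporting that directory on `LD_LIBRARY_PATH` alone would not let the link succeed. A minimal Python sketch of the same fix, assuming the CUDA toolkit is under `$CUDA_HOME` (or `/usr/local/cuda` when that is unset); `with_library_path` is a hypothetical helper, not part of DeepSpeed or PyTorch:

```python
import os
from pathlib import Path
from typing import Dict, Mapping, Optional


def with_library_path(lib_dir: str, env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of `env` (default: os.environ) with lib_dir first on LIBRARY_PATH."""
    new_env = dict(os.environ if env is None else env)
    # Keep existing entries, dropping empty ones and any earlier copy of lib_dir.
    rest = [p for p in new_env.get("LIBRARY_PATH", "").split(os.pathsep) if p and p != lib_dir]
    new_env["LIBRARY_PATH"] = os.pathsep.join([lib_dir, *rest])
    return new_env


if __name__ == "__main__":
    # Assumption: CUDA lives under $CUDA_HOME, or /usr/local/cuda if that is unset.
    cuda_lib64 = str(Path(os.environ.get("CUDA_HOME", "/usr/local/cuda")) / "lib64")
    # Option 1: the shell line for ~/.bashrc, as in the comment above.
    print(f"export LIBRARY_PATH={cuda_lib64}:$LIBRARY_PATH")
    # Option 2: set it for this process only, before DeepSpeed triggers the JIT build;
    # child processes such as ninja and the compiler inherit os.environ.
    os.environ["LIBRARY_PATH"] = with_library_path(cuda_lib64)["LIBRARY_PATH"]
```

The `~/.bashrc` route affects every new shell, while setting `os.environ` early in the training script keeps the change local to that run.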