OptimalScale / LMFlow

An Extensible Toolkit for Finetuning and Inference of Large Foundation Models. Large Models for All.
https://optimalscale.github.io/LMFlow/
Apache License 2.0
8.16k stars 818 forks

AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam' #70

Closed alexhmyang closed 1 year ago

alexhmyang commented 1 year ago

RuntimeError: Error building extension 'cpu_adam'
Exception ignored in: <function DeepSpeedCPUAdam.__del__ at 0x7f683231b670>
Traceback (most recent call last):
  File "/home/u20/miniconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/ops/adam/cpu_adam.py", line 110, in __del__
AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'
[2023-04-03 12:50:15,113] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 21626
[2023-04-03 12:50:15,113] [ERROR] [launch.py:324:sigkill_handler] ['/home/u20/miniconda3/envs/lmflow/bin/python', '-u', 'examples/finetune.py', '--local_rank=0', '--model_name_or_path', 'gpt2', '--dataset_path', '/home/u20/LMFlow/data/alpaca/train', '--output_dir', '/home/u20/LMFlow/output_models/finetune', '--overwrite_output_dir', '--num_train_epochs', '0.01', '--learning_rate', '2e-5', '--block_size', '512', '--per_device_train_batch_size', '1', '--deepspeed', 'configs/ds_config_zero3.json', '--bf16', '--run_name', 'finetune', '--validation_split_percentage', '0', '--logging_steps', '20', '--do_train', '--ddp_timeout', '72000', '--save_steps', '5000', '--dataloader_num_workers', '1'] exits with return code = 1

I get this error when running ./scripts/run_finetune.sh. I have a GPU and CUDA installed, so why does it raise a CPU error?

./scripts/run_finetune_with_lora.sh raises the same error.

callofdutyops commented 1 year ago

Could you please provide more of the log? I think there should be another error before this one.
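
To illustrate why an earlier error is likely, here is a minimal sketch of how a failed constructor can produce a secondary AttributeError during cleanup. FakeCPUAdam and demo are hypothetical stand-ins written for this illustration, not DeepSpeed code; the only assumption is that the real optimizer sets ds_opt_adam in its constructor and touches it again in its destructor, as the traceback suggests.

```python
import gc
import sys


class FakeCPUAdam:
    """Hypothetical stand-in for an optimizer that loads a compiled extension."""

    def __init__(self, build_ok: bool):
        if not build_ok:
            # The primary failure: the attribute below is never assigned.
            raise RuntimeError("Error building extension 'cpu_adam'")
        self.ds_opt_adam = object()

    def __del__(self):
        # Python still calls __del__ on the half-built object, so this
        # attribute access fails when __init__ raised early.
        self.ds_opt_adam


def demo():
    """Return the primary error message and the error types seen during cleanup."""
    seen = []
    old_hook = sys.unraisablehook
    # Errors raised inside __del__ are reported through this hook
    # ("Exception ignored in: ...") instead of propagating.
    sys.unraisablehook = lambda unraisable: seen.append(unraisable.exc_type)
    primary = None
    try:
        try:
            FakeCPUAdam(build_ok=False)
        except RuntimeError as exc:
            primary = str(exc)
        gc.collect()  # make sure the half-built object is finalized
    finally:
        sys.unraisablehook = old_hook
    return primary, seen


if __name__ == "__main__":
    print(demo())
```

In this sketch the RuntimeError is the real failure and the AttributeError is only a side effect of cleanup, which matches the order of the two messages in the log above.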

yana-xuyan commented 1 year ago

Hi, I also get the same error. The log is as follows:

(lmflow) xuyan@black-rack-0:~/LLM/LMFlow$ CUDA_VISIBLE_DEVICES=0 ./scripts/run_finetune.sh "--num_gpus=1 --master_port 10001"
[2023-04-03 14:59:52,961] [WARNING] [runner.py:186:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
Detected CUDA_VISIBLE_DEVICES=0 but ignoring it because one or several of --include/--exclude/--num_gpus/--num_nodes cl args were used. If you want to use CUDA_VISIBLE_DEVICES don't pass any of these arguments to deepspeed.
[2023-04-03 14:59:55,358] [INFO] [runner.py:550:main] cmd = /home/xuyan/anaconda3/envs/lmflow/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMF19 --master_addr=127.0.0.1 --master_port=10001 --enable_each_rank_log=None examples/finetune.py --model_name_or_path gpt2 --dataset_path /home/xuyan/LLM/LMFlow/data/alpaca/train --output_dir /home/xuyan/LLM/LMFlow/output_models/finetune --overwrite_output_dir --num_train_epochs 0.01 --learning_rate 2e-5 --block_size 512 --per_device_train_batch_size 1 --deepspeed configs/ds_config_zero3.json --bf16 --run_name finetune --validation_split_percentage 0 --logging_steps 20 --do_train --ddp_timeout 72000 --save_steps 5000 --dataloader_num_workers 1
[2023-04-03 14:59:57,679] [INFO] [launch.py:142:main] WORLD INFO DICT: {'localhost': [0]}
[2023-04-03 14:59:57,680] [INFO] [launch.py:148:main] nnodes=1, num_local_procs=1, node_rank=0
[2023-04-03 14:59:57,680] [INFO] [launch.py:161:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0]})
[2023-04-03 14:59:57,680] [INFO] [launch.py:162:main] dist_world_size=1
[2023-04-03 14:59:57,680] [INFO] [launch.py:164:main] Setting CUDA_VISIBLE_DEVICES=0
[2023-04-03 15:00:05,633] [INFO] [comm.py:652:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
04/03/2023 15:00:06 - WARNING - lmflow.pipeline.finetuner - Process rank: 0, device: cuda:0, n_gpu: 1distributed training: True, 16-bits training: False
04/03/2023 15:00:07 - WARNING - datasets.builder - Found cached dataset json (/home/xuyan/.cache/huggingface/datasets/json/default-dda63bbab21e635e/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
[2023-04-03 15:00:14,782] [INFO] [partition_parameters.py:415:__exit__] finished initializing model with 0.16B parameters
/home/xuyan/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:2547: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead.
  warnings.warn(
04/03/2023 15:00:15 - WARNING - datasets.fingerprint - Parameter 'function'=<function HFDecoderModel.tokenize.<locals>.tokenize_function at 0x7f217c927f70> of the transform datasets.arrow_dataset.Dataset._map_single couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed.
04/03/2023 15:00:15 - WARNING - datasets.arrow_dataset - Loading cached processed dataset at /home/xuyan/.cache/huggingface/datasets/json/default-dda63bbab21e635e/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-1c80317fa3b1799d.arrow
04/03/2023 15:00:15 - WARNING - datasets.arrow_dataset - Loading cached processed dataset at /home/xuyan/.cache/huggingface/datasets/json/default-dda63bbab21e635e/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-bbe2d282518ba636.arrow
Installed CUDA version 11.0 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Using /home/xuyan/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/xuyan/.cache/torch_extensions/py39_cu117/cpu_adam/build.ninja...
Building extension module cpu_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/2] /usr/local/cuda-11.0/bin/nvcc -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/xuyan/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -I/usr/local/cuda-11.0/include -isystem /home/xuyan/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/include -isystem /home/xuyan/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /home/xuyan/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/include/TH -isystem /home/xuyan/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/include/THC -isystem /usr/local/cuda-11.0/include -isystem /home/xuyan/anaconda3/envs/lmflow/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -c /home/xuyan/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/ops/csrc/common/custom_cuda_kernel.cu -o custom_cuda_kernel.cuda.o
FAILED: custom_cuda_kernel.cuda.o
/usr/local/cuda-11.0/bin/nvcc -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/xuyan/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -I/usr/local/cuda-11.0/include -isystem /home/xuyan/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/include -isystem /home/xuyan/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /home/xuyan/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/include/TH -isystem /home/xuyan/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/include/THC -isystem /usr/local/cuda-11.0/include -isystem /home/xuyan/anaconda3/envs/lmflow/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -c /home/xuyan/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/ops/csrc/common/custom_cuda_kernel.cu -o custom_cuda_kernel.cuda.o
nvcc fatal : Unsupported gpu architecture 'compute_86'
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "/home/xuyan/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1893, in _run_ninja_build
    subprocess.run(
  File "/home/xuyan/anaconda3/envs/lmflow/lib/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/xuyan/LLM/LMFlow/examples/finetune.py", line 70, in <module>
    main()
  File "/home/xuyan/LLM/LMFlow/examples/finetune.py", line 66, in main
    tuned_model = finetuner.tune(model=model, lm_dataset=lm_dataset)
  File "/home/xuyan/LLM/LMFlow/src/lmflow/pipeline/finetuner.py", line 232, in tune
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/home/xuyan/anaconda3/envs/lmflow/lib/python3.9/site-packages/transformers/trainer.py", line 1639, in train
    return inner_training_loop(
  File "/home/xuyan/anaconda3/envs/lmflow/lib/python3.9/site-packages/transformers/trainer.py", line 1708, in _inner_training_loop
    deepspeed_engine, optimizer, lr_scheduler = deepspeed_init(
  File "/home/xuyan/anaconda3/envs/lmflow/lib/python3.9/site-packages/transformers/deepspeed.py", line 378, in deepspeed_init
    deepspeed_engine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs)
  File "/home/xuyan/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/__init__.py", line 125, in initialize
    engine = DeepSpeedEngine(args=args,
  File "/home/xuyan/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 340, in __init__
    self._configure_optimizer(optimizer, model_parameters)
  File "/home/xuyan/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1283, in _configure_optimizer
    basic_optimizer = self._configure_basic_optimizer(model_parameters)
  File "/home/xuyan/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1354, in _configure_basic_optimizer
    optimizer = DeepSpeedCPUAdam(model_parameters,
  File "/home/xuyan/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/ops/adam/cpu_adam.py", line 96, in __init__
    self.ds_opt_adam = CPUAdamBuilder().load()
  File "/home/xuyan/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/ops/op_builder/builder.py", line 485, in load
    return self.jit_load(verbose)
  File "/home/xuyan/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/ops/op_builder/builder.py", line 520, in jit_load
    op_module = load(
  File "/home/xuyan/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1284, in load
    return _jit_compile(
  File "/home/xuyan/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1509, in _jit_compile
    _write_ninja_file_and_build_library(
  File "/home/xuyan/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1624, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "/home/xuyan/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1909, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'cpu_adam'
Exception ignored in: <function DeepSpeedCPUAdam.__del__ at 0x7f2093eb6b80>
Traceback (most recent call last):
  File "/home/xuyan/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/ops/adam/cpu_adam.py", line 110, in __del__
AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'
[2023-04-03 15:00:22,718] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 5948
[2023-04-03 15:00:22,719] [ERROR] [launch.py:324:sigkill_handler] ['/home/xuyan/anaconda3/envs/lmflow/bin/python', '-u', 'examples/finetune.py', '--local_rank=0', '--model_name_or_path', 'gpt2', '--dataset_path', '/home/xuyan/LLM/LMFlow/data/alpaca/train', '--output_dir', '/home/xuyan/LLM/LMFlow/output_models/finetune', '--overwrite_output_dir', '--num_train_epochs', '0.01', '--learning_rate', '2e-5', '--block_size', '512', '--per_device_train_batch_size', '1', '--deepspeed', 'configs/ds_config_zero3.json', '--bf16', '--run_name', 'finetune', '--validation_split_percentage', '0', '--logging_steps', '20', '--do_train', '--ddp_timeout', '72000', '--save_steps', '5000', '--dataloader_num_workers', '1'] exits with return code = 1

callofdutyops commented 1 year ago

It's better to use the same CUDA version that PyTorch was built with, like this:

conda install cuda -c nvidia/label/cuda-11.7.0

alexhmyang commented 1 year ago

(lmflow) u20@u20:~/LMFlow/service$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_Mar__8_18:18:20_PST_2022
Cuda compilation tools, release 11.6, V11.6.124
Build cuda_11.6.r11.6/compiler.31057947_0

Does CUDA 11.6 not work?

callofdutyops commented 1 year ago

I find it's always hard to debug issues related to the CUDA version...

It works fine on my machine with CUDA 11.7 installed through conda.
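
As a rough aid for this kind of debugging, the sketch below compares the release reported by nvcc --version with the CUDA version a PyTorch build reports. The helper names parse_nvcc_release and same_cuda_version are made up for this example. It works on plain strings so it runs without a GPU; in a real environment the inputs would come from the output of nvcc --version and from torch.version.cuda.

```python
import re
from typing import Optional, Tuple


def parse_nvcc_release(nvcc_output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from the 'release X.Y' field of `nvcc --version`."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def same_cuda_version(nvcc_output: str, torch_cuda: str) -> bool:
    """True when nvcc and the PyTorch build agree on major.minor."""
    major, minor = (int(part) for part in torch_cuda.split(".")[:2])
    return parse_nvcc_release(nvcc_output) == (major, minor)


if __name__ == "__main__":
    # Line taken from the nvcc output posted in this thread.
    sample = "Cuda compilation tools, release 11.6, V11.6.124"
    # "11.7" is the version the posted log says torch was compiled with.
    print(parse_nvcc_release(sample), same_cuda_version(sample, "11.7"))
```

A mismatch does not always break the build (the posted log accepts 11.0 against 11.7 before failing for a different reason), so treat a False result as a hint, not a verdict.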

shizhediao commented 1 year ago

Yes, you are right. I also found that it is a CUDA-related issue. It seems that CUDA 11.0 is too old to run DeepSpeed here: its nvcc rejects the compute_86 architecture that the build targets. But CUDA 11.6 should be fine, I think.

Thank you very much for your help!

research4pan commented 1 year ago
...
nvcc fatal : Unsupported gpu architecture 'compute_86'
...

According to the log, this is indeed a CUDA version problem. It seems your nvcc does not support your GPU's architecture. You may want to try another version of CUDA. Thanks 😄
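
The check behind that error message can be sketched as a lookup from architecture to the earliest CUDA toolkit whose nvcc accepts it. The table below is partial and written from memory of NVIDIA's release notes, so please verify it against the official documentation; MIN_CUDA_FOR_ARCH and nvcc_supports are names invented for this example.

```python
from typing import Tuple

# Earliest CUDA toolkit whose nvcc accepts each architecture.
# Partial table, an assumption to be checked against NVIDIA's release notes.
MIN_CUDA_FOR_ARCH = {
    "compute_70": (9, 0),   # Volta
    "compute_75": (10, 0),  # Turing
    "compute_80": (11, 0),  # Ampere (A100)
    "compute_86": (11, 1),  # Ampere (e.g. RTX 30 series)
}


def nvcc_supports(arch: str, cuda_version: Tuple[int, int]) -> bool:
    """Return True if a toolkit of this version should accept `arch`."""
    if arch not in MIN_CUDA_FOR_ARCH:
        raise KeyError(f"architecture {arch!r} is not in this partial table")
    return cuda_version >= MIN_CUDA_FOR_ARCH[arch]


if __name__ == "__main__":
    # CUDA 11.0 is the toolkit in the failing log; 11.7 is the suggested one.
    for version in [(11, 0), (11, 6), (11, 7)]:
        print(version, nvcc_supports("compute_86", version))
```

Under these assumptions, CUDA 11.0 fails for compute_86 while 11.6 and 11.7 pass, which is consistent with what people report in this thread.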

2718564960 commented 1 year ago

Yes, I had the same error. I ran conda install cuda -c nvidia/label/cuda-11.7.0 and it seems OK now.

Yue-stat commented 1 year ago

I am loading the toolchain with module load gcc/9.2.0 cuda/11.7, but I am still getting this error:

ImportError: /home/xxxxxx/.cache/torch_extensions/py39_cu117/cpu_adam/cpu_adam.so: cannot open shared object file: No such file or directory
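
One possible cause, which is an assumption since nothing in this thread confirms it, is that an earlier failed build left a cpu_adam directory in the extensions cache without the compiled library. The sketch below only lists such directories; extensions_root and incomplete_builds are hypothetical helper names. It relies on the TORCH_EXTENSIONS_DIR environment variable and on the ~/.cache/torch_extensions root that appears in the log above.

```python
import os
from pathlib import Path
from typing import List


def extensions_root() -> Path:
    """Directory where PyTorch caches JIT-built extensions."""
    override = os.environ.get("TORCH_EXTENSIONS_DIR")
    if override:
        return Path(override)
    return Path.home() / ".cache" / "torch_extensions"


def incomplete_builds(root: Path) -> List[Path]:
    """Return build directories under `root` that have no compiled .so file."""
    broken = []
    for build_file in root.rglob("build.ninja"):
        build_dir = build_file.parent
        if not any(build_dir.glob("*.so")):
            broken.append(build_dir)
    return sorted(broken)


if __name__ == "__main__":
    for path in incomplete_builds(extensions_root()):
        print("no compiled library in", path)
```

If a directory shows up here, removing it by hand and rerunning the script should force a fresh build, at which point any real compiler error will be printed again.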

yana-xuyan commented 1 year ago

Installing CUDA 11.7 with conda (conda install cuda -c nvidia/label/cuda-11.7.0) works for me! Thank you very much for the help! <3

shizhediao commented 1 year ago

This issue has been marked as stale because it has not had recent activity. If you think this still needs to be addressed please feel free to reopen this issue. Thanks!