Closed zhangxiann closed 6 months ago
As advised, I downloaded the latest model files.
After downloading them, I ran sft again and got the following error:
[2024-01-02 15:57:58,042] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-01-02 15:57:59,681] [INFO] [launch.py:145:main] WORLD INFO DICT: {'localhost': [0, 1, 2, 3, 4, 5, 6, 7]}
[2024-01-02 15:57:59,681] [INFO] [launch.py:151:main] nnodes=1, num_local_procs=8, node_rank=0
[2024-01-02 15:57:59,681] [INFO] [launch.py:162:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1, 2, 3, 4, 5, 6, 7]})
[2024-01-02 15:57:59,681] [INFO] [launch.py:163:main] dist_world_size=8
[2024-01-02 15:57:59,681] [INFO] [launch.py:165:main] Setting CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
[2024-01-02 15:58:01,434] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
(repeated once per rank, 8x)
/data/xxxx/conda/miniconda/envs/llm_yi/lib/python3.10/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
warnings.warn(
[2024-01-02 15:58:04,583] [INFO] [comm.py:637:init_distributed] cdb=None
(FutureWarning and cdb=None lines repeated once per rank)
[2024-01-02 15:58:05,393] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
tokenizer path exist
(printed once per rank)
The model was loaded with use_flash_attention_2=True, which is deprecated and may be removed in a future release. Please use `attn_implementation="flash_attention_2"` instead.
You are attempting to use Flash Attention 2.0 without specifying a torch dtype. This might lead to unexpected behaviour
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
Traceback (most recent call last):
  File "/data/xxxx/ai_parse/Yi/finetune/sft/main.py", line 415, in <module>
    main()
  File "/data/xxxx/ai_parse/Yi/finetune/sft/main.py", line 253, in main
    model = create_hf_model(
  File "/data/xxxx/ai_parse/Yi/finetune/utils/model/model_utils.py", line 30, in create_hf_model
    model = model_class.from_pretrained(
  File "/data/xxxx/conda/miniconda/envs/llm_yi/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 566, in from_pretrained
    return model_class.from_pretrained(
  File "/data/xxxx/conda/miniconda/envs/llm_yi/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3462, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
  File "/data/xxxx/conda/miniconda/envs/llm_yi/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1108, in __init__
    super().__init__(config)
  File "/data/xxxx/conda/miniconda/envs/llm_yi/lib/python3.10/site-packages/transformers/modeling_utils.py", line 1190, in __init__
    config = self._autoset_attn_implementation(
  File "/data/xxxx/conda/miniconda/envs/llm_yi/lib/python3.10/site-packages/transformers/modeling_utils.py", line 1302, in _autoset_attn_implementation
    cls._check_and_enable_flash_attn_2(
  File "/data/xxxx/conda/miniconda/envs/llm_yi/lib/python3.10/site-packages/transformers/modeling_utils.py", line 1422, in _check_and_enable_flash_attn_2
    raise ValueError(
ValueError: Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes. You passed torch.float32, this might lead to unexpected behaviour.
(identical warnings and traceback repeated on every rank)
[2024-01-02 15:58:14,702] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 59071
[2024-01-02 15:58:14,719] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 59072
[2024-01-02 15:58:14,777] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 59073
[2024-01-02 15:58:14,784] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 59074
[2024-01-02 15:58:14,790] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 59075
[2024-01-02 15:58:14,791] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 59076
[2024-01-02 15:58:14,797] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 59077
[2024-01-02 15:58:14,803] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 59078
[2024-01-02 15:58:14,810] [ERROR] [launch.py:321:sigkill_handler] ['/data/xxxx/conda/miniconda/envs/llm_yi/bin/python', '-u', 'main.py', '--local_rank=7', '--data_path', '../yi_example_dataset/', '--model_name_or_path', '/xxxx/Yi/Yi-6B', '--per_device_train_batch_size', '1', '--per_device_eval_batch_size', '1', '--max_seq_len', '4096', '--learning_rate', '2e-6', '--weight_decay', '0.', '--num_train_epochs', '4', '--training_debug_steps', '20', '--gradient_accumulation_steps', '1', '--lr_scheduler_type', 'cosine', '--num_warmup_steps', '0', '--seed', '1234', '--gradient_checkpointing', '--zero_stage', '2', '--deepspeed', '--offload', '--output_dir', './finetuned_model'] exits with return code = 1
Similar to this issue. Checking the solution provided there might be helpful.
This happens because Yi's sft code is missing these two parameter lines:
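The exact two lines are not quoted in this thread, but the fix presumably amounts to passing a half-precision dtype (and the attention implementation) to `from_pretrained`. A minimal sketch, assuming the standard `transformers.from_pretrained` keyword names; dtype names are plain strings here so the snippet needs no torch install:

```python
# Sketch of the extra from_pretrained kwargs Flash Attention 2 needs.
# transformers' _check_and_enable_flash_attn_2 rejects the default
# torch.float32, which is exactly the ValueError in the log above.
FLASH_ATTN_DTYPES = {"float16", "bfloat16"}  # the only dtypes FA2 accepts


def flash_attn_load_kwargs(dtype="bfloat16"):
    """Return the load-time kwargs that satisfy Flash Attention 2's dtype check."""
    if dtype not in FLASH_ATTN_DTYPES:
        raise ValueError(
            f"Flash Attention 2.0 only supports float16 and bfloat16, got {dtype}"
        )
    return {"torch_dtype": dtype, "attn_implementation": "flash_attention_2"}


# Inside create_hf_model this would look roughly like (hypothetical call site):
# model = model_class.from_pretrained(model_name_or_path,
#                                     torch_dtype=torch.bfloat16,
#                                     attn_implementation="flash_attention_2")
```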
After adding them and rerunning, a new error appears:
Loading extension module cpu_adam...
Traceback (most recent call last):
  File "/data/xxxx/ai_parse/Yi/finetune/sft/main.py", line 415, in <module>
    main()
  File "/data/xxxx/ai_parse/Yi/finetune/sft/main.py", line 330, in main
    optimizer = AdamOptimizer(
  File "/data/xxxx/conda/miniconda/envs/llm_yi/lib/python3.10/site-packages/deepspeed/ops/adam/cpu_adam.py", line 94, in __init__
    self.ds_opt_adam = CPUAdamBuilder().load()
  File "/data/xxxx/conda/miniconda/envs/llm_yi/lib/python3.10/site-packages/deepspeed/ops/op_builder/builder.py", line 452, in load
    return self.jit_load(verbose)
  File "/data/xxxx/conda/miniconda/envs/llm_yi/lib/python3.10/site-packages/deepspeed/ops/op_builder/builder.py", line 501, in jit_load
    op_module = load(name=self.name,
  File "/data/xxxx/conda/miniconda/envs/llm_yi/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1284, in load
    return _jit_compile(
  File "/data/xxxx/conda/miniconda/envs/llm_yi/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1535, in _jit_compile
    return _import_module_from_library(name, build_directory, is_python_module)
  File "/data/xxxx/conda/miniconda/envs/llm_yi/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1929, in _import_module_from_library
    module = importlib.util.module_from_spec(spec)
  File "<frozen importlib._bootstrap>", line 571, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 1176, in create_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
ImportError: /data/xxxx/.cache/torch_extensions/py310_cu117/cpu_adam/cpu_adam.so: cannot open shared object file: No such file or directory
(same ImportError repeated on every rank)
Exception ignored in: <function DeepSpeedCPUAdam.__del__ at 0x7f9aa027f520>
Traceback (most recent call last):
File "/data/xxxx/conda/miniconda/envs/llm_yi/lib/python3.10/site-packages/deepspeed/ops/adam/cpu_adam.py", line 102, in __del__
self.ds_opt_adam.destroy_adam(self.opt_id)
AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'
(same AttributeError repeated for each rank)
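This ImportError usually means DeepSpeed's JIT build of the cpu_adam op failed or left a stale, incomplete cache behind, so the `.so` never materialized. A small dependency-free check for whether the compiled extension actually exists (the cache path below is torch's default; adjust it if you set TORCH_EXTENSIONS_DIR):

```python
import glob
import os


def cpu_adam_built(cache_dir=None):
    """Return True if a compiled cpu_adam.so exists in torch's extension cache.

    torch JIT-builds DeepSpeed ops under ~/.cache/torch_extensions/<tag>/;
    the ImportError above means that build produced no usable .so file.
    """
    cache_dir = cache_dir or os.path.expanduser("~/.cache/torch_extensions")
    pattern = os.path.join(cache_dir, "*", "cpu_adam", "cpu_adam.so")
    return bool(glob.glob(pattern))
```

If this returns False after a run, a common remedy is to delete the cache (`rm -rf ~/.cache/torch_extensions`) so the op is rebuilt from scratch on the next launch, or to prebuild it at install time with `DS_BUILD_CPU_ADAM=1 pip install deepspeed`; both assume a C++ toolchain compatible with your CUDA install.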
Similar to this issue. Check the solution provided there might be helpful
是因为Yi的 sft 代码里少了这两行参数:
加上之后,再次运行,报新的错如下:
Loading extension module cpu_adam...
Traceback (most recent call last):
  File "/data/xxxx/ai_parse/Yi/finetune/sft/main.py", line 415, in <module>
    main()
  File "/data/xxxx/ai_parse/Yi/finetune/sft/main.py", line 330, in main
    optimizer = AdamOptimizer(
  File "/data/xxxx/conda/miniconda/envs/llm_yi/lib/python3.10/site-packages/deepspeed/ops/adam/cpu_adam.py", line 94, in __init__
    self.ds_opt_adam = CPUAdamBuilder().load()
  File "/data/xxxx/conda/miniconda/envs/llm_yi/lib/python3.10/site-packages/deepspeed/ops/op_builder/builder.py", line 452, in load
    return self.jit_load(verbose)
  File "/data/xxxx/conda/miniconda/envs/llm_yi/lib/python3.10/site-packages/deepspeed/ops/op_builder/builder.py", line 501, in jit_load
    op_module = load(name=self.name,
  File "/data/xxxx/conda/miniconda/envs/llm_yi/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1284, in load
    return _jit_compile(
  File "/data/xxxx/conda/miniconda/envs/llm_yi/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1535, in _jit_compile
    return _import_module_from_library(name, build_directory, is_python_module)
  File "/data/xxxx/conda/miniconda/envs/llm_yi/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1929, in _import_module_from_library
    module = importlib.util.module_from_spec(spec)
  File "<frozen importlib._bootstrap>", line 571, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 1176, in create_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
ImportError: /data/xxxx/.cache/torch_extensions/py310_cu117/cpu_adam/cpu_adam.so: cannot open shared object file: No such file or directory
Exception ignored in: <function DeepSpeedCPUAdam.__del__ at 0x7f9aa027f520>
Traceback (most recent call last):
  File "/data/xxxx/conda/miniconda/envs/llm_yi/lib/python3.10/site-packages/deepspeed/ops/adam/cpu_adam.py", line 102, in __del__
    self.ds_opt_adam.destroy_adam(self.opt_id)
AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'
(The same ImportError traceback and "Exception ignored in DeepSpeedCPUAdam.__del__" message are printed once by each of the 8 ranks; the interleaved duplicates are omitted here.)
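The ImportError above means the JIT-compiled cpu_adam.so was never produced: the build failed (or a stale entry is left in the cache), and every later rank then fails to load the missing shared object. A common workaround is to delete the torch extension build cache so DeepSpeed rebuilds the op on the next run. Below is a minimal sketch, assuming the default cache location `~/.cache/torch_extensions`; `clear_torch_extension_cache` is a hypothetical helper for illustration, not part of DeepSpeed:

```python
import os
import shutil

def clear_torch_extension_cache(name="cpu_adam", root=None):
    """Remove cached JIT builds of a torch extension so it is rebuilt.

    root defaults to ~/.cache/torch_extensions, which contains one
    subdirectory per Python/CUDA combination (e.g. py310_cu117).
    Returns the list of directories that were removed.
    """
    root = root or os.path.join(os.path.expanduser("~"), ".cache", "torch_extensions")
    removed = []
    if os.path.isdir(root):
        for sub in os.listdir(root):            # e.g. "py310_cu117"
            ext_dir = os.path.join(root, sub, name)
            if os.path.isdir(ext_dir):
                shutil.rmtree(ext_dir)          # force a clean rebuild
                removed.append(ext_dir)
    return removed
```

If the rebuild still fails, the compiler error printed before the ImportError is the part worth reading; it usually points at a toolchain problem rather than at the Yi code.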
I saw a similar issue in the official DeepSpeed repo; it is probably due to the cuda-toolkit version. Hope you will find this helpful: microsoft/DeepSpeed#1846
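The mismatch that the linked issue describes is between the system CUDA toolkit (`nvcc`) and the CUDA version PyTorch was built against (`torch.version.cuda`, visible as `cu117` in the cache path above). A minimal sketch of that check, assuming the output of `nvcc --version` is passed in as a string; both helpers are hypothetical, for illustration only:

```python
import re

def parse_cuda_release(nvcc_output: str):
    """Extract (major, minor) from a line like
    'Cuda compilation tools, release 11.7, V11.7.99'."""
    m = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    return (int(m.group(1)), int(m.group(2))) if m else None

def toolkit_matches_torch(nvcc_output: str, torch_cuda: str) -> bool:
    """DeepSpeed JIT builds are most reliable when nvcc's release matches
    torch.version.cuda (e.g. '11.7'); a mismatch often breaks the build."""
    rel = parse_cuda_release(nvcc_output)
    want = tuple(int(x) for x in torch_cuda.split(".")[:2])
    return rel == want
```

In practice you would compare `nvcc --version` against `python -c "import torch; print(torch.version.cuda)"` and align whichever side is easier to change.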
I ran into a similar problem too. Have you solved it? @zhangxiann
I ran into a similar problem too. Have you solved it? @zhangxiann
Not solved; I've moved on to looking into other models for now.
Don't use flash-attn 2.0 or above; install flash-attn==1.0.4.
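To confirm the environment actually picked up a 1.x build rather than a 2.x one, a quick check on the installed version string can help. A minimal sketch; `flash_attn_version_ok` is a hypothetical helper that simply encodes the "below 2.0" advice above:

```python
def flash_attn_version_ok(version: str) -> bool:
    """Return True if the flash-attn version is below 2.0
    (e.g. the suggested 1.0.4 pin)."""
    major = int(version.split(".")[0])
    return major < 2
```

The version string itself would come from `pip show flash-attn` or `importlib.metadata.version("flash-attn")` in the training environment.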
Running the sft script per the README:
Error message
Environment
GPU A100 * 4
config.json
Python packages