Reminder

[X] I have searched the GitHub Discussions and issues and have not found anything similar to this.

Environment

- OS:
- Python: 3.10
- PyTorch: 2.0.1+cu117
- CUDA: 11.7

Current Behavior

Fine-tuning on the official dataset runs without errors, but fine-tuning on my own dataset fails. The full error output is given below.
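The traceback in the log below ends in `UnboundLocalError: local variable 'step' referenced before assignment` at `losses = losses / (step + 1)` inside `evaluation`. One plausible reading, not confirmed here, is that the evaluation dataloader yields no batches for the custom dataset, so the loop body never runs and `step` is never assigned. The sketch below reproduces that failure mode and shows a guard. The function names are hypothetical and plain floats stand in for per-batch model losses; this is not the actual code from `main.py`.

```python
from typing import Iterable


def mean_loss_unguarded(batch_losses: Iterable[float]) -> float:
    """Mirrors the pattern in the traceback: divide by (step + 1) after the loop."""
    losses = 0.0
    for step, loss in enumerate(batch_losses):
        losses += loss
    # If batch_losses is empty, `step` was never bound and this line raises
    # UnboundLocalError.
    return losses / (step + 1)


def mean_loss_guarded(batch_losses: Iterable[float]) -> float:
    """Counts batches explicitly and fails with a clear message when there are none."""
    total = 0.0
    count = 0
    for loss in batch_losses:
        total += loss
        count += 1
    if count == 0:
        raise ValueError(
            "evaluation dataloader yielded no batches; check that the eval split is not empty"
        )
    return total / count


if __name__ == "__main__":
    print(mean_loss_guarded([1.0, 3.0]))
    try:
        mean_loss_unguarded([])
    except UnboundLocalError as err:
        print(f"unguarded version failed: {err}")
```

If this reading is right, checking that the evaluation file under `--data_path` exists and contains at least one usable record would be the first thing to verify.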

Expected Behavior

No response

Steps to Reproduce

在我自己构建的数据集上进行微调会出现以下报错： , '--per_device_eval_batch_size', '1', '--max_seq_len', '4096', '--learning_rate', '2e-6', '--weight_decay', '0.', '--num_train_epochs', '4', '--training_debug_steps', '20', '--gradient_accumulation_steps', '1', '--lr_scheduler_type', 'cosine', '--num_warmup_steps', '0', '--seed', '1234', '--gradient_checkpointing', '--zero_stage', '2', '--deepspeed', '--offload', '--output_dir', '/root/vision/Yi-main/Yi-main/finetuned_model'] [2024-06-05 16:42:02,475] [INFO] [launch.py:256:main] process 2103326 spawned with command: ['/root/vision/anaconda3/envs/Yi/bin/python', '-u', 'main.py', '--local_rank=3', '--data_path', '/root/vision/Yi-main/Yi-main/finetune/yi_dataset', '--model_name_or_path', '/root/vision/Yi-main/Yi-main/checkpoint/Yi-6B-base', '--per_device_train_batch_size', '1', '--per_device_eval_batch_size', '1', '--max_seq_len', '4096', '--learning_rate', '2e-6', '--weight_decay', '0.', '--num_train_epochs', '4', '--training_debug_steps', '20', '--gradient_accumulation_steps', '1', '--lr_scheduler_type', 'cosine', '--num_warmup_steps', '0', '--seed', '1234', '--gradient_checkpointing', '--zero_stage', '2', '--deepspeed', '--offload', '--output_dir', '/root/vision/Yi-main/Yi-main/finetuned_model'] [2024-06-05 16:42:02,476] [INFO] [launch.py:256:main] process 2103327 spawned with command: ['/root/vision/anaconda3/envs/Yi/bin/python', '-u', 'main.py', '--local_rank=4', '--data_path', '/root/vision/Yi-main/Yi-main/finetune/yi_dataset', '--model_name_or_path', '/root/vision/Yi-main/Yi-main/checkpoint/Yi-6B-base', '--per_device_train_batch_size', '1', '--per_device_eval_batch_size', '1', '--max_seq_len', '4096', '--learning_rate', '2e-6', '--weight_decay', '0.', '--num_train_epochs', '4', '--training_debug_steps', '20', '--gradient_accumulation_steps', '1', '--lr_scheduler_type', 'cosine', '--num_warmup_steps', '0', '--seed', '1234', '--gradient_checkpointing', '--zero_stage', '2', '--deepspeed', '--offload', '--output_dir', 
'/root/vision/Yi-main/Yi-main/finetuned_model'] [2024-06-05 16:42:02,477] [INFO] [launch.py:256:main] process 2103328 spawned with command: ['/root/vision/anaconda3/envs/Yi/bin/python', '-u', 'main.py', '--local_rank=5', '--data_path', '/root/vision/Yi-main/Yi-main/finetune/yi_dataset', '--model_name_or_path', '/root/vision/Yi-main/Yi-main/checkpoint/Yi-6B-base', '--per_device_train_batch_size', '1', '--per_device_eval_batch_size', '1', '--max_seq_len', '4096', '--learning_rate', '2e-6', '--weight_decay', '0.', '--num_train_epochs', '4', '--training_debug_steps', '20', '--gradient_accumulation_steps', '1', '--lr_scheduler_type', 'cosine', '--num_warmup_steps', '0', '--seed', '1234', '--gradient_checkpointing', '--zero_stage', '2', '--deepspeed', '--offload', '--output_dir', '/root/vision/Yi-main/Yi-main/finetuned_model'] [2024-06-05 16:42:02,478] [INFO] [launch.py:256:main] process 2103329 spawned with command: ['/root/vision/anaconda3/envs/Yi/bin/python', '-u', 'main.py', '--local_rank=6', '--data_path', '/root/vision/Yi-main/Yi-main/finetune/yi_dataset', '--model_name_or_path', '/root/vision/Yi-main/Yi-main/checkpoint/Yi-6B-base', '--per_device_train_batch_size', '1', '--per_device_eval_batch_size', '1', '--max_seq_len', '4096', '--learning_rate', '2e-6', '--weight_decay', '0.', '--num_train_epochs', '4', '--training_debug_steps', '20', '--gradient_accumulation_steps', '1', '--lr_scheduler_type', 'cosine', '--num_warmup_steps', '0', '--seed', '1234', '--gradient_checkpointing', '--zero_stage', '2', '--deepspeed', '--offload', '--output_dir', '/root/vision/Yi-main/Yi-main/finetuned_model'] [2024-06-05 16:42:02,479] [INFO] [launch.py:256:main] process 2103330 spawned with command: ['/root/vision/anaconda3/envs/Yi/bin/python', '-u', 'main.py', '--local_rank=7', '--data_path', '/root/vision/Yi-main/Yi-main/finetune/yi_dataset', '--model_name_or_path', '/root/vision/Yi-main/Yi-main/checkpoint/Yi-6B-base', '--per_device_train_batch_size', '1', 
'--per_device_eval_batch_size', '1', '--max_seq_len', '4096', '--learning_rate', '2e-6', '--weight_decay', '0.', '--num_train_epochs', '4', '--training_debug_steps', '20', '--gradient_accumulation_steps', '1', '--lr_scheduler_type', 'cosine', '--num_warmup_steps', '0', '--seed', '1234', '--gradient_checkpointing', '--zero_stage', '2', '--deepspeed', '--offload', '--output_dir', '/root/vision/Yi-main/Yi-main/finetuned_model'] [2024-06-05 16:42:04,420] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2024-06-05 16:42:04,492] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect) [WARNING] async_io requires the dev libaio .so object and headers but these were not found. [2024-06-05 16:42:04,508] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect) [WARNING] async_io: please install the libaio-dev package with apt [WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. [WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH [2024-06-05 16:42:04,578] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2024-06-05 16:42:04,578] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2024-06-05 16:42:04,584] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2024-06-05 16:42:04,593] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect) [WARNING] async_io requires the dev libaio .so object and headers but these were not found. [WARNING] async_io requires the dev libaio .so object and headers but these were not found. 
[WARNING] async_io: please install the libaio-dev package with apt [WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. [WARNING] async_io: please install the libaio-dev package with apt [WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH [WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. [WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH [2024-06-05 16:42:04,672] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect) [WARNING] async_io requires the dev libaio .so object and headers but these were not found. [WARNING] async_io requires the dev libaio .so object and headers but these were not found. [WARNING] async_io requires the dev libaio .so object and headers but these were not found. [WARNING] async_io requires the dev libaio .so object and headers but these were not found. [WARNING] async_io: please install the libaio-dev package with apt [WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. [WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH [WARNING] async_io: please install the libaio-dev package with apt [WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. [WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH [WARNING] async_io: please install the libaio-dev package with apt [WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. 
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH [WARNING] async_io: please install the libaio-dev package with apt [WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. [WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH [WARNING] async_io requires the dev libaio .so object and headers but these were not found. [WARNING] async_io: please install the libaio-dev package with apt [WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. [WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH [WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.0 [WARNING] using untested triton version (2.0.0), only 1.0.0 is known to be compatible [WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.0 [WARNING] using untested triton version (2.0.0), only 1.0.0 is known to be compatible [WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.0 [WARNING] using untested triton version (2.0.0), only 1.0.0 is known to be compatible [WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.0 [WARNING] using untested triton version (2.0.0), only 1.0.0 is known to be compatible [WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.0 [WARNING] using untested triton version (2.0.0), only 1.0.0 is known to be compatible [WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.0 [WARNING] using untested triton version (2.0.0), only 1.0.0 is known to be compatible [WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.0 [WARNING] using untested triton version (2.0.0), only 1.0.0 is known to be compatible [WARNING] sparse_attn requires a torch 
version >= 1.5 and < 2.0 but detected 2.0 [WARNING] using untested triton version (2.0.0), only 1.0.0 is known to be compatible /root/vision/anaconda3/envs/Yi/lib/python3.10/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations warnings.warn( [2024-06-05 16:42:06,483] [INFO] [comm.py:637:init_distributed] cdb=None /root/vision/anaconda3/envs/Yi/lib/python3.10/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations warnings.warn( [2024-06-05 16:42:06,508] [INFO] [comm.py:637:init_distributed] cdb=None /root/vision/anaconda3/envs/Yi/lib/python3.10/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations warnings.warn( [2024-06-05 16:42:06,524] [INFO] [comm.py:637:init_distributed] cdb=None /root/vision/anaconda3/envs/Yi/lib/python3.10/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations warnings.warn( [2024-06-05 16:42:06,588] [INFO] [comm.py:637:init_distributed] cdb=None /root/vision/anaconda3/envs/Yi/lib/python3.10/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. 
Please import deepspeed modules directly from transformers.integrations warnings.warn( [2024-06-05 16:42:06,598] [INFO] [comm.py:637:init_distributed] cdb=None /root/vision/anaconda3/envs/Yi/lib/python3.10/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations warnings.warn( [2024-06-05 16:42:06,612] [INFO] [comm.py:637:init_distributed] cdb=None /root/vision/anaconda3/envs/Yi/lib/python3.10/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations warnings.warn( [2024-06-05 16:42:07,062] [INFO] [comm.py:637:init_distributed] cdb=None [2024-06-05 16:42:07,062] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl /root/vision/anaconda3/envs/Yi/lib/python3.10/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations warnings.warn( [2024-06-05 16:42:07,114] [INFO] [comm.py:637:init_distributed] cdb=None tokenizer path existtokenizer path existtokenizer path exist

tokenizer path exist tokenizer path exist tokenizer path existtokenizer path existtokenizer path exist

The model was loaded with use_flash_attention_2=True, which is deprecated and may be removed in a future release. Please use attn_implementation="flash_attention_2" instead. The model was loaded with use_flash_attention_2=True, which is deprecated and may be removed in a future release. Please use attn_implementation="flash_attention_2" instead. The model was loaded with use_flash_attention_2=True, which is deprecated and may be removed in a future release. Please use attn_implementation="flash_attention_2" instead. You are attempting to use Flash Attention 2.0 without specifying a torch dtype. This might lead to unexpected behaviour You are attempting to use Flash Attention 2.0 without specifying a torch dtype. This might lead to unexpected behaviour You are attempting to use Flash Attention 2.0 without specifying a torch dtype. This might lead to unexpected behaviour You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with model.to('cuda'). You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with model.to('cuda'). You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with model.to('cuda'). Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in LlamaForCausalLM is torch.float32. You should run training or inference using Automatic Mixed-Precision via the with torch.autocast(device_type='torch_device'): decorator, or load the model with the torch_dtype argument. Example: model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16) Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in LlamaForCausalLM is torch.float32. 
You should run training or inference using Automatic Mixed-Precision via the with torch.autocast(device_type='torch_device'): decorator, or load the model with the torch_dtype argument. Example: model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16) Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in LlamaForCausalLM is torch.float32. You should run training or inference using Automatic Mixed-Precision via the with torch.autocast(device_type='torch_device'): decorator, or load the model with the torch_dtype argument. Example: model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16) The model was loaded with use_flash_attention_2=True, which is deprecated and may be removed in a future release. Please use attn_implementation="flash_attention_2" instead. Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in LlamaModel is torch.float32. You should run training or inference using Automatic Mixed-Precision via the with torch.autocast(device_type='torch_device'): decorator, or load the model with the torch_dtype argument. Example: model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16) Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in LlamaModel is torch.float32. You should run training or inference using Automatic Mixed-Precision via the with torch.autocast(device_type='torch_device'): decorator, or load the model with the torch_dtype argument. Example: model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16) Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in LlamaModel is torch.float32. 
You should run training or inference using Automatic Mixed-Precision via the with torch.autocast(device_type='torch_device'): decorator, or load the model with the torch_dtype argument. Example: model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16) You are attempting to use Flash Attention 2.0 without specifying a torch dtype. This might lead to unexpected behaviour You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with model.to('cuda'). Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in LlamaForCausalLM is torch.float32. You should run training or inference using Automatic Mixed-Precision via the with torch.autocast(device_type='torch_device'): decorator, or load the model with the torch_dtype argument. Example: model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16) Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in LlamaModel is torch.float32. You should run training or inference using Automatic Mixed-Precision via the with torch.autocast(device_type='torch_device'): decorator, or load the model with the torch_dtype argument. Example: model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16) The model was loaded with use_flash_attention_2=True, which is deprecated and may be removed in a future release. Please use attn_implementation="flash_attention_2" instead. The model was loaded with use_flash_attention_2=True, which is deprecated and may be removed in a future release. Please use attn_implementation="flash_attention_2" instead. You are attempting to use Flash Attention 2.0 without specifying a torch dtype. 
This might lead to unexpected behaviour You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with model.to('cuda'). You are attempting to use Flash Attention 2.0 without specifying a torch dtype. This might lead to unexpected behaviour You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with model.to('cuda'). Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in LlamaForCausalLM is torch.float32. You should run training or inference using Automatic Mixed-Precision via the with torch.autocast(device_type='torch_device'): decorator, or load the model with the torch_dtype argument. Example: model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16) Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in LlamaForCausalLM is torch.float32. You should run training or inference using Automatic Mixed-Precision via the with torch.autocast(device_type='torch_device'): decorator, or load the model with the torch_dtype argument. Example: model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16) Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in LlamaModel is torch.float32. You should run training or inference using Automatic Mixed-Precision via the with torch.autocast(device_type='torch_device'): decorator, or load the model with the torch_dtype argument. Example: model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16) Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in LlamaModel is torch.float32. 
You should run training or inference using Automatic Mixed-Precision via the with torch.autocast(device_type='torch_device'): decorator, or load the model with the torch_dtype argument. Example: model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16) The model was loaded with use_flash_attention_2=True, which is deprecated and may be removed in a future release. Please use attn_implementation="flash_attention_2" instead. The model was loaded with use_flash_attention_2=True, which is deprecated and may be removed in a future release. Please use attn_implementation="flash_attention_2" instead. You are attempting to use Flash Attention 2.0 without specifying a torch dtype. This might lead to unexpected behaviour You are attempting to use Flash Attention 2.0 without specifying a torch dtype. This might lead to unexpected behaviour You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with model.to('cuda'). You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with model.to('cuda'). Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in LlamaForCausalLM is torch.float32. You should run training or inference using Automatic Mixed-Precision via the with torch.autocast(device_type='torch_device'): decorator, or load the model with the torch_dtype argument. Example: model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16) Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in LlamaForCausalLM is torch.float32. 
You should run training or inference using Automatic Mixed-Precision via the with torch.autocast(device_type='torch_device'): decorator, or load the model with the torch_dtype argument. Example: model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16) Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in LlamaModel is torch.float32. You should run training or inference using Automatic Mixed-Precision via the with torch.autocast(device_type='torch_device'): decorator, or load the model with the torch_dtype argument. Example: model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16) Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in LlamaModel is torch.float32. You should run training or inference using Automatic Mixed-Precision via the with torch.autocast(device_type='torch_device'): decorator, or load the model with the torch_dtype argument. 
Example: model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16) Loading checkpoint shards: 100%|████████████████| 2/2 [00:10<00:00, 5.05s/it] Loading checkpoint shards: 50%|████████ | 1/2 [00:10<00:10, 10.15s/it]length of tokenizer is 64000 resize_token_embeddings is 64000 Loading checkpoint shards: 100%|████████████████| 2/2 [00:10<00:00, 5.49s/it] length of tokenizer is 64000 Loading checkpoint shards: 100%|████████████████| 2/2 [00:11<00:00, 5.73s/it] Loading checkpoint shards: 100%|████████████████| 2/2 [00:11<00:00, 5.74s/it] resize_token_embeddings is 64000 Loading checkpoint shards: 100%|████████████████| 2/2 [00:11<00:00, 5.70s/it] Loading checkpoint shards: 100%|████████████████| 2/2 [00:11<00:00, 5.72s/it] Loading checkpoint shards: 100%|████████████████| 2/2 [00:11<00:00, 5.73s/it] Loading checkpoint shards: 100%|████████████████| 2/2 [00:11<00:00, 5.70s/it] length of tokenizer is 64000 length of tokenizer is 64000 length of tokenizer is 64000 length of tokenizer is 64000 resize_token_embeddings is 64000 resize_token_embeddings is 64000 length of tokenizer is 64000 length of tokenizer is 64000 resize_token_embeddings is 64000 resize_token_embeddings is 64000 resize_token_embeddings is 64000 resize_token_embeddings is 64000 Using /root/.cache/torch_extensions/py310_cu117 as PyTorch extensions root... Detected CUDA files, patching ldflags Emitting ninja build file /root/.cache/torch_extensions/py310_cu117/cpu_adam/build.ninja... Building extension module cpu_adam... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) ninja: no work to do. Using /root/.cache/torch_extensions/py310_cu117 as PyTorch extensions root... Loading extension module cpu_adam... Time to load cpu_adam op: 2.536935567855835 seconds Using /root/.cache/torch_extensions/py310_cu117 as PyTorch extensions root... 
Using /root/.cache/torch_extensions/py310_cu117 as PyTorch extensions root... Detected CUDA files, patching ldflags Emitting ninja build file /root/.cache/torch_extensions/py310_cu117/cpu_adam/build.ninja... Building extension module cpu_adam... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) Using /root/.cache/torch_extensions/py310_cu117 as PyTorch extensions root... ninja: no work to do. Loading extension module cpu_adam... Time to load cpu_adam op: 2.6319587230682373 seconds Loading extension module cpu_adam... Time to load cpu_adam op: 2.6258392333984375 seconds Loading extension module cpu_adam... Time to load cpu_adam op: 2.648719310760498 seconds Using /root/.cache/torch_extensions/py310_cu117 as PyTorch extensions root... Using /root/.cache/torch_extensions/py310_cu117 as PyTorch extensions root... Using /root/.cache/torch_extensions/py310_cu117 as PyTorch extensions root... Detected CUDA files, patching ldflags Emitting ninja build file /root/.cache/torch_extensions/py310_cu117/cpu_adam/build.ninja... Building extension module cpu_adam... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) ninja: no work to do. Loading extension module cpu_adam... Time to load cpu_adam op: 2.706559419631958 seconds Loading extension module cpu_adam... Loading extension module cpu_adam... Time to load cpu_adam op: 2.735806703567505 seconds Time to load cpu_adam op: 2.735208511352539 seconds Loading extension module cpu_adam... Time to load cpu_adam op: 2.7772958278656006 seconds Adam Optimizer #0 is created with AVX512 arithmetic capability. Config: alpha=0.000002, betas=(0.900000, 0.950000), weight_decay=0.000000, adam_w=1 Adam Optimizer #0 is created with AVX512 arithmetic capability. Config: alpha=0.000002, betas=(0.900000, 0.950000), weight_decay=0.000000, adam_w=1 Adam Optimizer #0 is created with AVX512 arithmetic capability. 
Config: alpha=0.000002, betas=(0.900000, 0.950000), weight_decay=0.000000, adam_w=1 [2024-06-05 16:42:26,674] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.14.2, git-hash=unknown, git-branch=unknown [2024-06-05 16:42:26,674] [INFO] [comm.py:662:init_distributed] Distributed backend already initialized Adam Optimizer #0 is created with AVX512 arithmetic capability. Config: alpha=0.000002, betas=(0.900000, 0.950000), weight_decay=0.000000, adam_w=1 Adam Optimizer #0 is created with AVX512 arithmetic capability. Config: alpha=0.000002, betas=(0.900000, 0.950000), weight_decay=0.000000, adam_w=1 Adam Optimizer #0 is created with AVX512 arithmetic capability. Config: alpha=0.000002, betas=(0.900000, 0.950000), weight_decay=0.000000, adam_w=1 Adam Optimizer #0 is created with AVX512 arithmetic capability. Config: alpha=0.000002, betas=(0.900000, 0.950000), weight_decay=0.000000, adam_w=1 Adam Optimizer #0 is created with AVX512 arithmetic capability. Config: alpha=0.000002, betas=(0.900000, 0.950000), weight_decay=0.000000, adam_w=1 [2024-06-05 16:42:50,655] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False [2024-06-05 16:42:50,656] [INFO] [logging.py:96:log_dist] [Rank 0] Using client Optimizer as basic optimizer [2024-06-05 16:42:50,656] [INFO] [logging.py:96:log_dist] [Rank 0] Removing param_group that has no 'params' in the basic Optimizer [2024-06-05 16:42:50,665] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Basic Optimizer = DeepSpeedCPUAdam [2024-06-05 16:42:50,665] [INFO] [utils.py:56:is_zero_supported_optimizer] Checking ZeRO support for optimizer=DeepSpeedCPUAdam type=<class 'deepspeed.ops.adam.cpu_adam.DeepSpeedCPUAdam'> [2024-06-05 16:42:50,665] [INFO] [logging.py:96:log_dist] [Rank 0] Creating torch.float16 ZeRO stage 2 optimizer [2024-06-05 16:42:50,666] [INFO] [stage_1_and_2.py:148:init] Reduce bucket size 500,000,000 [2024-06-05 16:42:50,666] [INFO] [stage_1_and_2.py:149:init] Allgather bucket size 
500,000,000 [2024-06-05 16:42:50,666] [INFO] [stage_1_and_2.py:150:init] CPU Offload: True [2024-06-05 16:42:50,666] [INFO] [stage_1_and_2.py:151:init] Round robin gradient partitioning: False Traceback (most recent call last): File "/root/vision/Yi-main/Yi-main/finetune/sft/main.py", line 415, in main() File "/root/vision/Yi-main/Yi-main/finetune/sft/main.py", line 362, in main perplexity = evaluation(model, eval_dataloader) File "/root/vision/Yi-main/Yi-main/finetune/sft/main.py", line 313, in evaluation losses = losses / (step + 1) UnboundLocalError: local variable 'step' referenced before assignment Traceback (most recent call last): File "/root/vision/Yi-main/Yi-main/finetune/sft/main.py", line 415, in main() File "/root/vision/Yi-main/Yi-main/finetune/sft/main.py", line 362, in main perplexity = evaluation(model, eval_dataloader) File "/root/vision/Yi-main/Yi-main/finetune/sft/main.py", line 313, in evaluation losses = losses / (step + 1) UnboundLocalError: local variable 'step' referenced before assignment Traceback (most recent call last): File "/root/vision/Yi-main/Yi-main/finetune/sft/main.py", line 415, in main() File "/root/vision/Yi-main/Yi-main/finetune/sft/main.py", line 362, in main perplexity = evaluation(model, eval_dataloader) File "/root/vision/Yi-main/Yi-main/finetune/sft/main.py", line 313, in evaluation losses = losses / (step + 1) UnboundLocalError: local variable 'step' referenced before assignment Traceback (most recent call last): File "/root/vision/Yi-main/Yi-main/finetune/sft/main.py", line 415, in main() File "/root/vision/Yi-main/Yi-main/finetune/sft/main.py", line 362, in main perplexity = evaluation(model, eval_dataloader) File "/root/vision/Yi-main/Yi-main/finetune/sft/main.py", line 313, in evaluation losses = losses / (step + 1) UnboundLocalError: local variable 'step' referenced before assignment Traceback (most recent call last): File "/root/vision/Yi-main/Yi-main/finetune/sft/main.py", line 415, in main() File 
"/root/vision/Yi-main/Yi-main/finetune/sft/main.py", line 362, in main perplexity = evaluation(model, eval_dataloader) File "/root/vision/Yi-main/Yi-main/finetune/sft/main.py", line 313, in evaluation losses = losses / (step + 1) UnboundLocalError: local variable 'step' referenced before assignment [2024-06-05 16:43:20,446] [INFO] [utils.py:779:see_memory_usage] Before initializing optimizer states [2024-06-05 16:43:20,447] [INFO] [utils.py:780:see_memory_usage] MA 11.78 GB Max_MA 11.78 GB CA 11.78 GB Max_CA 12 GB [2024-06-05 16:43:20,447] [INFO] [utils.py:787:see_memory_usage] CPU Virtual Memory: used = 119.32 GB, percent = 15.8% [2024-06-05 16:43:20,712] [INFO] [utils.py:779:see_memory_usage] After initializing optimizer states [2024-06-05 16:43:20,712] [INFO] [utils.py:780:see_memory_usage] MA 11.78 GB Max_MA 11.78 GB CA 11.78 GB Max_CA 12 GB [2024-06-05 16:43:20,713] [INFO] [utils.py:787:see_memory_usage] CPU Virtual Memory: used = 121.65 GB, percent = 16.1% [2024-06-05 16:43:20,713] [INFO] [stage_1_and_2.py:543:init] optimizer state initialized Traceback (most recent call last): File "/root/vision/Yi-main/Yi-main/finetune/sft/main.py", line 415, in main() File "/root/vision/Yi-main/Yi-main/finetune/sft/main.py", line 362, in main perplexity = evaluation(model, eval_dataloader) File "/root/vision/Yi-main/Yi-main/finetune/sft/main.py", line 313, in evaluation losses = losses / (step + 1) UnboundLocalError: local variable 'step' referenced before assignment [2024-06-05 16:43:20,823] [INFO] [utils.py:779:see_memory_usage] After initializing ZeRO optimizer [2024-06-05 16:43:20,824] [INFO] [utils.py:780:see_memory_usage] MA 11.78 GB Max_MA 11.78 GB CA 11.78 GB Max_CA 12 GB [2024-06-05 16:43:20,824] [INFO] [utils.py:787:see_memory_usage] CPU Virtual Memory: used = 122.85 GB, percent = 16.3% [2024-06-05 16:43:20,826] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Final Optimizer = DeepSpeedCPUAdam [2024-06-05 16:43:20,826] [INFO] [logging.py:96:log_dist] [Rank 0] 
DeepSpeed using client LR scheduler
[2024-06-05 16:43:20,826] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed LR Scheduler = <torch.optim.lr_scheduler.LambdaLR object at 0x7f99a3130df0>
[2024-06-05 16:43:20,826] [INFO] [logging.py:96:log_dist] [Rank 0] step=0, skipped=0, lr=[2e-06], mom=[(0.9, 0.95)]
[2024-06-05 16:43:20,827] [INFO] [config.py:996:print] DeepSpeedEngine configuration:
[2024-06-05 16:43:20,827] [INFO] [config.py:1000:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false }
[2024-06-05 16:43:20,827] [INFO] [config.py:1000:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2024-06-05 16:43:20,827] [INFO] [config.py:1000:print] amp_enabled .................. False
[2024-06-05 16:43:20,827] [INFO] [config.py:1000:print] amp_params ................... False
[2024-06-05 16:43:20,827] [INFO] [config.py:1000:print] autotuning_config ............ { "enabled": false, "start_step": null, "end_step": null, "metric_path": null, "arg_mappings": null, "metric": "throughput", "model_info": null, "results_dir": "autotuning_results", "exps_dir": "autotuning_exps", "overwrite": true, "fast": true, "start_profile_step": 3, "end_profile_step": 5, "tuner_type": "gridsearch", "tuner_early_stopping": 5, "tuner_num_trials": 50, "model_info_path": null, "mp_size": 1, "max_train_batch_size": null, "min_train_batch_size": 1, "max_train_micro_batch_size_per_gpu": 1.024000e+03, "min_train_micro_batch_size_per_gpu": 1, "num_tuning_micro_batch_sizes": 3 }
[2024-06-05 16:43:20,827] [INFO] [config.py:1000:print] bfloat16_enabled ............. False
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] bfloat16_immediate_grad_update False
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] checkpoint_parallel_write_pipeline False
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] checkpoint_tag_validation_enabled True
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] checkpoint_tag_validation_fail False
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] comms_config ................. <deepspeed.comm.config.DeepSpeedCommsConfig object at 0x7f99a3131c30>
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] communication_data_type ...... None
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] compile_config ............... enabled=False backend='inductor' kwargs={}
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] curriculum_enabled_legacy .... False
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] curriculum_params_legacy ..... False
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}}
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] data_efficiency_enabled ...... False
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] dataloader_drop_last ......... False
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] disable_allgather ............ False
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] dump_state ................... False
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] dynamic_loss_scale_args ...... {'init_scale': 65536, 'scale_window': 100, 'delayed_shift': 2, 'consecutive_hysteresis': False, 'min_scale': 1}
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] eigenvalue_enabled ........... False
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] eigenvalue_gas_boundary_resolution 1
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] eigenvalue_layer_name ........ bert.encoder.layer
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] eigenvalue_layer_num ......... 0
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] eigenvalue_max_iter .......... 100
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] eigenvalue_stability ......... 1e-06
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] eigenvalue_tol ............... 0.01
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] eigenvalue_verbose ........... False
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] elasticity_enabled ........... False
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] flops_profiler_config ........ { "enabled": false, "recompute_fwd_factor": 0.0, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null }
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] fp16_auto_cast ............... False
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] fp16_enabled ................. True
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] fp16_master_weights_and_gradients False
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] global_rank .................. 0
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] grad_accum_dtype ............. None
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] gradient_accumulation_steps .. 1
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] gradient_clipping ............ 1.0
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] gradient_predivide_factor .... 1.0
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] graph_harvesting ............. False
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] hybrid_engine ................ enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] initial_dynamic_scale ........ 65536
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] load_universal_checkpoint .... False
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] loss_scale ................... 0
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] memory_breakdown ............. False
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] mics_hierarchial_params_gather False
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] mics_shard_size .............. -1
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='sft_tensorboard/ds_tensorboard_logs/', job_name='sft_tensorboard') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] nebula_config ................ { "enabled": false, "persistent_storage_path": null, "persistent_time_interval": 100, "num_of_version_in_retention": 2, "enable_nebula_load": true, "load_path": null }
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] optimizer_legacy_fusion ...... False
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] optimizer_name ............... None
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] optimizer_params ............. None
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0, 'pipe_partitioned': True, 'grad_partitioned': True}
[2024-06-05 16:43:20,829] [INFO] [config.py:1000:print] pld_enabled .................. False
[2024-06-05 16:43:20,829] [INFO] [config.py:1000:print] pld_params ................... False
[2024-06-05 16:43:20,829] [INFO] [config.py:1000:print] prescale_gradients ........... False
[2024-06-05 16:43:20,829] [INFO] [config.py:1000:print] scheduler_name ............... None
[2024-06-05 16:43:20,829] [INFO] [config.py:1000:print] scheduler_params ............. None
[2024-06-05 16:43:20,829] [INFO] [config.py:1000:print] seq_parallel_communication_data_type torch.float32
[2024-06-05 16:43:20,829] [INFO] [config.py:1000:print] sparse_attention ............. None
[2024-06-05 16:43:20,829] [INFO] [config.py:1000:print] sparse_gradients_enabled ..... False
[2024-06-05 16:43:20,829] [INFO] [config.py:1000:print] steps_per_print .............. 10
[2024-06-05 16:43:20,829] [INFO] [config.py:1000:print] train_batch_size ............. 8
[2024-06-05 16:43:20,829] [INFO] [config.py:1000:print] train_micro_batch_size_per_gpu 1
[2024-06-05 16:43:20,829] [INFO] [config.py:1000:print] use_data_before_expertparallel False
[2024-06-05 16:43:20,829] [INFO] [config.py:1000:print] use_node_local_storage ....... False
[2024-06-05 16:43:20,829] [INFO] [config.py:1000:print] wall_clock_breakdown ......... False
[2024-06-05 16:43:20,829] [INFO] [config.py:1000:print] weight_quantization_config ... None
[2024-06-05 16:43:20,829] [INFO] [config.py:1000:print] world_size ................... 8
[2024-06-05 16:43:20,829] [INFO] [config.py:1000:print] zero_allow_untested_optimizer False
[2024-06-05 16:43:20,829] [INFO] [config.py:1000:print] zero_config .................. stage=2 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500,000,000 use_multi_rank_bucket_allreduce=True allgather_partitions=True allgather_bucket_size=500,000,000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=DeepSpeedZeroOffloadParamConfig(device='cpu', nvme_path=None, buffer_count=5, buffer_size=100,000,000, max_in_cpu=1,000,000,000, pin_memory=False) offload_optimizer=DeepSpeedZeroOffloadOptimizerConfig(device='cpu', nvme_path=None, buffer_count=4, pin_memory=False, pipeline=False, pipeline_read=False, pipeline_write=False, fast_init=False, ratio=1.0) sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=30000000 param_persistence_threshold=10000 model_persistence_threshold=sys.maxsize max_live_parameters=30000000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False zero_hpz_partition_size=1 zero_quantized_weights=False zero_quantized_nontrainable_weights=False zero_quantized_gradients=False mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=False pipeline_loading_checkpoint=False override_module_apply=True
[2024-06-05 16:43:20,829] [INFO] [config.py:1000:print] zero_enabled ................. True
[2024-06-05 16:43:20,829] [INFO] [config.py:1000:print] zero_force_ds_cpu_optimizer .. True
[2024-06-05 16:43:20,829] [INFO] [config.py:1000:print] zero_optimization_stage ...... 2
[2024-06-05 16:43:20,829] [INFO] [config.py:986:print_user_config] json = { "train_batch_size": 8, "train_micro_batch_size_per_gpu": 1, "steps_per_print": 10, "zero_optimization": { "stage": 2, "offload_param": { "device": "cpu" }, "offload_optimizer": { "device": "cpu" }, "stage3_param_persistence_threshold": 1.000000e+04, "stage3_max_live_parameters": 3.000000e+07, "stage3_prefetch_bucket_size": 3.000000e+07, "memory_efficient_linear": false }, "fp16": { "enabled": true, "loss_scale_window": 100 }, "gradient_clipping": 1.0, "prescale_gradients": false, "wall_clock_breakdown": false, "hybrid_engine": { "enabled": false, "max_out_tokens": 512, "inference_tp_size": 1, "release_inference_cache": false, "pin_parameters": true, "tp_gather_partition_size": 8 }, "tensorboard": { "enabled": false, "output_path": "sft_tensorboard/ds_tensorboard_logs/", "job_name": "sft_tensorboard" } }
Running training
Evaluating perplexity, Epoch 0/4
Traceback (most recent call last):
  File "/root/vision/Yi-main/Yi-main/finetune/sft/main.py", line 415, in <module>
    main()
  File "/root/vision/Yi-main/Yi-main/finetune/sft/main.py", line 362, in main
    perplexity = evaluation(model, eval_dataloader)
  File "/root/vision/Yi-main/Yi-main/finetune/sft/main.py", line 313, in evaluation
    losses = losses / (step + 1)
UnboundLocalError: local variable 'step' referenced before assignment
[2024-06-05 16:43:21,571] [INFO] [launch.py:319:sigkill_handler] Killing subprocess 2103323
[2024-06-05 16:43:25,215] [INFO] [launch.py:319:sigkill_handler] Killing subprocess 2103324
[2024-06-05 16:43:25,216] [INFO] [launch.py:319:sigkill_handler] Killing subprocess 2103325
[2024-06-05 16:43:25,242] [INFO] [launch.py:319:sigkill_handler] Killing subprocess 2103326
[2024-06-05 16:43:26,191] [INFO] [launch.py:319:sigkill_handler] Killing subprocess 2103327
[2024-06-05 16:43:26,215] [INFO] [launch.py:319:sigkill_handler] Killing subprocess 2103328
[2024-06-05 16:43:26,228] [INFO] [launch.py:319:sigkill_handler] Killing subprocess 2103329
[2024-06-05 16:43:26,240] [INFO] [launch.py:319:sigkill_handler] Killing subprocess 2103330
[2024-06-05 16:43:26,251] [ERROR] [launch.py:325:sigkill_handler] ['/root/vision/anaconda3/envs/Yi/bin/python', '-u', 'main.py', '--local_rank=7', '--data_path', '/root/vision/Yi-main/Yi-main/finetune/yi_dataset', '--model_name_or_path', '/root/vision/Yi-main/Yi-main/checkpoint/Yi-6B-base', '--per_device_train_batch_size', '1', '--per_device_eval_batch_size', '1', '--max_seq_len', '4096', '--learning_rate', '2e-6', '--weight_decay', '0.', '--num_train_epochs', '4', '--training_debug_steps', '20', '--gradient_accumulation_steps', '1', '--lr_scheduler_type', 'cosine', '--num_warmup_steps', '0', '--seed', '1234', '--gradient_checkpointing', '--zero_stage', '2', '--deepspeed', '--offload', '--output_dir', '/root/vision/Yi-main/Yi-main/finetuned_model'] exits with return code = 1

The script I ran is:

#!/usr/bin/env bash

cd "$(dirname "${BASH_SOURCE[0]}")/../sft/"

deepspeed main.py \
  --data_path /root/vision/Yi-main/Yi-main/finetune/yi_dataset \
  --model_name_or_path /root/vision/Yi-main/Yi-main/checkpoint/Yi-6B-base \
  --per_device_train_batch_size 1 \
  --per_device_eval_batch_size 1 \
  --max_seq_len 4096 \
  --learning_rate 2e-6 \
  --weight_decay 0. \
  --num_train_epochs 4 \
  --training_debug_steps 20 \
  --gradient_accumulation_steps 1 \
  --lr_scheduler_type cosine \
  --num_warmup_steps 0 \
  --seed 1234 \
  --gradient_checkpointing \
  --zero_stage 2 \
  --deepspeed \
  --offload \
  --output_dir /root/vision/Yi-main/Yi-main/finetuned_model

If I switch the dataset to the official yi_example_dataset, fine-tuning succeeds. With my own dataset, however, I get this error:

Traceback (most recent call last):
  File "/root/vision/Yi-main/Yi-main/finetune/sft/main.py", line 415, in <module>
    main()
  File "/root/vision/Yi-main/Yi-main/finetune/sft/main.py", line 362, in main
    perplexity = evaluation(model, eval_dataloader)
  File "/root/vision/Yi-main/Yi-main/finetune/sft/main.py", line 313, in evaluation
    losses = losses / (step + 1)
UnboundLocalError: local variable 'step' referenced before assignment

Why does this happen?
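For reference, here is a minimal sketch of one way this exact error can arise. It assumes (main.py is not shown in this report) that `evaluation` binds `step` only inside a loop of the form `for step, batch in enumerate(eval_dataloader)`. Under that assumption, an evaluation dataloader that yields no batches would leave `step` unassigned when line 313 runs. The function names and the plain list of losses below are hypothetical stand-ins, not the repository's code.

```python
def evaluation_sketch(batch_losses):
    """Hypothetical stand-in for evaluation(): average the per-batch losses."""
    losses = 0.0
    for step, loss in enumerate(batch_losses):
        losses += loss
    # If batch_losses is empty, the loop body never runs and `step` is never
    # bound, so the next line raises UnboundLocalError.
    return losses / (step + 1)


def evaluation_guarded(batch_losses):
    """Same average, but an empty input fails with an explicit message."""
    losses = 0.0
    num_batches = 0
    for loss in batch_losses:
        losses += loss
        num_batches += 1
    if num_batches == 0:
        raise ValueError("evaluation data produced no batches")
    return losses / num_batches
```

If this assumption holds, it may be worth checking whether the evaluation split of the custom dataset is missing or empty, for example by printing `len(eval_dataloader)` before the first evaluation.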

Anything Else?