
Error invalid configuration argument at line 119 in file /src/csrc/ops.cu #1527

Open jaywongs opened 5 months ago

jaywongs commented 5 months ago

Please check that this issue hasn't been reported before.

Expected Behavior

The training task should start and run without errors.

Current behaviour

Error invalid configuration argument at line 119 in file /src/csrc/ops.cu


[2024-04-17 00:08:54,225] [INFO] [axolotl.load_model:354] [PID:808742] [RANK:2] patching with flash attention for sample packing
[2024-04-17 00:08:54,225] [INFO] [axolotl.load_model:354] [PID:808744] [RANK:4] patching with flash attention for sample packing
[2024-04-17 00:08:54,230] [INFO] [axolotl.load_model:354] [PID:808743] [RANK:3] patching with flash attention for sample packing
[2024-04-17 00:08:54,246] [INFO] [axolotl.scripts.load_datasets:415] [PID:808746] [RANK:6] printing prompters...
[2024-04-17 00:08:54,249] [INFO] [axolotl.replace_llama_attn_with_flash_attn:133] [PID:808743] [RANK:3] optimized flash-attention RMSNorm not found (run `pip install 'git+https://github.com/Dao-AILab/flash-attention.git#egg=dropout_layer_norm&subdirectory=csrc/layer_norm'`)
[2024-04-17 00:08:54,249] [INFO] [axolotl.replace_llama_attn_with_flash_attn:133] [PID:808742] [RANK:2] optimized flash-attention RMSNorm not found (run `pip install 'git+https://github.com/Dao-AILab/flash-attention.git#egg=dropout_layer_norm&subdirectory=csrc/layer_norm'`)
[2024-04-17 00:08:54,249] [INFO] [axolotl.replace_llama_attn_with_flash_attn:133] [PID:808744] [RANK:4] optimized flash-attention RMSNorm not found (run `pip install 'git+https://github.com/Dao-AILab/flash-attention.git#egg=dropout_layer_norm&subdirectory=csrc/layer_norm'`)
[2024-04-17 00:08:54,249] [INFO] [axolotl.replace_llama_attn_with_flash_attn:133] [PID:808741] [RANK:1] optimized flash-attention RMSNorm not found (run `pip install 'git+https://github.com/Dao-AILab/flash-attention.git#egg=dropout_layer_norm&subdirectory=csrc/layer_norm'`)
[2024-04-17 00:08:54,249] [INFO] [axolotl.replace_llama_attn_with_flash_attn:133] [PID:808745] [RANK:5] optimized flash-attention RMSNorm not found (run `pip install 'git+https://github.com/Dao-AILab/flash-attention.git#egg=dropout_layer_norm&subdirectory=csrc/layer_norm'`)
[2024-04-17 00:08:54,249] [INFO] [axolotl.replace_llama_attn_with_flash_attn:133] [PID:808748] [RANK:7] optimized flash-attention RMSNorm not found (run `pip install 'git+https://github.com/Dao-AILab/flash-attention.git#egg=dropout_layer_norm&subdirectory=csrc/layer_norm'`)
[2024-04-17 00:08:54,249] [INFO] [axolotl.load_model:403] [PID:808742] [RANK:2] patching _expand_mask
[2024-04-17 00:08:54,249] [INFO] [axolotl.load_model:403] [PID:808743] [RANK:3] patching _expand_mask
[2024-04-17 00:08:54,249] [INFO] [axolotl.load_model:403] [PID:808745] [RANK:5] patching _expand_mask
[2024-04-17 00:08:54,249] [INFO] [axolotl.load_model:403] [PID:808744] [RANK:4] patching _expand_mask
[2024-04-17 00:08:54,249] [INFO] [axolotl.load_model:403] [PID:808741] [RANK:1] patching _expand_mask
[2024-04-17 00:08:54,249] [INFO] [axolotl.load_model:403] [PID:808748] [RANK:7] patching _expand_mask
[2024-04-17 00:08:54,287] [DEBUG] [axolotl.load_tokenizer:277] [PID:808746] [RANK:6] EOS: 2 / </s>
[2024-04-17 00:08:54,287] [DEBUG] [axolotl.load_tokenizer:278] [PID:808746] [RANK:6] BOS: 1 / <s>
[2024-04-17 00:08:54,287] [DEBUG] [axolotl.load_tokenizer:279] [PID:808746] [RANK:6] PAD: 2 / </s>
[2024-04-17 00:08:54,287] [DEBUG] [axolotl.load_tokenizer:280] [PID:808746] [RANK:6] UNK: 0 / <unk>
[2024-04-17 00:08:54,294] [INFO] [axolotl.load_model:354] [PID:808740] [RANK:0] patching with flash attention for sample packing
[2024-04-17 00:08:54,295] [INFO] [axolotl.replace_llama_attn_with_flash_attn:133] [PID:808740] [RANK:0] optimized flash-attention RMSNorm not found (run `pip install 'git+https://github.com/Dao-AILab/flash-attention.git#egg=dropout_layer_norm&subdirectory=csrc/layer_norm'`)
[2024-04-17 00:08:54,295] [INFO] [axolotl.load_model:403] [PID:808740] [RANK:0] patching _expand_mask
[2024-04-17 00:08:54,342] [INFO] [axolotl.load_model:354] [PID:808746] [RANK:6] patching with flash attention for sample packing
[2024-04-17 00:08:54,343] [INFO] [axolotl.replace_llama_attn_with_flash_attn:133] [PID:808746] [RANK:6] optimized flash-attention RMSNorm not found (run `pip install 'git+https://github.com/Dao-AILab/flash-attention.git#egg=dropout_layer_norm&subdirectory=csrc/layer_norm'`)
[2024-04-17 00:08:54,343] [INFO] [axolotl.load_model:403] [PID:808746] [RANK:6] patching _expand_mask
[2024-04-17 00:09:05,981] [INFO] [partition_parameters.py:349:__exit__] finished initializing model - num_params = 723, num_elems = 68.98B
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████| 29/29 [00:32<00:00,  1.12s/it]
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████| 29/29 [00:32<00:00,  1.12s/it]
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████| 29/29 [00:32<00:00,  1.12s/it]
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████| 29/29 [00:32<00:00,  1.12s/it]
[2024-04-17 00:09:38,492] [INFO] [axolotl.load_model:597] [PID:808743] [RANK:3] patching with SwiGLU
[2024-04-17 00:09:38,493] [INFO] [axolotl.load_model:597] [PID:808741] [RANK:1] patching with SwiGLU
[2024-04-17 00:09:38,495] [INFO] [axolotl.load_model:597] [PID:808744] [RANK:4] patching with SwiGLU
[2024-04-17 00:09:38,495] [INFO] [axolotl.load_model:597] [PID:808742] [RANK:2] patching with SwiGLU
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████| 29/29 [00:32<00:00,  1.12s/it]
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████| 29/29 [00:32<00:00,  1.12s/it]
[2024-04-17 00:09:38,511] [INFO] [axolotl.load_model:597] [PID:808745] [RANK:5] patching with SwiGLU
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████| 29/29 [00:32<00:00,  1.12s/it]
[2024-04-17 00:09:38,513] [INFO] [axolotl.load_model:597] [PID:808748] [RANK:7] patching with SwiGLU
[2024-04-17 00:09:38,518] [INFO] [axolotl.load_model:597] [PID:808746] [RANK:6] patching with SwiGLU
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████| 29/29 [00:32<00:00,  1.12s/it]
[2024-04-17 00:09:38,563] [INFO] [axolotl.load_model:597] [PID:808740] [RANK:0] patching with SwiGLU
[2024-04-17 00:14:54,032] [INFO] [axolotl.load_model:715] [PID:808741] [RANK:1] GPU memory usage after model load: 0.625GB (+1.723GB cache, +2.514GB misc)
[2024-04-17 00:14:54,036] [INFO] [axolotl.load_model:775] [PID:808741] [RANK:1] converting modules to torch.bfloat16 for flash attention
Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
[2024-04-17 00:14:54,466] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:808741] [RANK:1] packing_efficiency_estimate: 0.9 total_num_tokens per device: 8223767
[2024-04-17 00:14:54,538] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:808741] [RANK:1] packing_efficiency_estimate: 0.9 total_num_tokens per device: 8223767
[2024-04-17 00:14:54,615] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:808741] [RANK:1] packing_efficiency_estimate: 0.9 total_num_tokens per device: 8223767
[2024-04-17 00:14:54,836] [INFO] [axolotl.load_model:715] [PID:808748] [RANK:7] GPU memory usage after model load: 0.625GB (+1.723GB cache, +2.373GB misc)
[2024-04-17 00:14:54,841] [INFO] [axolotl.load_model:775] [PID:808748] [RANK:7] converting modules to torch.bfloat16 for flash attention
[2024-04-17 00:14:54,870] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:808741] [RANK:1] packing_efficiency_estimate: 0.9 total_num_tokens per device: 8223767
Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
[2024-04-17 00:14:55,271] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:808748] [RANK:7] packing_efficiency_estimate: 0.9 total_num_tokens per device: 8223767
[2024-04-17 00:14:55,342] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:808748] [RANK:7] packing_efficiency_estimate: 0.9 total_num_tokens per device: 8223767
[2024-04-17 00:14:55,414] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:808748] [RANK:7] packing_efficiency_estimate: 0.9 total_num_tokens per device: 8223767
[2024-04-17 00:14:55,650] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:808748] [RANK:7] packing_efficiency_estimate: 0.9 total_num_tokens per device: 8223767
[2024-04-17 00:14:55,666] [INFO] [axolotl.load_model:715] [PID:808745] [RANK:5] GPU memory usage after model load: 0.625GB (+1.723GB cache, +2.514GB misc)
[2024-04-17 00:14:55,670] [INFO] [axolotl.load_model:775] [PID:808745] [RANK:5] converting modules to torch.bfloat16 for flash attention
Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
[2024-04-17 00:14:55,997] [INFO] [axolotl.load_model:715] [PID:808740] [RANK:0] GPU memory usage after model load: 0.625GB (+1.723GB cache, +3.498GB misc)
[2024-04-17 00:14:56,002] [INFO] [axolotl.load_model:775] [PID:808740] [RANK:0] converting modules to torch.bfloat16 for flash attention
[2024-04-17 00:14:56,051] [INFO] [axolotl.load_model:715] [PID:808744] [RANK:4] GPU memory usage after model load: 0.625GB (+1.723GB cache, +2.514GB misc)
[2024-04-17 00:14:56,055] [INFO] [axolotl.load_model:775] [PID:808744] [RANK:4] converting modules to torch.bfloat16 for flash attention
Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
[2024-04-17 00:14:56,110] [WARNING] [accelerate.utils.other.log:61] [PID:808740] Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
[2024-04-17 00:14:56,122] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:808745] [RANK:5] packing_efficiency_estimate: 0.9 total_num_tokens per device: 8223767
[2024-04-17 00:14:56,143] [INFO] [axolotl.train.log:61] [PID:808740] [RANK:0] Starting trainer...
Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
[2024-04-17 00:14:56,195] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:808745] [RANK:5] packing_efficiency_estimate: 0.9 total_num_tokens per device: 8223767
[2024-04-17 00:14:56,271] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:808745] [RANK:5] packing_efficiency_estimate: 0.9 total_num_tokens per device: 8223767
[2024-04-17 00:14:56,467] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:808740] [RANK:0] packing_efficiency_estimate: 0.9 total_num_tokens per device: 8223767
[2024-04-17 00:14:56,518] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:808744] [RANK:4] packing_efficiency_estimate: 0.9 total_num_tokens per device: 8223767
[2024-04-17 00:14:56,526] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:808745] [RANK:5] packing_efficiency_estimate: 0.9 total_num_tokens per device: 8223767
[2024-04-17 00:14:56,538] [INFO] [axolotl.load_model:715] [PID:808746] [RANK:6] GPU memory usage after model load: 0.625GB (+1.723GB cache, +2.514GB misc)
[2024-04-17 00:14:56,542] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:808740] [RANK:0] packing_efficiency_estimate: 0.9 total_num_tokens per device: 8223767
[2024-04-17 00:14:56,543] [INFO] [axolotl.load_model:775] [PID:808746] [RANK:6] converting modules to torch.bfloat16 for flash attention
[2024-04-17 00:14:56,599] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:808744] [RANK:4] packing_efficiency_estimate: 0.9 total_num_tokens per device: 8223767
[2024-04-17 00:14:56,617] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:808740] [RANK:0] packing_efficiency_estimate: 0.9 total_num_tokens per device: 8223767
Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
[2024-04-17 00:14:56,676] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:808744] [RANK:4] packing_efficiency_estimate: 0.9 total_num_tokens per device: 8223767
[2024-04-17 00:14:56,704] [INFO] [axolotl.load_model:715] [PID:808742] [RANK:2] GPU memory usage after model load: 0.625GB (+1.723GB cache, +2.514GB misc)
[2024-04-17 00:14:56,709] [INFO] [axolotl.load_model:775] [PID:808742] [RANK:2] converting modules to torch.bfloat16 for flash attention
Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
[2024-04-17 00:14:56,872] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:808740] [RANK:0] packing_efficiency_estimate: 0.9 total_num_tokens per device: 8223767
[2024-04-17 00:14:56,932] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:808744] [RANK:4] packing_efficiency_estimate: 0.9 total_num_tokens per device: 8223767
[2024-04-17 00:14:56,934] [WARNING] [engine.py:1179:_do_optimizer_sanity_check] **** You are using ZeRO with an untested optimizer, proceed with caution *****
[2024-04-17 00:14:56,975] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:808746] [RANK:6] packing_efficiency_estimate: 0.9 total_num_tokens per device: 8223767
[2024-04-17 00:14:57,051] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:808746] [RANK:6] packing_efficiency_estimate: 0.9 total_num_tokens per device: 8223767
[2024-04-17 00:14:57,123] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:808746] [RANK:6] packing_efficiency_estimate: 0.9 total_num_tokens per device: 8223767
[2024-04-17 00:14:57,164] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:808742] [RANK:2] packing_efficiency_estimate: 0.9 total_num_tokens per device: 8223767
[2024-04-17 00:14:57,240] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:808742] [RANK:2] packing_efficiency_estimate: 0.9 total_num_tokens per device: 8223767
[2024-04-17 00:14:57,315] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:808742] [RANK:2] packing_efficiency_estimate: 0.9 total_num_tokens per device: 8223767
Parameter Offload: Total persistent parameters: 1318912 in 321 params
[2024-04-17 00:14:57,363] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:808746] [RANK:6] packing_efficiency_estimate: 0.9 total_num_tokens per device: 8223767
[2024-04-17 00:14:57,570] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:808742] [RANK:2] packing_efficiency_estimate: 0.9 total_num_tokens per device: 8223767
[2024-04-17 00:14:58,305] [INFO] [axolotl.load_model:715] [PID:808743] [RANK:3] GPU memory usage after model load: 0.625GB (+1.723GB cache, +2.514GB misc)
[2024-04-17 00:14:58,310] [INFO] [axolotl.load_model:775] [PID:808743] [RANK:3] converting modules to torch.bfloat16 for flash attention
Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
[2024-04-17 00:14:58,743] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:808743] [RANK:3] packing_efficiency_estimate: 0.9 total_num_tokens per device: 8223767
[2024-04-17 00:14:58,817] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:808743] [RANK:3] packing_efficiency_estimate: 0.9 total_num_tokens per device: 8223767
[2024-04-17 00:14:58,890] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:808743] [RANK:3] packing_efficiency_estimate: 0.9 total_num_tokens per device: 8223767
[2024-04-17 00:14:59,135] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:808743] [RANK:3] packing_efficiency_estimate: 0.9 total_num_tokens per device: 8223767
Error invalid configuration argument at line 119 in file /src/csrc/ops.cu
Error invalid configuration argument at line 119 in file /src/csrc/ops.cu
Error invalid configuration argument at line 119 in file /src/csrc/ops.cu
Error invalid configuration argument at line 119 in file /src/csrc/ops.cu
Error invalid configuration argument at line 119 in file /src/csrc/ops.cu
Error invalid configuration argument at line 119 in file /src/csrc/ops.cu
Error invalid configuration argument at line 119 in file /src/csrc/ops.cu
Error invalid configuration argument at line 119 in file /src/csrc/ops.cu
 [2024-04-17 00:15:20,046] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 

Steps to reproduce

I was training the CodeLlama-70b model on 8× A100 80GB GPUs as a full fine-tune, and I used the following command to start the training process:

accelerate launch -m axolotl.cli.train examples/code-llama/70b/fft_optimized.yml --debug
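
Because the message comes from a CUDA source file rather than a Python traceback, it may help to rerun with synchronous kernel launches so the failing call is reported at its actual call site. This is only the standard CUDA debugging switch, nothing axolotl-specific:

CUDA_LAUNCH_BLOCKING=1 accelerate launch -m axolotl.cli.train examples/code-llama/70b/fft_optimized.yml --debug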

Config yaml

base_model: /mnt/models/CodeLlama-70b-hf
model_type: LlamaForCausalLM
tokenizer_type: LlamaTokenizer

load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  - path: xxx
    type: 
      field_instruction: instruction
      field_output: response
dataset_prepared_path: last_run_prepared
val_set_size: 0.05
output_dir: /mnt/output

sequence_len: 4096
sample_packing: true
pad_to_sequence_len: true
chat_template: chatml

adapter:
lora_model_dir:
lora_r:
lora_alpha:
lora_dropout:
lora_target_linear:
lora_fan_in_fan_out:

wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 1
micro_batch_size: 1
num_epochs: 1
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.00005

train_on_inputs: false
group_by_length: false
bf16: true
fp16:
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
flash_attn_cross_entropy: false
flash_attn_rms_norm: true
flash_attn_fuse_qkv: false
flash_attn_fuse_mlp: true

warmup_steps: 200
evals_per_epoch: 4
eval_table_size:
saves_per_epoch: 1
debug:
deepspeed: deepspeed_configs/zero3_bf16_cpuoffload_params.json # multi-gpu only
weight_decay: 0.1
fsdp:
fsdp_config:
special_tokens:
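
A note on this config: optimizer: adamw_bnb_8bit routes optimizer updates through bitsandbytes' fused CUDA kernels, and the DeepSpeed log above also warns about ZeRO being used with an untested optimizer. As a debugging step only (not a confirmed fix), switching to the plain PyTorch implementation would isolate whether the failure comes from the bitsandbytes kernels:

optimizer: adamw_torch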

Possible solution

No response

Which Operating Systems are you using?

Python Version

3.11.5

axolotl branch-commit

main/132eb740f036eff0fa8b239ddaf0b7a359ed1732

Acknowledgements

Napuh commented 5 months ago

Try pip install -U deepspeed.

This solved a similar problem with Mistral 7B.
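
If you upgrade, it is worth confirming which versions the training processes actually import; a quick check (plain Python, nothing axolotl-specific):

python -c "import torch, deepspeed, bitsandbytes; print(torch.__version__, deepspeed.__version__, bitsandbytes.__version__)"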

NanoCode012 commented 5 months ago

@jaywongs, did the above solve it for you? I find this issue to be machine-dependent. It may also be a bitsandbytes issue.
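
The error string format ("Error ... at line ... in file .../csrc/ops.cu") appears to match the CUDA error check that bitsandbytes uses in its csrc/ops.cu, so that suspicion seems plausible. If the installed bitsandbytes is recent enough to ship its diagnostic entry point, running it on the affected machine can surface CUDA setup problems directly:

python -m bitsandbytes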

monk1337 commented 5 months ago

@jaywongs, did the above solve it for you? I find this issue to be machine-dependent. It may also be a bitsandbytes issue.

Yes, that solved it for me!

jaywongs commented 5 months ago

@jaywongs, did the above solve it for you? I find this issue to be machine-dependent. It may also be a bitsandbytes issue.

Apologies for the delayed response. I have tried using the latest version of deepspeed, but the error persists.

NanoCode012 commented 5 months ago

@jaywongs, did upgrading deepspeed work for you?

jaywongs commented 5 months ago

@jaywongs, did upgrading deepspeed work for you?

It did not work for me; I'm using deepspeed 0.14.2.

MM-WW55 commented 1 month ago

@jaywongs, did upgrading deepspeed work for you?

It did not work for me; I'm using deepspeed 0.14.2.

Hello, have you solved it? I also encountered the same problem.

jaywongs commented 1 month ago

@jaywongs, did upgrading deepspeed work for you?

It did not work for me; I'm using deepspeed 0.14.2.

Hello, have you solved it? I also encountered the same problem.

Unfortunately, I was unable to solve it in the end.

likejazz commented 1 month ago

Same error here.

Error invalid configuration argument at line 218 in file /src/csrc/ops.cu

I used the winglian/axolotl:main-latest Docker image, and my configuration is shown below:

**** Axolotl Dependency Versions *****
  accelerate: 0.33.0
        peft: 0.12.0
transformers: 4.44.0
         trl: 0.9.6
       torch: 2.3.1+cu121
bitsandbytes: 0.43.3
****************************************
deepspeed: 0.15.0