axolotl-ai-cloud / axolotl
https://axolotl-ai-cloud.github.io/axolotl/

DeepSpeed: `Error invalid configuration argument at line 216 in file /<snip>/bitsandbytes/csrc/ops.cu` #1244

Open danielchalef opened 7 months ago

danielchalef commented 7 months ago

Please check that this issue hasn't been reported before.

Expected Behavior

A full finetune job can use both DeepSpeed and a BNB optimizer such as adamw_bnb_8bit.

Current behaviour

Job fails with:

(XXXX, pid=118995) Error invalid configuration argument at line 216 in file /mmfs1/gscratch/zlab/timdettmers/git/bitsandbytes/csrc/ops.cu
(XXXX, pid=118995) [2024-02-01 19:59:29,059] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 76) of binary: /root/miniconda3/envs/py3.10/bin/python3
(XXXX, pid=118995) Traceback (most recent call last):
(XXXX, pid=118995)   File "/root/miniconda3/envs/py3.10/bin/accelerate", line 8, in <module>

Steps to reproduce

Container published today (have tried with several published over the last few weeks, both cu121 and cu118): winglian/axolotl:main-py3.10-cu121-2.1.2@sha256:ec8c30be1c29b11188f0b52df12125d60f511e5d7ef487796974f8fd6d37b8f0

Machine: 2x A100 80GB
Base Model: mistralai/Mistral-7B-v0.1
Configure the axolotl job to use a quantized optimizer: adamw_bnb_8bit
Use the DeepSpeed zero1.json from this repo.

Execute job:

sudo docker run --gpus all \
    -v ~/XXXX:/XXXXX \
    -v /root/.cache:/root/.cache \
    winglian/axolotl:main-py3.10-cu121-2.1.2 \
    accelerate launch -m axolotl.cli.train /XXXX/XXXXXX.yml

Config yaml

base_model: mistralai/Mistral-7B-v0.1
model_type: MistralForCausalLM
tokenizer_type: LlamaTokenizer
is_mistral_derived_model: true

load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  - path: /XXXXXX/train.jsonl
    type: alpaca

dataset_prepared_path:
val_set_size: 0.05
output_dir: /XXXXX/models/

hub_model_id: XXXXXX
hub_strategy: end
hf_use_auth_token: true

sequence_len: 1024
sample_packing: true
pad_to_sequence_len: true
eval_sample_packing: false

wandb_project: XXXXXXX
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 8
num_epochs: 4
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.000005

train_on_inputs: false
group_by_length: false
bf16: true
fp16: false
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 10
evals_per_epoch: 4
eval_table_size:
eval_table_max_new_tokens: 128
saves_per_epoch: 1
debug:
deepspeed: /XXXXXX/zero1.json
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  bos_token: ""
  eos_token: ""
  unk_token: ""
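
Since the crash surfaces inside a bitsandbytes CUDA kernel, a minimal smoke test of a bitsandbytes 8-bit AdamW of the kind adamw_bnb_8bit selects, run with no DeepSpeed in the loop, may help localize the failure. The sketch below is illustrative only (the toy model, sizes, and learning rate are placeholders, not taken from this issue) and assumes bitsandbytes and a CUDA GPU are available inside the container:

# bnb_smoke_test.py (hypothetical file name): exercise AdamW8bit without DeepSpeed
import torch
import torch.nn as nn
import bitsandbytes as bnb

# Toy stand-in for the real model; large enough to exercise the 8-bit optimizer state kernels.
model = nn.Sequential(nn.Linear(4096, 4096), nn.GELU(), nn.Linear(4096, 4096)).cuda()
optimizer = bnb.optim.AdamW8bit(model.parameters(), lr=5e-6)

for step in range(5):
    x = torch.randn(8, 4096, device="cuda")
    loss = model(x).pow(2).mean()
    loss.backward()
    optimizer.step()        # the reported error comes from a kernel launched during the update
    optimizer.zero_grad()
    print(f"step {step} ok, loss={loss.item():.4f}")

If this loop fails with the same ops.cu message, the problem is in bitsandbytes itself on this GPU/driver/CUDA combination; if it runs clean, the DeepSpeed integration is the more likely culprit.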

Possible solution

No response

Which Operating Systems are you using?

Python Version

3.10

axolotl branch-commit

main <- whatever was used to generate the referenced container

Acknowledgements

  • My issue title is concise, descriptive, and in title casing.
  • I have searched the existing issues to make sure this bug has not been reported yet.
  • I am using the latest version of axolotl.
  • I have provided enough information for the maintainers to reproduce and diagnose the issue.

winglian commented 7 months ago

Did you try upgrading or downgrading bitsandbytes?
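
Not part of the original reply, but before swapping versions it can be worth confirming which bitsandbytes build and GPU architecture the container actually sees. A small check along these lines would do it (a sketch, run inside the running container); recent bitsandbytes releases also ship a python -m bitsandbytes self-diagnostic:

import torch
import bitsandbytes as bnb

# Report the loaded bitsandbytes build, the CUDA toolkit torch was built against,
# and the compute capability of each visible GPU.
print("bitsandbytes:", bnb.__version__)
print("torch:", torch.__version__, "built for CUDA", torch.version.cuda)
for i in range(torch.cuda.device_count()):
    cap = torch.cuda.get_device_capability(i)
    print(f"GPU {i}: {torch.cuda.get_device_name(i)} (sm_{cap[0]}{cap[1]})")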

danielchalef commented 7 months ago

@winglian Yes, I've tried bitsandbytes versions:

Xynonners commented 7 months ago

same issue here.

Error invalid configuration argument at line 216 in file /mmfs1/gscratch/zlab/timdettmers/git/bitsandbytes/csrc/ops.cu
xu3kev commented 7 months ago

Got a similar issue while doing full fine-tuning with DeepSpeed:

Error invalid configuration argument at line 117 in file /mmfs1/gscratch/zlab/timdettmers/git/bitsandbytes/csrc/ops.cu
zero1zero commented 7 months ago

Same issue as well with a 7b model:

base_model: <snip>-7b
model_type: LlamaForCausalLM
tokenizer_type: LlamaTokenizer
is_llama_derived_model: true

datasets:
  <snip>

val_set_size: 0.05
output_dir: ./out
dataset_prepared_path: last_run_prepared

load_in_8bit: false
load_in_4bit: false
strict: false

sequence_len: 4096
sample_packing: true
pad_to_sequence_len: true

adapter:
lora_model_dir:
lora_r:
lora_alpha:
lora_dropout:
lora_target_linear:
lora_fan_in_fan_out:

wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 1
micro_batch_size: 1
num_epochs: 10
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
flash_attn_cross_entropy: false
flash_attn_rms_norm: true
flash_attn_fuse_qkv: false
flash_attn_fuse_mlp: true

warmup_steps: 100
evals_per_epoch: 4
eval_table_size:
saves_per_epoch: 1
debug:
deepspeed: zero3.json
weight_decay: 0.1
fsdp:
fsdp_config:
special_tokens:

DeepSpeed config (zero3.json):

{
  "zero_optimization": {
    "stage": 3,
    "overlap_comm": true,
    "contiguous_gradients": true,
    "sub_group_size": 0,
    "reduce_bucket_size": "auto",
    "stage3_prefetch_bucket_size": "auto",
    "stage3_param_persistence_threshold": "auto",
    "stage3_max_live_parameters": 0,
    "stage3_max_reuse_distance": 0,
    "stage3_gather_16bit_weights_on_model_save": true
  },
  "bf16": {
    "enabled": true
  },
  "fp16": {
    "enabled": false,
    "auto_cast": false,
    "loss_scale": 0,
    "initial_scale_power": 32,
    "loss_scale_window": 1000,
    "hysteresis": 2,
    "min_loss_scale": 1
  },
  "gradient_accumulation_steps": "auto",
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "wall_clock_breakdown": false
}
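
To check whether DeepSpeed plus the bitsandbytes 8-bit optimizer reproduce this outside axolotl, a tiny standalone script along the lines of the sketch below could be run with the deepspeed launcher. Everything here (the toy model, sizes, and config values) is illustrative and not taken from this thread:

# ds_bnb_repro.py (hypothetical): DeepSpeed ZeRO + AdamW8bit on a toy model.
# Launch with: deepspeed ds_bnb_repro.py
import torch
import torch.nn as nn
import bitsandbytes as bnb
import deepspeed

ds_config = {
    "train_micro_batch_size_per_gpu": 2,
    "gradient_accumulation_steps": 1,
    "bf16": {"enabled": True},
    "zero_optimization": {"stage": 1},  # set to 3 to mirror the zero3.json above
}

model = nn.Sequential(nn.Linear(4096, 4096), nn.GELU(), nn.Linear(4096, 4096))
optimizer = bnb.optim.AdamW8bit(model.parameters(), lr=2e-4)

# DeepSpeed wraps the client optimizer and moves the module to the local device.
engine, optimizer, _, _ = deepspeed.initialize(
    model=model, optimizer=optimizer, config=ds_config
)

for step in range(5):
    x = torch.randn(2, 4096, device=engine.device, dtype=torch.bfloat16)
    loss = engine(x).float().pow(2).mean()
    engine.backward(loss)
    engine.step()  # the crash reported in this thread happens during the optimizer update
    print(f"rank {engine.global_rank} step {step} ok")

If this toy combination crashes the same way, the bug can be reported upstream with a far smaller repro than a full 7B finetune.
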
zero1zero commented 7 months ago

Happens with a bitsandbytes source build as well at hash 136721a8c1437042f0491972ddc5f35695e5e9b2

peterhan91 commented 7 months ago

Same issue here with 2x H100 GPUs:

Error invalid configuration argument at line 216 in file /mmfs1/gscratch/zlab/timdettmers/git/bitsandbytes/csrc/ops.cu

EricShow commented 3 months ago

same issue here.

Error invalid configuration argument at line 216 in file /mmfs1/gscratch/zlab/timdettmers/git/bitsandbytes/csrc/ops.cu

Same error, have you fixed it?