danielchalef opened this issue 9 months ago
Did you try upgrading or downgrading bitsandbytes?
On Thu, Feb 1, 2024 at 3:11 PM Daniel Chalef @.***> wrote:
Please check that this issue hasn't been reported before.
- I searched previous Bug Reports (https://github.com/OpenAccess-AI-Collective/axolotl/labels/bug) and didn't find any similar reports.
Expected Behavior
A full finetune job can use both DeepSpeed and a BNB optimizer such as adamw_bnb_8bit.
Current behaviour
Job fails with:
```
(XXXX, pid=118995) Error invalid configuration argument at line 216 in file /mmfs1/gscratch/zlab/timdettmers/git/bitsandbytes/csrc/ops.cu
(XXXX, pid=118995) [2024-02-01 19:59:29,059] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 76) of binary: /root/miniconda3/envs/py3.10/bin/python3
(XXXX, pid=118995) Traceback (most recent call last):
(XXXX, pid=118995)   File "/root/miniconda3/envs/py3.10/bin/accelerate", line 8, in
```
Steps to reproduce
Container published today (have tried with several published over the last few weeks, both cu121 and cu118): @.***:ec8c30be1c29b11188f0b52df12125d60f511e5d7ef487796974f8fd6d37b8f0
Machine: 2x A100 80GB
Base Model: mistralai/Mistral-7B-v0.1
Configure the axolotl job to use a quantized optimizer: adamw_bnb_8bit
Using DeepSpeed zero1.json from this repo.
Execute the job:
```
sudo docker run --gpus all -v ~/XXXX:/XXXXX -v /root/.cache:/root/.cache winglian/axolotl:main-py3.10-cu121-2.1.2 accelerate launch -m axolotl.cli.train /XXXX/XXXXXX.yml
```
Config yaml
base_model: mistralai/Mistral-7B-v0.1
model_type: MistralForCausalLM
tokenizer_type: LlamaTokenizer
is_mistral_derived_model: true
load_in_8bit: false
load_in_4bit: false
strict: false
datasets:
  - path: /XXXXXX/train.jsonl
    type: alpaca
dataset_prepared_path:
val_set_size: 0.05
output_dir: /XXXXX/models/
hub_model_id: XXXXXX
hub_strategy: end
hf_use_auth_token: true
sequence_len: 1024
sample_packing: true
pad_to_sequence_len: true
eval_sample_packing: false
wandb_project: XXXXXXX
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:
gradient_accumulation_steps: 4
micro_batch_size: 8
num_epochs: 4
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.000005
train_on_inputs: false
group_by_length: false
bf16: true
fp16: false
tf32: false
gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
warmup_steps: 10
evals_per_epoch: 4
eval_table_size:
eval_table_max_new_tokens: 128
saves_per_epoch: 1
debug:
deepspeed: /XXXXXX/zero1.json
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  bos_token: ""
  eos_token: ""
  unk_token: ""
Possible solution
No response
Which Operating Systems are you using?
- Linux
- macOS
- Windows
Python Version
3.10
axolotl branch-commit
main <- whatever was used to generate the referenced container
Acknowledgements
- My issue title is concise, descriptive, and in title casing.
- I have searched the existing issues to make sure this bug has not been reported yet.
- I am using the latest version of axolotl.
- I have provided enough information for the maintainers to reproduce and diagnose the issue.
View it on GitHub: https://github.com/OpenAccess-AI-Collective/axolotl/issues/1244
@winglian Yes, I've tried bitsandbytes versions:
winglian/axolotl:main-py3.10-*
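When comparing bitsandbytes versions across containers, it may help to record the exact package versions installed in each container next to the error. A minimal sketch using only the Python standard library; the package list is an assumption, not something the reporters specified:

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed set of packages relevant to this failure; adjust as needed.
PACKAGES = ("bitsandbytes", "deepspeed", "torch", "accelerate", "transformers")


def installed_versions(packages=PACKAGES):
    """Map each package name to its installed version, or "not installed"."""
    report = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver}")
```

Run inside the container, it prints one `name: version` line per package, which can be pasted into a comment alongside the traceback.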
Same issue here:
Error invalid configuration argument at line 216 in file /mmfs1/gscratch/zlab/timdettmers/git/bitsandbytes/csrc/ops.cu
Got a similar issue while doing a full finetune with DeepSpeed:
Error invalid configuration argument at line 117 in file /mmfs1/gscratch/zlab/timdettmers/git/bitsandbytes/csrc/ops.cu
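"invalid configuration argument" is the CUDA runtime's message for `cudaErrorInvalidConfiguration`, which a kernel launch reports when, for example, it requests a grid with zero blocks or more threads per block than the device allows. Nothing in this thread confirms the cause of the errors at lines 216 and 117 of ops.cu, but one hypothesis worth ruling out is a zero-element tensor reaching an 8-bit optimizer kernel, since DeepSpeed's ZeRO partitioning can leave placeholder tensors on a rank. The sketch below shows how a one-thread-per-element launch degenerates to a zero-block grid for an empty tensor, plus a hypothetical helper `find_empty_params` that lists empty parameters in an optimizer's `param_groups`. The block size is an illustrative assumption, not the value bitsandbytes uses.

```python
from typing import List, Tuple

# Illustrative threads-per-block value; an assumption, not bitsandbytes' setting.
BLOCK_SIZE = 256


def grid_size(numel: int, block_size: int = BLOCK_SIZE) -> int:
    """Blocks requested by a 1-D launch with one thread per element.

    An empty tensor (numel == 0) yields a zero-block grid, which CUDA rejects
    with cudaErrorInvalidConfiguration ("invalid configuration argument").
    """
    return (numel + block_size - 1) // block_size


def find_empty_params(optimizer) -> List[Tuple[int, int]]:
    """Hypothetical helper: (group_index, param_index) of every zero-element param.

    Accepts any object whose `param_groups` is a list of dicts holding a
    "params" list of tensor-like objects with a `numel()` method, which is the
    layout torch.optim optimizers use.
    """
    empty = []
    for g_idx, group in enumerate(optimizer.param_groups):
        for p_idx, param in enumerate(group["params"]):
            if param.numel() == 0:
                empty.append((g_idx, p_idx))
    return empty
```

Calling `find_empty_params(optimizer)` just before the first `step()` returns an empty list when every parameter has elements. Which optimizer object to inspect once DeepSpeed has wrapped it depends on the setup, so treat the call site as an assumption as well.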
Same issue with a 7B model:
base_model: <snip>-7b
model_type: LlamaForCausalLM
tokenizer_type: LlamaTokenizer
is_llama_derived_model: true
datasets:
<snip>
val_set_size: 0.05
output_dir: ./out
dataset_prepared_path: last_run_prepared
load_in_8bit: false
load_in_4bit: false
strict: false
sequence_len: 4096
sample_packing: true
pad_to_sequence_len: true
adapter:
lora_model_dir:
lora_r:
lora_alpha:
lora_dropout:
lora_target_linear:
lora_fan_in_fan_out:
wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:
gradient_accumulation_steps: 1
micro_batch_size: 1
num_epochs: 10
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002
train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false
gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
flash_attn_cross_entropy: false
flash_attn_rms_norm: true
flash_attn_fuse_qkv: false
flash_attn_fuse_mlp: true
warmup_steps: 100
evals_per_epoch: 4
eval_table_size:
saves_per_epoch: 1
debug:
deepspeed: zero3.json
weight_decay: 0.1
fsdp:
fsdp_config:
special_tokens:
DeepSpeed config (zero3.json):
{
"zero_optimization": {
"stage": 3,
"overlap_comm": true,
"contiguous_gradients": true,
"sub_group_size": 0,
"reduce_bucket_size": "auto",
"stage3_prefetch_bucket_size": "auto",
"stage3_param_persistence_threshold": "auto",
"stage3_max_live_parameters": 0,
"stage3_max_reuse_distance": 0,
"stage3_gather_16bit_weights_on_model_save": true
},
"bf16": {
"enabled": true
},
"fp16": {
"enabled": false,
"auto_cast": false,
"loss_scale": 0,
"initial_scale_power": 32,
"loss_scale_window": 1000,
"hysteresis": 2,
"min_loss_scale": 1
},
"gradient_accumulation_steps": "auto",
"train_batch_size": "auto",
"train_micro_batch_size_per_gpu": "auto",
"wall_clock_breakdown": false
}
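The three "auto" batch fields in this config are filled in at launch by the Hugging Face Trainer's DeepSpeed integration, and DeepSpeed requires train_batch_size = train_micro_batch_size_per_gpu × gradient_accumulation_steps × number of GPUs. A small sketch of that arithmetic, assuming axolotl passes `micro_batch_size` through as the per-GPU batch size. The function name is hypothetical, and this report does not state its GPU count, so 2 GPUs is an assumption in the example call.

```python
def resolve_train_batch_size(micro_batch_size: int,
                             gradient_accumulation_steps: int,
                             world_size: int) -> int:
    """Global batch size (sequences per optimizer step) under DeepSpeed's rule:
    train_batch_size = train_micro_batch_size_per_gpu
                       * gradient_accumulation_steps * world_size
    """
    for name, value in (("micro_batch_size", micro_batch_size),
                        ("gradient_accumulation_steps", gradient_accumulation_steps),
                        ("world_size", world_size)):
        if value < 1:
            raise ValueError(f"{name} must be >= 1, got {value}")
    return micro_batch_size * gradient_accumulation_steps * world_size


# This report: micro_batch_size: 1, gradient_accumulation_steps: 1; 2 GPUs assumed.
print(resolve_train_batch_size(1, 1, 2))  # 2

# The original report: micro_batch_size: 8, gradient_accumulation_steps: 4, 2x A100.
print(resolve_train_batch_size(8, 4, 2))  # 64
```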
Happens with a bitsandbytes source build as well, at commit 136721a8c1437042f0491972ddc5f35695e5e9b2.
Same issue here with 2x H100 GPUs: Error invalid configuration argument at line 216 in file /mmfs1/gscratch/zlab/timdettmers/git/bitsandbytes/csrc/ops.cu
Same issue here:
Error invalid configuration argument at line 216 in file /mmfs1/gscratch/zlab/timdettmers/git/bitsandbytes/csrc/ops.cu
Same error. Have you fixed it?
Same issue on A800: Error invalid configuration argument at line 216 in file /mmfs1/gscratch/zlab/timdettmers/git/bitsandbytes/csrc/ops.cu
Please check that this issue hasn't been reported before.
Expected Behavior
A full finetune job can use both DeepSpeed and a BNB optimizer such as adamw_bnb_8bit.
Current behaviour
Job fails with:
Steps to reproduce
Container published today (have tried with several published over the last few weeks, both cu121 and cu118):
winglian/axolotl:main-py3.10-cu121-2.1.2@sha256:ec8c30be1c29b11188f0b52df12125d60f511e5d7ef487796974f8fd6d37b8f0
Machine: 2x A100 80GB
Base Model: mistralai/Mistral-7B-v0.1
Configure the axolotl job to use a quantized optimizer: adamw_bnb_8bit
Using DeepSpeed zero1.json from this repo.
Execute the job:
Config yaml
base_model: mistralai/Mistral-7B-v0.1
model_type: MistralForCausalLM
tokenizer_type: LlamaTokenizer
is_mistral_derived_model: true
load_in_8bit: false
load_in_4bit: false
strict: false
datasets:
dataset_prepared_path:
val_set_size: 0.05
output_dir: /XXXXX/models/
hub_model_id: XXXXXX
hub_strategy: end
hf_use_auth_token: true
sequence_len: 1024
sample_packing: true
pad_to_sequence_len: true
eval_sample_packing: false
wandb_project: XXXXXXX
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:
gradient_accumulation_steps: 4
micro_batch_size: 8
num_epochs: 4
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.000005
train_on_inputs: false
group_by_length: false
bf16: true
fp16: false
tf32: false
gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
warmup_steps: 10
evals_per_epoch: 4
eval_table_size:
eval_table_max_new_tokens: 128
saves_per_epoch: 1
debug:
deepspeed: /XXXXXX/zero1.json
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  bos_token: ""
  eos_token: ""
  unk_token: ""
Possible solution
No response
Which Operating Systems are you using?
Python Version
3.10
axolotl branch-commit
main <- whatever was used to generate the referenced container
Acknowledgements