yaohwang opened this issue 7 months ago
try changing these settings:

```yaml
micro_batch_size: 1
optimizer: paged_adamw_8bit
```
thanks for your help, but I still get the same error. here's my config:
```yaml
base_model: NousResearch/Llama-2-70b-chat-hf
model_type: LlamaForCausalLM
tokenizer_type: LlamaTokenizer

load_in_8bit: false
load_in_4bit: true
strict: false

datasets:

adapter: qlora
lora_model_dir:

sequence_len: 512
sample_packing: false
pad_to_sequence_len: true

lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules:
lora_target_linear: true
lora_fan_in_fan_out:

wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 1
num_epochs: 1
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 0.00001

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: true

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 10
evals_per_epoch: 4
eval_table_size:
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
```
how much CPU memory do you have? Keep in mind that offloading 70B Llama-2 requires 128GB of system/CPU RAM.
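For context, a rough back-of-envelope estimate (a sketch only: the 70e9 parameter count and ~0.5 bytes/param for NF4 are approximations, and real usage adds optimizer state, activations, and quantization metadata):

```python
# Rough weight-memory estimate for Llama-2-70B at different precisions.
# Approximation only: ignores quantization constants, activations,
# optimizer state, and CUDA context overhead.
PARAMS = 70e9  # approximate parameter count

for dtype, bytes_per_param in [("fp16/bf16", 2.0), ("int8", 1.0), ("nf4 4-bit", 0.5)]:
    gib = PARAMS * bytes_per_param / 2**30
    print(f"{dtype:>9}: ~{gib:.0f} GiB for weights alone")

# fp16/bf16: ~130 GiB -> roughly why 128GB of CPU RAM is the floor for offload
# nf4 4-bit: ~33 GiB  -> still larger than a single 24 GiB RTX 3090
```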
yeah, that's it: 128GB RAM and 2x 24GB RTX 3090s. I had already tested 70B Llama-2 on https://github.com/AnswerDotAI/fsdp_qlora before using axolotl, in the same environment, and it worked.
So I'm expecting axolotl's FSDP+QLoRA to get the same thing working.
And thanks man, you're doing a great job!
@yaohwang @winglian any updates?
Please check that this issue hasn't been reported before.
Expected Behavior
Expecting no OOM.
With #1494 fixed, I've verified that 7B Llama now works with FSDP+QLoRA on axolotl.
But Answer.AI's FSDP+QLoRA handled 70B Llama when I tested it (on the same 2x RTX 3090), so I'm expecting this to work with axolotl's FSDP+QLoRA too.
Current behaviour
(Both ranks fail the same way; their interleaved output is shown once below, followed by each rank's OOM error.)

```
Traceback (most recent call last):
  File "/root/miniconda3/envs/py3.10/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/root/miniconda3/envs/py3.10/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/workspace/axolotl/src/axolotl/cli/train.py", line 59, in <module>
    fire.Fire(do_cli)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/fire/core.py", line 143, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/fire/core.py", line 477, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/workspace/axolotl/src/axolotl/cli/train.py", line 35, in do_cli
    return do_train(parsed_cfg, parsed_cli_args)
  File "/workspace/axolotl/src/axolotl/cli/train.py", line 55, in do_train
    return train(cfg=cfg, cli_args=cli_args, dataset_meta=dataset_meta)
  File "/workspace/axolotl/src/axolotl/train.py", line 87, in train
    model, peft_config = load_model(cfg, tokenizer, inference=cli_args.inference)
  File "/workspace/axolotl/src/axolotl/utils/models.py", line 799, in load_model
    model.to(f"cuda:{cfg.local_rank}")
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1160, in to
    return self._apply(convert)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  [Previous line repeated 5 more times]
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 833, in _apply
    param_applied = fn(param)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1158, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/bitsandbytes/nn/modules.py", line 318, in to
    new_param = Params4bit(super().to(device=device, dtype=dtype, non_blocking=non_blocking),
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 112.00 MiB. GPU 0 has a total capacty of 23.68 GiB of which 49.00 MiB is free. Process 156254 has 23.63 GiB memory in use. Of the allocated memory 22.43 GiB is allocated by PyTorch, and 67.38 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 32.00 MiB. GPU 1 has a total capacty of 23.69 GiB of which 6.94 MiB is free. Process 156255 has 23.68 GiB memory in use. Of the allocated memory 22.54 GiB is allocated by PyTorch, and 20.72 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```
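The trace shows each rank calling `model.to(f"cuda:{cfg.local_rank}")` inside `load_model`, i.e. moving the entire 4-bit model onto its own GPU before FSDP ever gets a chance to shard it; ~33 GiB of quantized 70B weights cannot fit in 24 GiB, hence the OOM on both ranks. Below is a minimal sketch of the kind of guard one might expect here; this is hypothetical and not axolotl's actual code (the `maybe_move_to_device` helper and `cfg.fsdp` check are illustrative assumptions):

```python
# Hypothetical sketch, not axolotl's actual implementation: when FSDP is
# configured, skip the eager full-model device move and let FSDP place
# the (sharded) parameters itself after wrapping.
def maybe_move_to_device(model, cfg):
    if cfg.fsdp:
        # FSDP shards parameters across ranks; moving the whole model to
        # one GPU here defeats that and OOMs for models > single-GPU VRAM.
        return model
    return model.to(f"cuda:{cfg.local_rank}")
```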
Steps to reproduce
```sh
accelerate launch -m axolotl.cli.train examples/llama-2/qlora-fsdp.yml
```
with examples/llama-2/qlora-fsdp.yml changed to base_model: NousResearch/Llama-2-70b-chat-hf and micro_batch_size: 1.
Config yaml
Possible solution
No response
Which Operating Systems are you using?
Python Version
3.10
axolotl branch-commit
main/4d6490b
Acknowledgements