Nero10578 opened this issue 3 months ago
You need load_in_8bit: true with a LoRA adapter, I think.
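For reference, that suggestion maps to a couple of axolotl config keys. A minimal sketch (the base model name and LoRA hyperparameters here are assumptions for illustration, not values from this issue):

```yaml
# Minimal 8-bit LoRA sketch for axolotl (illustrative values only)
base_model: mistralai/Mistral-Nemo-Instruct-2407  # assumed base model
load_in_8bit: true   # load the base weights in 8-bit
adapter: lora        # train a LoRA adapter on top
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
```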
You'll need to use 4-bit QLoRA for this if you intend to use FSDP. IIRC, 8-bit quantization doesn't play well with FSDP.
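In config terms, that amounts to switching to 4-bit QLoRA when FSDP is on. A rough sketch, assuming FSDP settings along the lines of axolotl's QLoRA + FSDP examples (not taken from this issue):

```yaml
# Sketch: 4-bit QLoRA with FSDP (illustrative, not the reporter's config)
load_in_8bit: false
load_in_4bit: true   # QLoRA: base weights quantized to 4-bit
adapter: qlora

fsdp:
  - full_shard
  - auto_wrap
fsdp_config:
  fsdp_limit_all_gathers: true
  fsdp_sync_module_states: true
  fsdp_offload_params: false
  fsdp_use_orig_params: false
  fsdp_cpu_ram_efficient_loading: true
  fsdp_transformer_layer_cls_to_wrap: MistralDecoderLayer  # Mistral Nemo uses the Mistral architecture
  fsdp_state_dict_type: FULL_STATE_DICT
```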
Does that mean using LoRA with load_in_8bit: false does not work with FSDP? Using LoRA loads in 8-bit by default, right?
@Nero10578 Can we mix continued pretraining and fine-tuning at the same time, as your dataset setup indicates:
Please check that this issue hasn't been reported before.
Expected Behavior
Enabling eval shouldn't make a difference in memory use or cause the training to fail.
Current behaviour
I can start training Mistral Nemo 12B just fine, but it crashes when returning to training after an eval. If I disable eval entirely, training works fine.
This is the result of training Mistral Nemo 12B Instruct with eval enabled. The error appears when training resumes after the first eval finishes:
Steps to reproduce
Train Mistral Nemo 12B Instruct using FSDP and LoRA with 8192 context. Enable evals and it will fail after the first eval.
I am training Mistral Nemo 12B Instruct with the tokenizer replaced by the one from https://huggingface.co/axolotl-ai-co/Mistral-Nemo-Base-2407-chatml so that there are ChatML tokens for training.
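For illustration, the eval-related keys that toggle the failing path look roughly like this in an axolotl config (assumed values for a sketch, not the reporter's actual config):

```yaml
# Illustrative eval-related settings (assumed values, not the reporter's config)
sequence_len: 8192
val_set_size: 0.05          # a non-zero value enables evaluation
evals_per_epoch: 4          # or eval_steps: N
eval_sample_packing: false  # controls sample packing during evaluation
```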
Config yaml
Possible solution
No response
Which Operating Systems are you using?
Python Version
3.11
axolotl branch-commit
78b42a3fe13c49e317bc116b9999c30e070322cc
Acknowledgements