megatomik opened this issue 5 days ago
Can you show me the versions of your accelerate and transformers packages?
transformers: 4.46.3 accelerate: 1.1.1
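(For reference, both packages expose `__version__`, so the installed versions can be printed directly with a quick check like this:)

```python
import accelerate
import transformers

# Print the currently installed versions of both packages.
print("transformers:", transformers.__version__)
print("accelerate:", accelerate.__version__)
```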
I find this error is raised when using the latest accelerate, and you can solve it by installing a lower version:
pip install accelerate==0.26.1
Thanks, that solved it for now. A quick follow-up question, if you don't mind: why is the recommended learning rate in the standard finetune config in the readme so high compared to the ones in the paper? When the paper uses the same learning rate, the batch size is 50x higher.
@megatomik, if you use LoRA, the learning rate should be high because there are very few learnable parameters. For full fine-tuning, the learning rate is normal.
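To make the "very few learnable parameters" point concrete, here is a rough sketch (assuming a PEFT-style LoRA wrapper on a placeholder model, not necessarily what this repo does internally) that prints the trainable fraction. It is typically well under 1% of the parameters, which is why a much larger learning rate is tolerable:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Placeholder model and target modules; swap in the actual checkpoint/config used here.
model = AutoModelForCausalLM.from_pretrained("gpt2")
lora_model = get_peft_model(model, LoraConfig(r=8, lora_alpha=16, target_modules=["c_attn"]))

# Count how many parameters actually receive gradients under LoRA.
trainable = sum(p.numel() for p in lora_model.parameters() if p.requires_grad)
total = sum(p.numel() for p in lora_model.parameters())
print(f"LoRA trainable: {trainable:,} of {total:,} ({100 * trainable / total:.2f}%)")
```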
I'm trying to reproduce training using the configs in the readme as-is and the toy dataset. So far I can train a LoRA (though the loss doesn't seem to be going anywhere even after 10 epochs, but that might be another issue). However, launching a full finetune fails right before the epoch should start:
I have tried both multi- and single-GPU runs on several machines, with up to 32GB of VRAM per card.
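For anyone trying to reproduce this, the per-card memory on a given machine can be confirmed with a quick torch check along these lines (device names and sizes will obviously vary):

```python
import torch

# List the visible CUDA devices and their total memory.
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"GPU {i}: {props.name}, {props.total_memory / 1024**3:.1f} GiB")
```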