DachengLi1 / LongChat

Official repository for LongChat and LongEval
Apache License 2.0

About the learning rate #19

Open lucasjinreal opened 1 year ago

lucasjinreal commented 1 year ago

From the script provided, I think LongChat is full SFT rather than LoRA, but the total effective batch size is just 1 (batch_size × gradient_accum × num_gpus).

But the original Vicuna FastChat training is also full-parameter SFT, with an effective batch size of 128, so why are the learning rates different? Which one should be adopted if I only have 2× 80GB GPUs?
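
To make the arithmetic concrete, here is a minimal Python sketch of the effective-batch computation; the linear LR-scaling heuristic at the end is a common rule of thumb, not something this repo prescribes, and the values are illustrative:

```python
# Effective (global) batch size in data-parallel training:
#   effective_batch = per_device_batch_size * gradient_accumulation_steps * num_gpus
per_device_batch_size = 1
gradient_accumulation_steps = 1
num_gpus = 1
effective_batch = per_device_batch_size * gradient_accumulation_steps * num_gpus
print(effective_batch)  # 1, vs. Vicuna's 128

# A common heuristic (an assumption here, not a recipe from this repo) is to
# scale the learning rate linearly with the effective batch size:
reference_lr = 2e-5  # LR commonly paired with an effective batch of 128
scaled_lr = reference_lr * effective_batch / 128
print(scaled_lr)     # ~1.6e-7 at an effective batch of 1
```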

DachengLi1 commented 1 year ago

@lucasjinreal I think either is fine - you can go with the largest batch size your GPUs support, with or without gradient accumulation.
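
For example, on 2× 80GB GPUs one could recover Vicuna's effective batch of 128 by pairing the largest per-device batch that fits with gradient accumulation. A minimal sketch assuming Hugging Face `TrainingArguments` (as used by FastChat-style training scripts); the output path and exact numbers are placeholders:

```python
from transformers import TrainingArguments

# Hypothetical settings for 2x 80GB GPUs, launched with torchrun --nproc_per_node=2.
# Effective batch = 8 (per device) * 8 (accum steps) * 2 (GPUs) = 128.
args = TrainingArguments(
    output_dir="./checkpoints",       # placeholder path
    per_device_train_batch_size=8,    # largest size that fits in GPU memory
    gradient_accumulation_steps=8,    # raise this instead if memory is tight
    learning_rate=2e-5,               # LR commonly used at effective batch 128
    num_train_epochs=3,
    bf16=True,
)
```

The point of the design: `per_device_train_batch_size` trades off against `gradient_accumulation_steps`, so the product (and therefore the effective batch and the matching learning rate) can stay fixed regardless of how much memory each GPU has.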