Lightning-AI / lit-llama

Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4-bit quantization, LoRA and LLaMA-Adapter fine-tuning, and pre-training. Apache 2.0-licensed.

changing `devices` to `fabric.world_size` in the pretrain code #371

Open LamOne1 opened 1 year ago

LamOne1 commented 1 year ago

Hello,

According to our discussion here, I think `devices` should be changed to `fabric.world_size` in the pretraining code, since `batch_size` refers to the global batch size, while `devices` in the code is only the number of GPUs in a single node: `process_batch_size = batch_size // fabric.world_size`

I believe the same goes for `max_iters = 600000  # num_epochs * (epoch_size // micro_batch_size) // devices`
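For context, here is a minimal sketch of the proposed change. The hyperparameter values are illustrative, not taken verbatim from the pretraining script, and the surrounding code is simplified:

```python
import lightning as L

# Illustrative hyperparameters in the spirit of the pretraining script.
batch_size = 125        # global batch size across all processes
micro_batch_size = 5
devices = 4             # GPUs per node, as currently used for the division

fabric = L.Fabric(accelerator="cuda", devices=devices, num_nodes=2)
fabric.launch()

# Current code divides by `devices`, i.e. GPUs per node only:
#   process_batch_size = batch_size // devices
# Proposed: divide by the total number of processes across all nodes.
process_batch_size = batch_size // fabric.world_size
gradient_accumulation_steps = process_batch_size // micro_batch_size
```

On a single node `fabric.world_size` equals `devices`, so the change only matters for multi-node runs.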

awaelchli commented 1 year ago

Hi @LamOne1, the suggestion sounds good to me for `process_batch_size = batch_size // fabric.world_size`. The reason it wasn't done for Shakespeare is that multi-machine training isn't really needed for that amount of data, and since the RedPajama script was based on the same code, the division by `devices` was carried over. In any case, using the world size would be correct in the general case.

For `max_iters`, honestly I think it should be kept as "infinite" for practical reasons, but I'm fine with either if it doesn't complicate things.
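A hedged sketch of the two options discussed for `max_iters` (all names and values here are illustrative, not copied from the script):

```python
import sys

# Option favored above: treat max_iters as effectively infinite and stop
# training by some other criterion (e.g. a token budget or wall-clock limit).
max_iters = sys.maxsize

# Alternative: derive it from the dataset size, dividing by the total number
# of processes (fabric.world_size at runtime) rather than GPUs per node,
# mirroring the batch-size fix proposed in the issue.
num_epochs = 1            # illustrative
epoch_size = 600_000      # illustrative samples per epoch
micro_batch_size = 5      # illustrative
world_size = 8            # would be fabric.world_size in the script
max_iters = num_epochs * (epoch_size // micro_batch_size) // world_size
```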