Hi, I used the identical configuration from your Google Colab notebook to fine-tune on the Alpaca dataset, but my training runs considerably slower than what you reported: roughly 1 hour to complete 70 steps, with an estimated 23 hours for 3 epochs. Any insight into what might be causing this would be appreciated. Thank you.
MICRO_BATCH_SIZE = 4 # this could actually be 5 but i like powers of 2
BATCH_SIZE = 128
GRADIENT_ACCUMULATION_STEPS = BATCH_SIZE // MICRO_BATCH_SIZE
EPOCHS = 3 # we don't need 3 tbh
LEARNING_RATE = 3e-4 # the Karpathy constant
CUTOFF_LEN = 256 # 256 accounts for about 96% of the data
LORA_R = 8
LORA_ALPHA = 16
LORA_DROPOUT = 0.05
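For context, here is a minimal sketch of how I understand these constants are wired into `peft` and `transformers` in an alpaca-lora-style script (not the exact Colab code); the `target_modules` and `output_dir` values below are assumptions on my part, and the dataset size used in the step-count comment assumes the standard ~52K-example Alpaca data.

```python
# Minimal sketch, using the constants defined in the config above.
from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=LORA_R,
    lora_alpha=LORA_ALPHA,
    lora_dropout=LORA_DROPOUT,
    target_modules=["q_proj", "v_proj"],  # assumed; depends on the base model
    bias="none",
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    per_device_train_batch_size=MICRO_BATCH_SIZE,
    gradient_accumulation_steps=GRADIENT_ACCUMULATION_STEPS,  # 128 // 4 = 32
    num_train_epochs=EPOCHS,
    learning_rate=LEARNING_RATE,
    fp16=True,
    output_dir="lora-alpaca",  # placeholder path
)

# Rough step count, assuming ~52K Alpaca examples and an effective batch of 128:
# ~52,000 / 128 ≈ 406 optimizer steps per epoch, ≈ 1,220 steps for 3 epochs,
# so total wall-clock time scales directly with the seconds per step I'm seeing.
```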