bublint / ue5-llama-lora

A proof-of-concept project that showcases the potential for using small, locally trainable LLMs to create next-generation documentation tools.
MIT License

3X slower training on 3090 #6

Open · gameveloster opened this issue 1 year ago

gameveloster commented 1 year ago

I tried following your Ooba LoRA training settings with the same unreal_docs.txt, and the estimated training time is 24 hours on an RTX 3090 FE!

This is very different from your 8 hours. Your run shows

Running… 81 / 21888 … 1.34 s/it, 108 seconds / 8 hours … 8 hours remaining

while mine shows

Running… 784 / 30720 … 2.70 s/it, 35 minutes / 23 hours … 22 hours remaining
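
Both ETAs are consistent with total steps × seconds per iteration; a quick sanity check (a minimal sketch, just plugging in the numbers from the two readouts above):

```python
# ETA ~= total_steps * seconds_per_iteration, using the two readouts above
runs = {
    "yours (Llama 7B)": (21888, 1.34),
    "mine (WizardLM 13B)": (30720, 2.70),
}
for name, (total_steps, s_per_it) in runs.items():
    print(f"{name}: ~{total_steps * s_per_it / 3600:.1f} h")
# -> ~8.1 h and ~23.0 h, matching both progress bars
```

So my run is slower on both counts: more total steps (30720 vs 21888, which suggests my epoch/batch settings differ from yours) and a slower per-step time (2.70 vs 1.34 s/it).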

I am using WizardLM 13B as the base model, with 8-bit mode enabled.

Do you think I might have used the wrong training settings somewhere?


bublint commented 10 months ago

WizardLM 13B is a bigger model than Llama 7B, so a longer training time is to be expected. Also, I'm running a 3090 Ti, not a 3090 FE, so that may account for some of the discrepancy.
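
As a rough back-of-envelope check (parameter counts only, ignoring hardware and settings differences):

```python
# Per-step cost scales roughly with parameter count for a dense transformer
params_7b, params_13b = 7e9, 13e9
print(f"parameter ratio:     {params_13b / params_7b:.2f}x")  # ~1.86x
print(f"observed s/it ratio: {2.70 / 1.34:.2f}x")             # ~2.01x
```

Most of the gap in s/it is plausibly just the model size; the rest is likely the 3090 Ti vs 3090 FE difference and any settings mismatch.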