Scheduled Lora defaults to Nan past 2~4 loras down in the queue

Hi, first I love your work and it's super amazing, been using it for 4 months.

I often schedule 4 ~ 6 loras (XL or 1.5 depending on the day, I don't mix them in the queue), and have them bake when I'm asleep. I often notice that the last 2 ~3 loras in the queue sometimes goes to nan and the loras are broken. I don't think this is a toml issue because I restart my PC and start a new instance of the training script and I load the toml associated with the nan lora and it comes out ok. Is there a value that's not reinitialized before moving onto the next item in the queue? I often train different learning rates so it can be a strong lr --> nan situation but I also had talks with others that called the queue "cursed" and I was wondering if you will take a look at it.

There's no error that's being spit out in the console so I don't have any insight on that. However I have the wandb turned on (I use cosine with restart with warmup) so I can observe that the warmup (5%) is working correctly and the lr goes up, then it suddenly crashes to nan.

Any insight would be amazing and thank you for your time.

derrian-distro / LoRA_Easy_Training_Scripts

Scheduled Lora defaults to Nan past 2~4 loras down in the queue #180