Status: Open. velourlawsuits opened this issue 1 week ago.
I don't know much about the code, but since the message is "RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument index in method wrapper_CUDA__index_select)", it might help to share your configuration, in particular any settings that affect which device (CPU or GPU) things are placed on.
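If it helps with narrowing this down, here is a minimal sketch (not OneTrainer code) of how one could check which objects sit on which device. `group_by_device` is a hypothetical helper and the tensors in the demo are made up; the only assumption is that each object exposes a `.device` attribute, as PyTorch tensors do.

```python
from collections import defaultdict


def group_by_device(named_objects):
    """Group names by the string form of each object's .device attribute.

    Hypothetical helper: objects without a .device attribute are skipped.
    """
    groups = defaultdict(list)
    for name, obj in named_objects.items():
        device = getattr(obj, "device", None)
        if device is not None:
            groups[str(device)].append(name)
    return dict(groups)


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None  # the demo below is skipped when PyTorch is not installed

    if torch is not None:
        # Made-up stand-ins for whatever index_select receives during training.
        table = torch.randn(4, 3)
        index = torch.tensor([0, 2])
        print(group_by_device({"table": table, "index": index}))

        # index_select needs both tensors on the same device, so move the
        # index to wherever the table lives before calling it.
        rows = torch.index_select(table, 0, index.to(table.device))
        print(rows.shape)
```

If the printed groups show more than one device for tensors that feed the same operation, that is the pair to look at.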
Maybe switch to a much smaller dataset and try to find out under which conditions it happens:
Do you work for Donald Glover's company lol
Running the auto-update and/or pip install -r requirements.txt fixed the issue.
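For anyone who wants to check whether their environment has drifted from requirements.txt before reinstalling, a rough sketch follows. It assumes the file contains pinned lines of the form name==version; unpinned lines, URLs, and options are skipped. All function names here are hypothetical and this is not part of OneTrainer.

```python
import os
import re
from importlib import metadata

# Matches simple pins such as "torch==2.1.0"; anything else is ignored.
PIN = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*==\s*([^\s;#]+)")


def normalize(name):
    """Normalize a package name so that Foo_Bar and foo-bar compare equal."""
    return re.sub(r"[-_.]+", "-", name).lower()


def parse_pins(lines):
    """Return {normalized name: pinned version} for lines shaped like name==version."""
    pins = {}
    for line in lines:
        match = PIN.match(line)
        if match:
            pins[normalize(match.group(1))] = match.group(2)
    return pins


def find_mismatches(pins, installed):
    """Return {name: (wanted, installed or None)} for every pin that is not satisfied."""
    problems = {}
    for name, wanted in pins.items():
        have = installed.get(name)
        if have != wanted:
            problems[name] = (wanted, have)
    return problems


def installed_versions():
    """Collect versions of the distributions visible to this interpreter."""
    versions = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:
            versions[normalize(name)] = dist.version
    return versions


if __name__ == "__main__":
    # Assumes the script is run from the folder that holds requirements.txt.
    if os.path.exists("requirements.txt"):
        with open("requirements.txt", encoding="utf-8") as handle:
            pins = parse_pins(handle)
        mismatches = find_mismatches(pins, installed_versions())
        for name, (wanted, have) in sorted(mismatches.items()):
            print(f"{name}: requirements.txt pins {wanted}, installed {have}")
```

Run it with the same Python environment that the training uses, otherwise the installed versions it reports belong to a different interpreter.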
What happened?
I have been training SDXL fine-tune models with OneTrainer for the past two months with great success, until last night, when my training session abruptly aborted at 10% of the second epoch. I restarted my computer, deleted the previously saved epoch, the backup I had made, and the sample images, and ran the training session again. The program once again aborted at 10% of epoch 2. I have since run the auto-update and pip install -r requirements.txt, and I'm running it again. Hopefully it won't crash, but I won't be around to debug, so I'm opening this issue in advance.
What did you expect would happen?
I have the number of epochs set to 10, so I was expecting the training to run to completion.
Relevant log output
Output of pip freeze
No response