Closed Likkkez closed 1 year ago
This might be because you don't have enough CPU cores. Decreasing num_workers
in the dataloader may solve this.
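
To make the suggestion concrete, here is a minimal sketch of capping the worker count at the number of cores the machine reports. The helper name `suggest_num_workers` and the `reserve` parameter are made up for illustration and are not part of this repo or of PyTorch; the only real API assumed is the `num_workers` argument of `torch.utils.data.DataLoader`.

```python
import os


def suggest_num_workers(requested, reserve=1):
    """Cap a requested worker count at the available CPU cores.

    Hypothetical helper: `reserve` cores are left free for the main
    training process. Returns 0 if nothing is left, which tells a
    PyTorch DataLoader to load data in the main process.
    """
    available = os.cpu_count() or 1  # cpu_count() may return None
    return max(0, min(requested, available - reserve))


if __name__ == "__main__":
    workers = suggest_num_workers(8)
    print(f"using num_workers={workers}")
    # The value would then be passed along, e.g.:
    # DataLoader(dataset, batch_size=..., num_workers=workers)
```

Setting `num_workers=0` temporarily is also a quick way to check whether the crash comes from a worker process at all.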
Hmm, I doubt that's the case. I have a 5950X with 16 cores.
Nvm, I think it might be because my drive got messed up while it was training.
I noticed that every now and then I get a "Segmentation fault" message in the terminal. However, if I open the train log, it's not there, and there's no traceback or anything else. Is there any way to figure out what might be causing this?
It usually happens right before the epoch finishes, and then training continues as if nothing happened.
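
One possible way to get a traceback for a crash like this is Python's standard `faulthandler` module, which dumps the Python stack when the process receives a fatal signal such as SIGSEGV. The sketch below is only an assumption about how it could be wired in. The function name `enable_crash_tracebacks` and the log file name are made up, and it would need to be called near the top of the training script.

```python
import faulthandler


def enable_crash_tracebacks(log_path="segfault_traceback.log"):
    """Write Python tracebacks to `log_path` on fatal signals (e.g. SIGSEGV).

    Hypothetical helper. The returned file object must stay open for as
    long as faulthandler is enabled, so keep a reference to it.
    """
    log_file = open(log_path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


if __name__ == "__main__":
    crash_log = enable_crash_tracebacks()
    # ... start training here; if the process segfaults, the stack of
    # every Python thread is written to segfault_traceback.log ...
```

The same handler can be switched on without code changes by running the script with `python -X faulthandler` or by setting the `PYTHONFAULTHANDLER=1` environment variable, in which case the traceback goes to stderr. Since training continues after the message, the crash is probably in a child process such as a dataloader worker rather than in the main process, so the environment variable may be the more useful option because child processes inherit it.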