Closed khokao closed 1 year ago
It looks like you are using a quite old torch build with an old CUDA version. This may not be the root cause of the problem, but I suggest trying a recent torch version anyway. We were able to train YOLO-NAS on different hardware and operating systems and didn't notice such issues.
Try looking at the dmesg output. Maybe you will find additional details on why the process was killed.
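As a rough illustration of that check, here is a minimal Python sketch that filters kernel log text for lines that usually accompany an out-of-memory kill. The helper names `find_oom_lines` and `read_dmesg` and the marker list are assumptions made for this example, not part of any library, and the exact kernel wording can differ between versions, so treat the filter as a starting point.

```python
import subprocess

# Substrings that commonly appear in kernel messages about OOM kills.
# This list is an assumption; kernel wording varies between versions.
OOM_MARKERS = ("out of memory", "killed process", "oom-kill", "oom_reaper")


def find_oom_lines(log_text):
    """Return the lines of kernel log text that mention an OOM kill."""
    return [
        line
        for line in log_text.splitlines()
        if any(marker in line.lower() for marker in OOM_MARKERS)
    ]


def read_dmesg():
    """Run dmesg and return its output as text.

    Returns an empty string if dmesg is not installed. On some systems
    reading the kernel log requires root, in which case stdout may be empty.
    """
    try:
        result = subprocess.run(
            ["dmesg"], capture_output=True, text=True, check=False
        )
    except FileNotFoundError:
        return ""
    return result.stdout
```

Calling `print("\n".join(find_oom_lines(read_dmesg())))` right after the training process dies should show whether the kernel killed it for running out of host memory.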
Suggestions:
💡 Your Question
Hello, thanks for this useful repository!
I've been trying to train YOLO-NAS on the COCO dataset, but the training process stops after running just a few epochs, with the Killed message appearing in the standard output. According to the log file, the GPU memory usage seems to fluctuate significantly. Could there possibly be a memory leak causing this issue?
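One way to narrow this down is to log memory once per epoch and see what actually grows. On Linux, a bare Killed message usually comes from the kernel's out-of-memory killer acting on host RAM, whereas running out of GPU memory in PyTorch normally raises a Python error instead, so host memory is worth tracking alongside GPU memory. The sketch below is only an illustration: `peak_rss_mib` and `log_memory` are hypothetical helper names, and how to call them from the training loop is left open.

```python
import resource
import sys


def peak_rss_mib():
    """Peak resident set size of this process so far, in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in KiB on Linux but in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def log_memory(epoch):
    """Build one log line; intended to be called at the end of each epoch."""
    line = f"epoch={epoch} peak_host_rss={peak_rss_mib():.1f} MiB"
    try:
        import torch  # optional; GPU stats are skipped if torch is missing

        if torch.cuda.is_available():
            allocated = torch.cuda.memory_allocated() / 2**20
            line += f" gpu_allocated={allocated:.1f} MiB"
    except ImportError:
        pass
    return line
```

If the host peak keeps climbing epoch after epoch while the GPU figure only fluctuates, that points toward a host-side leak (for example in data loading) rather than GPU memory.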
For reference, I've been using the following command to train the model:
Versions