Closed whu-lee closed 2 years ago
@whu-lee could you please explain more? What is causing this in the code? what are the potential suspects?
I'm sorry, I don't have a clear description of the problem.
During the second epoch of training, the program is inexplicably interrupted and the terminal displays "Killed". I checked memory usage and found that it was completely exhausted, so I suspect a memory leak is the cause.
It may be caused by the loss accumulation during training. I changed `total_loss += loss` to `total_loss += loss.item()`, and there is no memory leak at present.
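For context, here is a minimal sketch of why that one-character change matters: accumulating the raw `loss` tensor keeps every batch's autograd graph alive, so host memory grows with each step, while `.item()` extracts a plain Python float. The model, optimizer, and dummy data below are hypothetical placeholders, not taken from this repository's code.

```python
import torch

# Hypothetical minimal training loop to illustrate the fix.
model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = torch.nn.MSELoss()

total_loss = 0.0
for _ in range(5):  # a few dummy batches
    x = torch.randn(8, 10)
    y = torch.randn(8, 1)
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
    # Leaky version: total_loss += loss
    #   -> total_loss becomes a tensor that references the whole
    #      computation graph of every batch, which is never freed.
    # Fixed version: accumulate a plain float instead.
    total_loss += loss.item()

print(type(total_loss))  # a plain Python float, no graph retained
```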
By the way, I also added `torch.cuda.empty_cache()` to address excessive CUDA memory usage.
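One caveat worth noting: `empty_cache()` only returns cached, unreferenced blocks to the driver; it does not free tensors that are still referenced, so it helps with fragmentation and reported usage rather than true leaks. A guarded call (the placement shown is illustrative, not from this repository's code) looks like:

```python
import torch

# empty_cache() releases PyTorch's cached, unused GPU blocks back to
# the driver. It cannot free tensors that are still referenced.
# Guard the call so the script also runs on CPU-only machines.
if torch.cuda.is_available():
    torch.cuda.empty_cache()
```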
@whu-lee I updated the code as per your recommendations. Please let me know if the problem persists.
"killed" will appear in second epoch of code training.