Describe the bug
Every time the loss is calculated, a zero-dimensional (scalar) tensor remains in memory and never gets cleaned up. This eventually causes an 'out of memory' error in some runs.
To Reproduce
Insert this function next to the loss calculation and call it after each loss computation:
import gc
import torch

def memReport(self):
    # Print every zero-dimensional tensor the garbage collector is tracking.
    for obj in gc.get_objects():
        if torch.is_tensor(obj) and len(obj.size()) == 0:
            print(type(obj), obj.size(), id(obj))
and you will see one additional scalar tensor on each iteration.
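The reproduction above can be sketched as a self-contained script; the model, data, and loss below are hypothetical stand-ins for the real training code, and the standalone mem_report mirrors the method above:

```python
import gc
import torch

def mem_report():
    # Standalone version of the method above: list zero-dim tensors in memory.
    for obj in gc.get_objects():
        if torch.is_tensor(obj) and len(obj.size()) == 0:
            print(type(obj), obj.size(), id(obj))

# Hypothetical training step, only to show where the call would sit.
model = torch.nn.Linear(4, 1)
x, y = torch.randn(8, 4), torch.randn(8, 1)
for step in range(3):
    loss = torch.nn.functional.mse_loss(model(x), y)
    print('New cycle')
    mem_report()  # watch whether an extra zero-dim tensor appears per cycle
```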
Desktop (please complete the following information):
OS: Ubuntu
OS version: 16.04
Python version: 2.7 and 3.6 (both tested)
PyTorch version: 0.4.1
Additional context
There seem to be similar issues reported online that occur when loss is logged directly instead of via loss.item(). Could this be our problem?
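A minimal sketch of that suspected pattern, with a hypothetical model and loss: storing the loss tensor itself keeps a scalar tensor (and its autograd graph) alive each iteration, whereas loss.item() converts it to a plain Python float first:

```python
import torch

model = torch.nn.Linear(4, 1)
x = torch.randn(8, 4)

# Suspected leaky pattern: each stored loss is a live zero-dim tensor,
# and it keeps its computation graph reachable as well.
leaky_log = []
for _ in range(5):
    loss = model(x).pow(2).mean()
    leaky_log.append(loss)  # tensor kept alive every iteration

# Safer pattern: .item() yields a plain float, so the tensor can be freed.
safe_log = []
for _ in range(5):
    loss = model(x).pow(2).mean()
    safe_log.append(loss.item())  # only a Python float survives
```

If the logging code in evaluate_loss accumulates loss rather than loss.item(), that would match the one-extra-tensor-per-cycle pattern seen above.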
Here's an example (from simplecnn_mnist, where the function sits in evaluate_loss of problem.py):
New cycle
<class 'torch.Tensor'> torch.Size([]) 140667482493360
<class 'torch.Tensor'> torch.Size([]) 140666368017464
<class 'torch.Tensor'> torch.Size([]) 140666368017824
New cycle
<class 'torch.Tensor'> torch.Size([]) 140667482493360
<class 'torch.Tensor'> torch.Size([]) 140666366779328
<class 'torch.Tensor'> torch.Size([]) 140666368027168
<class 'torch.Tensor'> torch.Size([]) 140666368017824
New cycle
<class 'torch.Tensor'> torch.Size([]) 140667482493360
<class 'torch.Tensor'> torch.Size([]) 140666368027168
<class 'torch.Tensor'> torch.Size([]) 140666366779328
<class 'torch.Tensor'> torch.Size([]) 140666368017896
<class 'torch.Tensor'> torch.Size([]) 140666368017824
New cycle
<class 'torch.Tensor'> torch.Size([]) 140667482493360
<class 'torch.Tensor'> torch.Size([]) 140666368027168
<class 'torch.Tensor'> torch.Size([]) 140666368017896
<class 'torch.Tensor'> torch.Size([]) 140666366779328
<class 'torch.Tensor'> torch.Size([]) 140666368024296
<class 'torch.Tensor'> torch.Size([]) 140666368017824
New cycle
<class 'torch.Tensor'> torch.Size([]) 140667482493360
<class 'torch.Tensor'> torch.Size([]) 140666368027168
<class 'torch.Tensor'> torch.Size([]) 140666368017896
<class 'torch.Tensor'> torch.Size([]) 140666368024296
<class 'torch.Tensor'> torch.Size([]) 140666366779328
<class 'torch.Tensor'> torch.Size([]) 140666366834152
<class 'torch.Tensor'> torch.Size([]) 140666368017824
New cycle
<class 'torch.Tensor'> torch.Size([]) 140667482493360
<class 'torch.Tensor'> torch.Size([]) 140666368027168
<class 'torch.Tensor'> torch.Size([]) 140666368017896
<class 'torch.Tensor'> torch.Size([]) 140666368024296
<class 'torch.Tensor'> torch.Size([]) 140666366834152
<class 'torch.Tensor'> torch.Size([]) 140666366779328
<class 'torch.Tensor'> torch.Size([]) 140666366834296
<class 'torch.Tensor'> torch.Size([]) 140666368017824