Hello, thank you so much for your work.
I'm trying to use the FlopCountAnalysis class, but I'm not able to free the GPU memory it uses.
As a minimal example, without FlopCountAnalysis I can do something like:
from transformers import AutoModel
import torch

model = AutoModel.from_pretrained('dccuchile/bert-base-spanish-wwm-uncased')
query = torch.randint(low=0, high=20, size=(8, 16))
print(torch.cuda.memory_allocated())  # nothing on the GPU yet

model.to("cuda:0")
query = query.to("cuda:0")
print(torch.cuda.memory_allocated())  # model weights + query now on the GPU

del model
del query
print(torch.cuda.memory_allocated())  # both references released
And that prints "0", "439937024", "0".
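(For reference, torch.cuda.memory_allocated() only counts live tensors. If I understand correctly, torch.cuda.memory_reserved() reports the caching allocator's pool instead, which normally stays non-zero even after del, so the final "0" above means no tensors are left rather than that the memory was returned to the driver:)

import torch
# memory_allocated(): bytes held by live tensors.
# memory_reserved(): bytes kept by PyTorch's caching allocator for reuse.
print(torch.cuda.memory_allocated(), torch.cuda.memory_reserved())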
When using the FlopCountAnalysis:
from transformers import AutoModel
import torch
from fvcore.nn import FlopCountAnalysis

model = AutoModel.from_pretrained('dccuchile/bert-base-spanish-wwm-uncased')
query = torch.randint(low=0, high=20, size=(8, 16))
print(torch.cuda.memory_allocated())  # nothing on the GPU yet

model.to("cuda:0")
query = query.to("cuda:0")
print(torch.cuda.memory_allocated())  # model weights + query now on the GPU

counter = FlopCountAnalysis(model, inputs=query)
total = counter.total()  # runs a traced forward pass to count the FLOPs
print(torch.cuda.memory_allocated())  # extra memory after the analysis

del model
del query
del counter
del total
print(torch.cuda.memory_allocated())  # the extra memory is never released
It shows "0", "439937024", "530033664", "530033664".
I expect the final memory allocated to be 0 again.
I also tried adding

import gc
gc.collect()
torch.cuda.empty_cache()

at the end, but the result was the same.
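In case it helps with debugging, here is a small snippet (only torch and gc; the helper name live_cuda_tensors is my own) that lists the CUDA tensors the Python garbage collector still sees, which should account for the bytes reported by memory_allocated():

import gc
import torch

def live_cuda_tensors():
    # Walk every object tracked by the garbage collector and keep the
    # CUDA tensors; these are what memory_allocated() is counting.
    found = []
    for obj in gc.get_objects():
        try:
            if torch.is_tensor(obj) and obj.is_cuda:
                found.append(obj)
        except Exception:
            # some exotic objects raise during inspection; skip them
            pass
    return found

for t in live_cuda_tensors():
    print(type(t).__name__, tuple(t.shape), t.dtype)

If nothing shows up here, I suppose the tensors are being kept alive from the C++ side (for example by a cached trace), where a Python-level scan like this cannot see them.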
Is there a proper way to free the memory?
Thank you.