Closed dpatschke closed 3 months ago
@dpatschke I ran a quick test using this script and I observe a linear increase in the memory used with `num_samples`:
```python
import torch

from chronos import ChronosPipeline


def used_gpu_memory(device):
    free, total = torch.cuda.mem_get_info(device=device)
    return (total - free) / 1024 / 1024


device = 1
pipeline = ChronosPipeline.from_pretrained(
    "amazon/chronos-t5-large",
    device_map=device,
    torch_dtype=torch.bfloat16,
)
print(f"Used memory: {used_gpu_memory(device):.0f} MB")

context = torch.randn(4000)

forecast = pipeline.predict(context, prediction_length=64, num_samples=100)
print("forecast.shape", forecast.shape)
print(f"Used memory: {used_gpu_memory(device):.0f} MB")

forecast = pipeline.predict(context, prediction_length=64, num_samples=200)
print("forecast.shape", forecast.shape)
print(f"Used memory: {used_gpu_memory(device):.0f} MB")
```
Output:

```
Used memory: 1692 MB
forecast.shape torch.Size([1, 100, 64])
Used memory: 8344 MB
forecast.shape torch.Size([1, 200, 64])
Used memory: 14556 MB
```
`(14556 - 1692) / (8344 - 1692) = 1.93`, i.e., doubling `num_samples` roughly doubles the memory used on top of the model weights, which is consistent with linear scaling.
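If peak memory is a concern, one workaround is to draw the samples in smaller chunks and concatenate the results along the sample dimension. Here is a minimal sketch: `predict_in_chunks` is a hypothetical helper (not part of the Chronos API), written against any `predict`-style callable that returns a tensor of shape `[batch, num_samples, prediction_length]`:

```python
import torch


def predict_in_chunks(predict_fn, context, prediction_length, num_samples, chunk_size):
    """Draw forecast samples in chunks of at most `chunk_size` samples each,
    capping peak activation memory, then concatenate along the sample dim."""
    chunks = []
    remaining = num_samples
    while remaining > 0:
        n = min(chunk_size, remaining)
        # Each call is assumed to return a tensor of shape
        # [batch, n, prediction_length].
        chunks.append(
            predict_fn(context, prediction_length=prediction_length, num_samples=n)
        )
        remaining -= n
    return torch.cat(chunks, dim=1)
```

With the pipeline above this would be called as, e.g., `predict_in_chunks(pipeline.predict, context, 64, 200, 50)`, trading a few extra forward passes for a bounded memory footprint.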
@abdulfatir Thanks so much for your reply!
I'll run some similar tests on my end and go ahead and close out the issue.
Thank you for your contributions to this library!
Quick question on GPU memory usage. I haven't examined the underlying library code in-depth, but I'm noticing a more-than-linear increase in GPU memory usage with the number of samples that are requested.
I'm seeing the large model at bfloat16 taking up about 1.5 GB of GPU memory, which is what I was expecting based on the T5 documentation. With a 4000-element time series and num_samples=100, my GPU memory usage increases to 7.5 GB. Doubling num_samples to 200 increases the memory usage to over 17 GB.
Just curious if you might be able to share more information surrounding GPU memory usage and any best practices for managing it.
Thanks!