anti-machinee closed this 2 years ago
Hi @anti-machinee, there may not be a simple answer; it can depend on the model and on where it's running (CPU vs. Gaudi).
For us to provide a good answer here, we need your help with some additional info:
@greg-serochi Here is my information
Hi @anti-machinee, is your final objective to maximize the batch size for your model? This iresnet model is still working with BS=96, correct? If so, it may not be possible to compare batch sizes directly across different architectures.
Note: in a future release we'll be providing some additional APIs for better visibility into on-card memory usage. For now, it's best to slowly increase the batch size until you find the threshold between pass and fail (OOM).
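As a rough illustration of that "increase slowly until it fails" approach, here is a minimal sketch in plain Python. The names `find_max_batch_size`, `try_batch`, and `fake_step` are hypothetical and not part of any Habana or PyTorch API. It assumes an out-of-memory failure surfaces as a catchable `RuntimeError` or `MemoryError`; if the process crashes outright instead, each trial would need to run in its own process.

```python
from typing import Callable


def find_max_batch_size(try_batch: Callable[[int], None],
                        start: int, stop: int, step: int) -> int:
    """Try batch sizes start, start+step, ... up to stop (inclusive).

    Returns the largest batch size that ran without an OOM-style error,
    or 0 if even the first one failed.
    """
    best = 0
    for batch_size in range(start, stop + 1, step):
        try:
            # Hypothetical callback: runs one training step at this batch size.
            try_batch(batch_size)
        except (RuntimeError, MemoryError):
            # Assumption: OOM is reported as one of these exception types.
            break
        best = batch_size
    return best


def fake_step(batch_size: int) -> None:
    """Stand-in for a real training step; pretends memory runs out above 112."""
    if batch_size > 112:
        raise RuntimeError("simulated out of memory")


if __name__ == "__main__":
    # Sweep from the known-good size (96) toward the failing one (128).
    print(find_max_batch_size(fake_step, start=96, stop=128, step=8))
```

In a real run, `fake_step` would be replaced by a function that builds a batch of the given size and executes one forward/backward pass on the device.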
I am running a model that has 50M parameters and tried two batch sizes, 96 and 128. The server crashes with bs = 128 and works fine with bs = 96. My 1080 with 12 GB of memory can handle batch size 96 with the same parameters, but a single Gaudi has 32 GB. Please support me, thank you @greg-serochi