Open James-Shared-Studios opened 1 month ago
Try repeating the test with the CUDA graph displayed, since that is the graph that shows CUDA utilization.
To do that, click here:
And select CUDA
For CUDA, it barely reached 70% utilization.
How does it compare with a bigger model?
You should compare speed.
Utilization matters less.
The average processing time with GeForce 4070 is 0.16 seconds, compared to 0.51 seconds with RTX 4000 Ada. I would expect faster performance from RTX 4000 Ada, which is why I was wondering whether the RTX 4000 Ada has been limited in some way.
How does it compare with a bigger model?
the same results for large-v1, large-v2 and large-v3
I would expect faster performance from RTX 4000 Ada
No, you should expect the inverse: the 4070 is faster.
Why is that? Could you provide more context, please? Thank you.
Since the model fits in GPU memory, VRAM size is not a factor; it comes down to memory bandwidth (which matters more when the CUDA core counts aren't much different).
You can take a look at their theoretical FP32 and FP16 performance:
4070: FP16 (half) 29.15 TFLOPS, vs. RTX 4000 Ada: FP16 (half) 26.73 TFLOPS (1:1). So the RTX 4000 Ada should not be three times slower than the 4070, correct?
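The two ratios being compared here can be worked out from the figures quoted in this thread. This is only arithmetic on those numbers, not a new measurement: the paper FP16 advantage of the 4070 is roughly 1.09x, while the measured gap is roughly 3.2x.

```python
# Figures quoted in this thread: theoretical FP16 throughput (TFLOPS)
# and measured average processing time (seconds).
fp16_tflops = {"GeForce RTX 4070": 29.15, "RTX 4000 Ada": 26.73}
avg_seconds = {"GeForce RTX 4070": 0.16, "RTX 4000 Ada": 0.51}

# How much faster the 4070 is on paper (FP16 throughput ratio).
theoretical_ratio = fp16_tflops["GeForce RTX 4070"] / fp16_tflops["RTX 4000 Ada"]

# How much faster the 4070 actually measured (time ratio, inverted).
measured_ratio = avg_seconds["RTX 4000 Ada"] / avg_seconds["GeForce RTX 4070"]

print(f"theoretical FP16 advantage of the 4070: {theoretical_ratio:.2f}x")
print(f"measured speed advantage of the 4070:   {measured_ratio:.2f}x")
```

The mismatch between the two ratios is what the question is about: compute throughput alone does not account for it.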
The execution time is too short, and there is additional I/O overhead on top of it.
For a better benchmark, use longer audio/video so that the overhead makes up a smaller share of the total time.
That makes sense. I will try a longer audio and see if it improves the results. Thank you so much for your help.
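A rough sketch of how the longer-audio suggestion could be tested: time a short and a long input after a warm-up run, then fit a straight line (time = overhead + rate * length) through the two points to see how much of the short run was fixed overhead. The names `time_call`, `split_overhead`, and `fake_transcribe` are hypothetical; `fake_transcribe` is only a stand-in and should be replaced with the real transcription call. The linear model is an assumption, and the timed call must actually finish its work before returning (some libraries return results lazily).

```python
import time
from statistics import mean


def time_call(fn, *args, warmup=1, repeats=5):
    """Return the mean wall-clock seconds of fn(*args), after warm-up runs."""
    for _ in range(warmup):
        fn(*args)  # warm-up run: excludes one-time costs such as model loading
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        samples.append(time.perf_counter() - start)
    return mean(samples)


def split_overhead(short_len, short_time, long_len, long_time):
    """Fit time = overhead + rate * length through two measurements."""
    rate = (long_time - short_time) / (long_len - short_len)
    overhead = short_time - rate * short_len
    return overhead, rate


def fake_transcribe(audio_seconds):
    # Hypothetical stand-in workload: a fixed cost plus a per-second cost.
    time.sleep(0.01 + 0.001 * audio_seconds)


if __name__ == "__main__":
    short_t = time_call(fake_transcribe, 5)
    long_t = time_call(fake_transcribe, 60)
    overhead, rate = split_overhead(5, short_t, 60, long_t)
    print(f"fixed overhead: ~{overhead:.3f} s")
    print(f"cost per second of audio: ~{rate * 1000:.2f} ms")
```

If the fitted overhead is a large share of the short run's time, the short benchmark is mostly measuring setup and I/O rather than the GPU.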
What's the conclusion? @James-Shared-Studios
I am experiencing limited GPU utilization with the NVIDIA RTX 4000 Ada Gen card while running on the following system:
OS: Windows 10 1809
CPU: AMD EPYC 3251 8-Core Processor 2.5 GHz
RAM: 32GB
GPU: NVIDIA RTX 4000 Ada Gen 20 GB
CUDA Toolkit Version: 12.3
GPU Driver Version: 546.12
Python code:
While running my code, I'm only observing around 10% GPU utilization.
However, the same code achieves 100% utilization on an NVIDIA GeForce RTX 4070.
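One way to cross-check the utilization figures on both cards is to sample `nvidia-smi` while the workload runs. The sketch below is a minimal example that assumes `nvidia-smi` is on the PATH; the helper names `parse_line` and `sample` are hypothetical. It uses the `--query-gpu` fields `name`, `utilization.gpu`, and `utilization.memory`, and does not handle fields the driver reports as unavailable.

```python
import subprocess
import time

# Assumption: nvidia-smi is installed with the driver and is on the PATH.
QUERY = [
    "nvidia-smi",
    "--query-gpu=name,utilization.gpu,utilization.memory",
    "--format=csv,noheader,nounits",
]


def parse_line(line):
    """Parse one CSV row such as 'NVIDIA RTX 4000 Ada Generation, 10, 3'."""
    # Split from the right so only the two numeric fields are separated.
    name, gpu, mem = [field.strip() for field in line.rsplit(",", 2)]
    return {"name": name, "gpu_percent": int(gpu), "mem_percent": int(mem)}


def sample():
    """Return one utilization reading per GPU."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return [parse_line(line) for line in out.splitlines() if line.strip()]


if __name__ == "__main__":
    try:
        for _ in range(5):  # take a few readings, one per second
            for gpu in sample():
                print(f"{gpu['name']}: GPU {gpu['gpu_percent']}%, "
                      f"memory controller {gpu['mem_percent']}%")
            time.sleep(1)
    except (FileNotFoundError, subprocess.CalledProcessError) as exc:
        print(f"could not query nvidia-smi: {exc}")
```

Run it in a second terminal while the transcription is in progress. A very short job may finish between samples, in which case the readings will understate the real load.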