aravindhank11 closed this 7 months ago
Hi! In our profiled models, for a fixed GPU type, the knee point of each kernel, as reported by the roofline analysis in Nsight Compute (https://docs.nvidia.com/nsight-compute/ProfilingGuide/index.html#roofline), is very similar across all the kernels we examined, so we simply used the average value. Also, most clearly compute-intensive or memory-intensive kernels have an arithmetic intensity far from the knee point, so classifying them is straightforward. I hope that helps!
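As a rough illustration of the approach described above (a sketch, not the project's actual code): average the knee points observed on one GPU type to get a single threshold, then label each kernel by comparing its arithmetic intensity to that threshold. All names and numbers below are hypothetical, and treating a kernel exactly at the threshold as compute-bound is an assumption.

```python
from statistics import mean


def arithmetic_intensity(flops: float, dram_bytes: float) -> float:
    """FLOPs performed per byte moved to or from device memory."""
    return flops / dram_bytes


def average_knee(knee_points: list[float]) -> float:
    """Collapse per-kernel knee points (FLOP/byte) into one threshold."""
    return mean(knee_points)


def classify(ai: float, ai_threshold: float) -> str:
    """Label a kernel by which side of the threshold its intensity falls on."""
    return "compute-bound" if ai >= ai_threshold else "memory-bound"


# Hypothetical knee points read from roofline charts on a single GPU type.
knees = [9.0, 10.0, 11.0]
ai_threshold = average_knee(knees)

# Hypothetical kernels: (FLOPs, bytes of DRAM traffic).
kernels = {
    "gemm": (4.0e9, 4.0e7),             # intensity 100 FLOP/byte
    "elementwise_add": (1.0e6, 4.0e6),  # intensity 0.25 FLOP/byte
}

labels = {
    name: classify(arithmetic_intensity(flops, nbytes), ai_threshold)
    for name, (flops, nbytes) in kernels.items()
}
```

Because both example kernels sit far from the threshold, a small error in the averaged knee point would not change their labels, which matches the point made in the reply.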
Thank you! This helps :)
I am trying to figure out the appropriate value to set for `ai_threshold`. Each kernel has a different knee point, so I wanted to understand what value should be set for the model.