Closed jacobhinkle closed 1 day ago
We should determine the root cause of the slowdown. It might also be a good idea to bypass the compile-time checks in the kernel reuse case, since we know they must all have passed previously if we already accepted these segments.
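A minimal sketch of that bypass, assuming each accepted segment can be identified by some stable key. `AcceptedSegmentCache`, `canSchedule`, and the string key are hypothetical names for illustration only, not the project's actual API; the idea is just that the expensive check runs once per segment and is skipped on reuse.

```cpp
#include <functional>
#include <string>
#include <unordered_set>

// Hypothetical helper (not part of the real code base): remembers which
// segments already passed their compile-time checks.
class AcceptedSegmentCache {
 public:
  // Returns true if the segment may be scheduled. The (assumed expensive)
  // compile_time_check callback runs only if the segment has not been
  // accepted before; rejected segments are re-checked every time.
  bool canSchedule(
      const std::string& segment_key,
      const std::function<bool()>& compile_time_check) {
    if (accepted_.count(segment_key) > 0) {
      return true; // Reuse case: the checks passed earlier, skip them.
    }
    if (!compile_time_check()) {
      return false;
    }
    accepted_.insert(segment_key);
    return true;
  }

 private:
  std::unordered_set<std::string> accepted_;
};
```

This only stays valid if nothing the compile-time checks depend on changes between calls, so the key has to capture everything those checks look at.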
I disabled the compile-time checks in getMaybeHeuristicsFor, and now I see this:
It appears that our dynamic shape latency is about 10x slower than expected. These are the times for our three existing benchmarks:
I verified on my local machine that this is not a measurement error. For example, on my machine, which is less powerful than the one used to generate that graph, I see the following nsys trace for the adaptive layernorm test:
We need to add some instrumentation and determine the cause of the slowdown. Since all three of these tests are slow and they use different schedulers, I suspect it might be a general scheduling utility that got slower.
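One possible shape for that instrumentation is a scoped timer that accumulates wall-clock time per label, so the totals for each utility can be compared across runs. This is only a sketch: `ScopedTimer`, its `registry`, and `fakeSchedulingUtility` are hypothetical names made up for this example, and the registry is not thread-safe.

```cpp
#include <chrono>
#include <map>
#include <string>
#include <utility>

// Hypothetical helper: records how long a scope took and adds it to a
// per-label total when the timer goes out of scope.
class ScopedTimer {
 public:
  using Clock = std::chrono::steady_clock;

  struct Stats {
    long long calls = 0;
    double total_ms = 0.0;
  };

  explicit ScopedTimer(std::string label)
      : label_(std::move(label)), start_(Clock::now()) {}

  ~ScopedTimer() {
    std::chrono::duration<double, std::milli> elapsed = Clock::now() - start_;
    Stats& stats = registry()[label_];
    stats.calls += 1;
    stats.total_ms += elapsed.count();
  }

  ScopedTimer(const ScopedTimer&) = delete;
  ScopedTimer& operator=(const ScopedTimer&) = delete;

  // Process-wide table of accumulated timings (not thread-safe).
  static std::map<std::string, Stats>& registry() {
    static std::map<std::string, Stats> table;
    return table;
  }

 private:
  std::string label_;
  Clock::time_point start_;
};

// Stand-in for a scheduling utility under suspicion; the body is dummy work.
int fakeSchedulingUtility(int n) {
  ScopedTimer timer("fakeSchedulingUtility");
  int acc = 0;
  for (int i = 0; i < n; ++i) {
    acc += i % 7;
  }
  return acc;
}
```

Placing one timer at the top of each shared scheduling utility and dumping the registry after a benchmark run would show which label dominates the total.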