I was inspired by a performance helpdesk discussion over the weekend (xref https://github.com/JuliaLang/julia/issues/55009), and made a very simple reproducer for our matrix-field `getidx`. I recently had a hunch that we should hoist our `eltype` calls, but @dennisYatunin pointed out that `eltype` should simply return a compile-time constant. The benchmark shows, however, that we're spending most of the time in `eltype`.
Does this mean that the compiler is not caching the inference result? @vchuravy
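For anyone who wants to poke at the constant-folding question in isolation, here is a minimal sketch. `ToyField` and `toy_getidx` are hypothetical stand-ins I made up for illustration; they are not our matrix-field types or the real `getidx`, so this only shows how one might check what inference concludes about an `eltype` call, not what happens in our actual code.

```julia
using InteractiveUtils  # stdlib; provides @code_typed

# Hypothetical stand-in for a field type (NOT the real matrix-field type).
struct ToyField{T, A <: AbstractVector{T}}
    data::A
end

# eltype defined on the type, so it depends only on type parameters.
Base.eltype(::Type{ToyField{T, A}}) where {T, A} = T

# Hypothetical single-point kernel that calls eltype in its body.
function toy_getidx(f::ToyField, i)
    FT = eltype(f)
    return convert(FT, 2) * f.data[i]
end

f = ToyField{Float64, Vector{Float64}}(rand(4))

# What does inference conclude for the eltype call on this argument type?
inferred = Base.return_types(eltype, (typeof(f),))
println(inferred)

# Inspect the optimized typed IR; if eltype folded to a constant, no call
# to eltype should remain in the body.
display(@code_typed toy_getidx(f, 1))
```

If the typed IR for the real reproducer still contains a call to `eltype` where this toy version does not, that would narrow the problem down to something specific to our types rather than to `eltype` in general.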
The good news is that: 1) there is clearly room for improving our emitted code, 2) this is a really nice way to make reproducers that capture the full complexity of `getidx` while only targeting a single point in space, and 3) this impacts both the CPU and the GPU, since these instructions could easily increase register usage on the GPU.