pdamme closed this 2 months ago
Awesome @pdamme
Will test it right away :+1:
I've been playing around with various memory profilers to test the issue and the solution. Stack usage looks quite normal with your changes applied. Also, the explain_llvm output shows the improvements nicely. Thanks for finally fixing this long-standing bug, @pdamme
I think, in the future, we could even reduce the number of `AllocaOps` by making different kernel calls use the same memory slots for their result pointers. However, the most important goal for now is to get this bug fixed on main.