Open maleadt opened 2 months ago
Not fully up to speed here, but my hope is that #582 will help with these situations. Currently, though, it still leaves the issue that Enzyme wants to run some optimization passes.
For C++ we solve this by using Enzyme as a plugin, via a magic function, instead of using it as a frontend driver, so maybe we could use an "EnzymePass", but the challenge would be to register that.
So the idea would be to write:
ptr = var"gpuc.deferred"(f, primal_args...)
__enzyme_autodiff(ptr, ad_args...)
Instead of #599, but then schedule a custom pass that recognizes __enzyme_autodiff.
> the challenge would be to register that
What about a ScopedValue allowing a third-party app to customize the GPUCompiler pipeline? We could do something like LLVM, with different extension points that allow Enzyme to insert passes where needed.
That would be wicked, but currently the user code looks like this:
function mykernel_grad(x, dx)
    autodiff_deferred(mykernel, Duplicated(x, dx))
end
@cuda mykernel_grad(x, dx)
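And so we never enter an Enzyme scope on the CPU: the only Enzyme call sits inside the kernel, so nothing on the host would set the ScopedValue before @cuda triggers compilation.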
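To make the ScopedValue idea and its limitation concrete, here is a minimal sketch in plain Julia. It assumes Julia 1.11+ for `Base.ScopedValues`. `PIPELINE_CALLBACKS`, `run_pipeline` and `enzyme_pass` are hypothetical stand-ins invented for illustration, not existing GPUCompiler or Enzyme APIs, and the "module" is just a vector of strings.

```julia
using Base.ScopedValues  # Julia 1.11+; assumption: older versions would need ScopedValues.jl

# Hypothetical extension point: callbacks that a compiler pipeline would
# invoke at some fixed stage. This is NOT an existing GPUCompiler API.
const PIPELINE_CALLBACKS = ScopedValue{Vector{Function}}(Function[])

# Stand-in for the compiler pipeline: apply each registered callback to the "module".
function run_pipeline(mod::Vector{String})
    for cb in PIPELINE_CALLBACKS[]
        cb(mod)
    end
    return mod
end

# Stand-in for a pass that a third party such as Enzyme would register.
enzyme_pass(mod) = push!(mod, "enzyme_pass")

# Compilation triggered inside the dynamic scope sees the extra pass.
inside = with(PIPELINE_CALLBACKS => Function[enzyme_pass]) do
    run_pipeline(String["kernel"])
end

# Compilation triggered outside any scope only sees the empty default.
outside = run_pipeline(String["kernel"])
```

The second call mirrors the situation described above: if the host code never enters a scope that sets the value, the pipeline runs without the extra pass, so the registration would have to happen some other way.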
https://github.com/JuliaGPU/GPUCompiler.jl/pull/619