Open gdalle opened 5 months ago
Not presently, but contributions welcome!
Do you know which graph I'm talking about? Is it in some publication online?
I don't think it's in a publication, but around 10 minutes into his EnzymeCon talk @tgymnich shows some: https://youtu.be/nPN_Z5j6JDM?feature=shared
@tgymnich do you remember what machine you used for these measurements?
This will also now depend a lot more on the program in Julia.
For example, batching something with a linear solve will almost always be faster, since we now do one linear solve that is reused for all chunks.
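To illustrate the linear solve point, here is a minimal sketch using only the `LinearAlgebra` standard library. It does not show what Enzyme does internally; it only shows why solving for several tangent directions at once can reuse a single factorization. The matrix `A` and the tangent directions `dB` are made-up values.

```julia
using LinearAlgebra

# Made-up system matrix and two tangent directions (columns of dB).
A = [4.0 1.0; 2.0 3.0]
dB = [1.0 0.0; 0.0 1.0]

# Batched: factorize once, then solve for all directions together.
F = lu(A)
batched = F \ dB

# Unbatched: each call to `A \ v` factorizes A again.
one_by_one = hcat((A \ dB[:, j] for j in axes(dB, 2))...)

# Both approaches give the same tangents; only the work differs.
println(batched ≈ one_by_one)
```

The saving grows with the number of directions, since the factorization cost is paid once instead of once per direction.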
@vchuravy this must have been a bare metal AWS machine provided by @wsmoses. I believe it was with AVX512.
I'm asking because I'm including vector mode in DI, so it would be nice to have a function in Enzyme I can call to pick a decent chunk size if the user doesn't provide it. Even if the function is dumb at the moment, I feel like that's definitely something I don't want to decide myself
8/16 should be a safe bet.
Sure, open a PR to Enzyme to add a function that returns 16 for now, and we can add more complex analysis later.
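A placeholder along those lines could look like the sketch below. The names `DEFAULT_CHUNK_SIZE` and `pick_chunk_size` are hypothetical and not part of Enzyme's API. Capping the result at the input length is an extra assumption on my part, so that small inputs don't get a chunk larger than the number of directions.

```julia
# Hypothetical default, taken from the "8/16 should be a safe bet" suggestion.
const DEFAULT_CHUNK_SIZE = 16

# Hypothetical helper (not an existing Enzyme function): return the default
# chunk size, capped at the number of input directions (assumed behavior).
pick_chunk_size(input_length::Integer) = min(input_length, DEFAULT_CHUNK_SIZE)

println(pick_chunk_size(100))
println(pick_chunk_size(5))
```

A caller such as DI could fall back to this only when the user does not pass a chunk size explicitly, which leaves room for smarter analysis later without changing the call site.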
ForwardDiff has a heuristic for picking chunk size, with a default threshold of 12 dictated by memory bandwidth:
https://github.com/JuliaDiff/ForwardDiff.jl/blob/ff56092ed2960717ce45f53a90584898c232e74b/src/prelude.jl#L24-L34
https://github.com/JuliaDiff/ForwardDiff.jl/blob/ff56092ed2960717ce45f53a90584898c232e74b/src/prelude.jl#L8
Does Enzyme have something similar I could use? I seem to remember a graph showing performance as a function of chunk size, with a maximum around 8-12 as well, but it disappeared in the Slackhole
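For reference, the ForwardDiff heuristic linked above works roughly like the sketch below. This is a re-implementation for illustration under the name `sketch_pickchunksize` (a made-up name), not a copy of the library code, so check the linked lines for the exact version.

```julia
# Default threshold of 12, as in the linked ForwardDiff source.
const SKETCH_CHUNK_THRESHOLD = 12

# Sketch of the heuristic: small inputs get one chunk covering everything;
# larger inputs are split into the fewest chunks that each fit under the
# threshold, then the chunk size is evened out across those chunks.
function sketch_pickchunksize(input_length::Integer, threshold::Integer = SKETCH_CHUNK_THRESHOLD)
    if input_length <= threshold
        return input_length
    else
        nchunks = cld(input_length, threshold)  # ceiling division
        return cld(input_length, nchunks)
    end
end

println(sketch_pickchunksize(10))
println(sketch_pickchunksize(13))
println(sketch_pickchunksize(100))
```

With the default threshold, an input of length 13 is split into two chunks of at most 7 rather than one chunk of 12 and one of 1.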