EnzymeAD / Enzyme.jl

Julia bindings for the Enzyme automatic differentiator
https://enzyme.mit.edu
MIT License

Heuristic for picking the chunk/batch size? #1542

Open: gdalle opened this issue 5 months ago

gdalle commented 5 months ago

ForwardDiff has a heuristic for picking the chunk size, with a default threshold of 12 dictated by memory bandwidth:

https://github.com/JuliaDiff/ForwardDiff.jl/blob/ff56092ed2960717ce45f53a90584898c232e74b/src/prelude.jl#L24-L34

https://github.com/JuliaDiff/ForwardDiff.jl/blob/ff56092ed2960717ce45f53a90584898c232e74b/src/prelude.jl#L8
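For readers who don't follow the links, here is a minimal Python sketch of the rule those lines implement. It assumes the linked code is ForwardDiff's usual `pickchunksize` logic: use the whole input as one chunk when it fits under the threshold, otherwise use the fewest chunks of at most `threshold` entries and balance their sizes. `pick_chunk_size` is a hypothetical name, not ForwardDiff's code.

```python
DEFAULT_CHUNK_THRESHOLD = 12  # mirrors ForwardDiff's default threshold


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-a // b)


def pick_chunk_size(input_length: int, threshold: int = DEFAULT_CHUNK_THRESHOLD) -> int:
    """Sketch of a ForwardDiff-style chunk-size heuristic (not the library's code)."""
    if input_length <= threshold:
        # Small inputs: a single chunk covers every input dimension.
        return input_length
    # Fewest chunks that respect the threshold...
    nchunks = ceil_div(input_length, threshold)
    # ...then spread the inputs evenly so the last chunk isn't mostly empty.
    return ceil_div(input_length, nchunks)


if __name__ == "__main__":
    for n in (5, 12, 13, 25, 100):
        c = pick_chunk_size(n)
        print(n, c, ceil_div(n, c))  # input length, chunk size, number of passes
```

For example, an input of length 13 gets a chunk size of 7 (two balanced passes) rather than 12 + 1, and the chunk size never exceeds the threshold.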

Does Enzyme have something similar I could use? I seem to remember a graph showing performance as a function of chunk size, with a maximum around 8-12 as well, but it disappeared in the Slackhole.

wsmoses commented 5 months ago

Not presently, but contributions welcome!

gdalle commented 5 months ago

Do you know which graph I'm talking about? Is it in some publication online?

vchuravy commented 5 months ago

I don't think it's in a publication, but @tgymnich shows some in his talk at EnzymeCon, around 10 minutes in: https://youtu.be/nPN_Z5j6JDM?feature=shared

tgymnich commented 5 months ago
[image: chart of the measurements, not reproduced here]

vchuravy commented 5 months ago

@tgymnich do you remember what machine you used for these measurements?

wsmoses commented 5 months ago

This will also now depend a lot more on the specific Julia program being differentiated.

For example, batching something that contains a linear solve will almost always be faster, since we now do one linear solve that is reused for all chunks.
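A toy cost model can make this concrete. The sketch below assumes, following the comment, that the expensive part of the linear solve (say, a factorization) is done once per batched pass and shared by every direction in the batch, while each direction still pays for its own cheaper solve. The unit costs and the function name `batched_cost` are hypothetical, and the model deliberately ignores memory bandwidth and register pressure, which are what cap the useful batch size for ordinary code.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division."""
    return -(-a // b)


def batched_cost(n_dirs: int, batch: int, factor_cost: float, solve_cost: float) -> float:
    """Hypothetical cost of pushing n_dirs tangent directions through a linear solve.

    One pass handles `batch` directions; each pass pays the shared factorization
    once, and every direction pays its own (cheaper) solve.
    """
    n_passes = ceil_div(n_dirs, batch)
    return n_passes * factor_cost + n_dirs * solve_cost


if __name__ == "__main__":
    # 16 directions, factorization 100x as expensive as a single solve.
    for b in (1, 4, 8, 16):
        print(b, batched_cost(16, b, factor_cost=100, solve_cost=1))
```

Under these assumptions the cost falls monotonically with the batch size (1616 units for `batch=1` versus 116 for `batch=16`), which is why a fixed threshold tuned for memory bandwidth can undersell batching for programs like this.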

tgymnich commented 5 months ago

@vchuravy this must have been a bare metal AWS machine provided by @wsmoses. I believe it was with AVX512.

gdalle commented 5 months ago

I'm asking because I'm including vector mode in DI (DifferentiationInterface.jl), so it would be nice to have a function in Enzyme I can call to pick a decent chunk size if the user doesn't provide one. Even if the function is dumb at the moment, I feel like that's definitely something I don't want to decide myself.

vchuravy commented 5 months ago

8/16 should be a safe bet.

wsmoses commented 5 months ago

Sure, open a PR to Enzyme to add a function that returns 16 for now, and we can add more complex analysis later.
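The helper proposed in this comment could start as small as the Python sketch below, with a caller that lets an explicit user choice win. This is not Enzyme.jl's API: the names `default_batch_size` and `resolve_batch_size` are hypothetical, the real function would be written in Julia, and capping the default at the input length is an added assumption so that short inputs don't request unused lanes.

```python
from typing import Optional

DEFAULT_BATCH_SIZE = 16  # placeholder value suggested in the thread


def default_batch_size(input_length: Optional[int] = None) -> int:
    """Hypothetical stand-in for the proposed Enzyme helper: a constant for now."""
    if input_length is None:
        return DEFAULT_BATCH_SIZE
    if input_length < 1:
        raise ValueError("input_length must be positive")
    # Assumption: never use more lanes than there are input dimensions.
    return min(DEFAULT_BATCH_SIZE, input_length)


def resolve_batch_size(user_choice: Optional[int], input_length: int) -> int:
    """Caller side (e.g. in DI): an explicit user choice overrides the default."""
    if user_choice is not None:
        return user_choice
    return default_batch_size(input_length)
```

Keeping the decision behind one function means a later, program-aware analysis (like the linear-solve case above) can replace the constant without changing any callers.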