FluxML / FluxTraining.jl

A flexible neural net training library inspired by fast.ai
https://fluxml.ai/FluxTraining.jl
MIT License

Training loop profiler #86

Open lorenzoh opened 3 years ago

lorenzoh commented 3 years ago

Using Events as hooks into the training loop, it's possible to create a profiler for training loops that measures the time spent executing events as well as the time spent between events, i.e. in the training loop itself.
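
As a rough illustration, a timing callback along these lines might look as follows. This is a minimal sketch, not existing FluxTraining.jl functionality: it uses the documented `Callback`/`on` extension points, but the `EventTimer` name and dispatching on the abstract event type are assumptions made for illustration.

```julia
# Sketch of an event-timing callback (not part of FluxTraining.jl).
# Assumes the documented extension points `Callback` and `on`; handling
# all events via the abstract event type is an assumption.
using FluxTraining
import FluxTraining: Callback, on

struct EventTimer <: Callback
    # (event name => wall-clock timestamp) in the order events fired
    timestamps::Vector{Pair{Symbol,Float64}}
end
EventTimer() = EventTimer(Pair{Symbol,Float64}[])

# Record when each event fires; the gaps between consecutive timestamps
# approximate the time spent in the training loop between events.
function on(event::FluxTraining.Events.Event, phase, timer::EventTimer, learner)
    push!(timer.timestamps, Symbol(nameof(typeof(event))) => time())
end
```

Such a callback could then be passed to a `Learner` like any other callback, and the recorded timestamps give a first estimate of where time goes.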

This would make it easier to identify possible performance bottlenecks.

Thoughts on implementation

This could be implemented as a callback, though you would need two callbacks: one that runs before all other callbacks and one that runs after them (to measure callback times), which is unwieldy. This approach may also not play well with the asynchronous callback scheduler proposed in #85. A better solution, in my opinion, is to implement a callback execution context that takes timings before and after it runs the callbacks. It would wrap another callback execution context that it delegates to, and would thus also play nicely with the asynchronous callback scheduler, since it would measure only the time spent on the synchronous part. See the sketch below.
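
A sketch of that wrapping approach, assuming the execution context corresponds to a callback-runner interface with a `handle(runner, event, phase, learner)` method. The `CallbackRunner` and `handle` names follow FluxTraining's internals as I understand them, but should be treated as assumptions rather than confirmed API.

```julia
# Sketch of a profiling execution context (names are assumptions).
# It delegates to an inner runner and records how long the synchronous
# part of callback handling took for each event type.
using FluxTraining
import FluxTraining: CallbackRunner, handle

struct ProfilingRunner{R<:CallbackRunner} <: CallbackRunner
    inner::R
    # total seconds spent handling each event type
    times::Dict{Symbol,Float64}
end
ProfilingRunner(inner) = ProfilingRunner(inner, Dict{Symbol,Float64}())

function handle(runner::ProfilingRunner, event, phase, learner)
    t0 = time()
    result = handle(runner.inner, event, phase, learner)
    key = Symbol(nameof(typeof(event)))
    runner.times[key] = get(runner.times, key, 0.0) + (time() - t0)
    return result
end
```

Because the profiler only wraps whatever runner it is given, an asynchronous scheduler could be wrapped the same way and only its synchronous dispatch time would be measured.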

Interpretation

Events that specify start and stop points, like StepBegin and StepEnd, could be treated as a layer in the profiling stack. An existing package for visualizing flame graphs could possibly be reused to make sense of the profiling data, as sketched below.
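
A sketch of that interpretation step: pairing ...Begin/...End events, recorded as (name => timestamp) pairs like the `EventTimer` above produces, into a tree of nested spans. A flame-graph package such as FlameGraphs.jl could then visualize the result; the conversion to its format is left out, and all names here are illustrative.

```julia
# Sketch: build a span tree from paired Begin/End events.
# Assumes `events` is non-empty and begin/end events are properly nested.
mutable struct Span
    name::Symbol
    start::Float64
    stop::Float64
    children::Vector{Span}
end

function buildspans(events::Vector{Pair{Symbol,Float64}})
    root = Span(:Training, events[1][2], events[end][2], Span[])
    stack = [root]
    for (name, t) in events
        s = String(name)
        if endswith(s, "Begin")
            # open a new layer in the profiling stack, e.g. StepBegin => :Step
            span = Span(Symbol(s[1:end-5]), t, t, Span[])
            push!(stack[end].children, span)
            push!(stack, span)
        elseif endswith(s, "End") && length(stack) > 1
            # close the innermost open span
            stack[end].stop = t
            pop!(stack)
        end
    end
    return root
end
```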