We're not adding new features prior to release, so I'll close this for now.
However, I'm not likely to add this in its current form, since there's a bunch of redundant callbacks there - e.g. you have self('after_backward') followed immediately by self('before_step') with no code between the two.
Thanks for noting the training loop order issue. Fixed now.
Yeah, thinking about that, my rationale is as follows:
But checking things, I found that callbacks that take, say, 0.075s look like this:
So the time of the after event of the previous step mixes with the time of the before event of the next step. It is true that you can put the code you want to run first at the end of the previous event, but I think that is not intuitive or easy to get right (it is more of a "trick").
Recently I saw that before_loss was added, IIRC because it helped fix a problem and perhaps made it clearer or more convenient to think in terms of that construct. And newcomers can hook into the event they want more directly.
About this PR, I have noticed something strange.
If I reorder the callbacks to cbs=[SlowBefore2(), SlowBefore(), SlowBefore1(), TraceCB(), SlowAfter1(), SlowAfter(), BackwardCB(), StepCB()], it shows a more "real" display. So yeah, this PR doesn't work as expected because you need to be careful about the order.
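For context, the Slow* callbacks in that list are just test callbacks that burn a fixed amount of time inside a single event; something like this purely illustrative sketch (the class names come from the list above, the bodies are assumptions):

```python
import time
from fastai.callback.core import Callback  # import path depends on the fastai/fastai2 version

class SlowBefore(Callback):
    "Sleep inside before_batch so the time shows up in the trace."
    def before_batch(self): time.sleep(0.075)

class SlowAfter(Callback):
    "Sleep inside after_batch so the time shows up in the trace."
    def after_batch(self): time.sleep(0.075)
```

Moving one of these before or after TraceCB in the cbs list changes which of TraceCB's spans its sleep time gets attributed to, which is the ordering effect described here.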
So the order changes the shape of the output a little:
So I think the correct way to show it is, for example, to have the loss bar be the parent of all the before_loss callbacks, then the actual loss computation, then all the after_loss callbacks; but changing the callback order shuffles, in a predictable way, who is the parent of whom.
For example, putting TraceCB in the middle of the before and after callbacks shows this:
So later I will look at how to do this properly as tracing functionality that really shows who is the parent of whom.
Later, if I get it right, I will discuss this subject and try again with another PR.
I think it would be nice to have a mixed trace combining what is possible with the torch autograd profiler and the calls made from fastai2.
For this to work, other specific starting events need to be added: before_pred, before_loss, before_step. With this, TraceCB can be added to a snippet like the one below and you get output like the traces shown further down.
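A minimal sketch of what such a TraceCB could look like (this is not the PR's actual code; the event names follow the before_*/after_* scheme discussed above, and both the import path and the exact set of available events depend on the fastai version):

```python
import torch
from fastai.callback.core import Callback

class TraceCB(Callback):
    "Label the batch/loss/backward/step phases inside the autograd profiler trace."
    def before_fit(self): self._spans = {}

    def _start(self, name):
        span = torch.autograd.profiler.record_function(name)
        span.__enter__()
        self._spans[name] = span

    def _stop(self, name):
        span = self._spans.pop(name, None)
        if span is not None: span.__exit__(None, None, None)

    def before_batch(self):    self._start('fastai:batch')
    def after_pred(self):      self._start('fastai:loss')      # pred done, loss about to be computed
    def after_loss(self):      self._stop('fastai:loss')
    def before_backward(self): self._start('fastai:backward')
    def after_backward(self):  self._stop('fastai:backward')
    def before_step(self):     self._start('fastai:step')
    def after_step(self):      self._stop('fastai:step')
    def after_batch(self):     self._stop('fastai:batch')
```

Each phase is wrapped in a record_function span, so the fastai phases show up as labelled blocks next to the autograd ops in the chrome trace.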
So it is about 40 seconds of training and the rest is processing and saving the trace; you can open the result with brave://tracing/ or chrome://tracing/.
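For reference, the trace file itself would be produced roughly like this (a sketch; learn is assumed to be a Learner created with TraceCB in its cbs, and trace.json is an arbitrary file name):

```python
import torch

with torch.autograd.profiler.profile() as prof:
    learn.fit(1)                        # fastai callbacks fire inside the profiled region

prof.export_chrome_trace("trace.json")  # load this file from brave://tracing or chrome://tracing
```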
The whole result looks like
A zoom on a batch shows
where at the end of the step we can see the time spent zeroing and copying,
and for the validation we have
Also, inside the with block you can use with torch.autograd.profiler.record_function("label-z"), or use it as a decorator as usual (maybe we can make a short alias for it); see the sketch below.

So this mixes the tracing already available in PyTorch with the callbacks, to add more information to the trace. I think having the whole autograd trace makes it much easier to see where the time is going; I always wondered "was that time after the loss spent in callbacks, or what?"... but at least now I have seen that some of it goes to zeroing the gradients (as a side note, if we added start/end events around zeroing, it would just group all those calls under one name) and to copying.
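As mentioned, record_function can label arbitrary regions inside the profiled block, and the decorator form works too on PyTorch versions that support it; roughly:

```python
import torch

# Decorator form: wraps every call to the function in a labelled span
# (supported on recent PyTorch versions, as mentioned above).
@torch.autograd.profiler.record_function("label-decorated")
def double(x):
    return x * 2   # placeholder work

with torch.autograd.profiler.profile() as prof:
    # Context-manager form: everything inside shows up under "label-z" in the trace.
    with torch.autograd.profiler.record_function("label-z"):
        y = double(torch.randn(8))
```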
Also, the first idea was just to use the available FunctionEvent and EventList and build it inside a callback; with that you only track the steps of the training loop and don't need to wrap the run in a with torch.autograd.profiler.profile() as prof: block.
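For reference, those are the same classes the profiler already returns: prof.function_events is an EventList whose elements are FunctionEvent objects, which is what a callback-built version would have to mimic (sketch; learn is assumed):

```python
import torch

with torch.autograd.profiler.profile() as prof:
    learn.fit(1)

events = prof.function_events              # an EventList of FunctionEvent objects
for evt in events[:5]:
    print(evt.name, evt.cpu_time_total)    # op name and total CPU time in microseconds
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))
```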
Also, I noted that show_training_loop prints after_backward and before_backward backwards (in the wrong order), but I don't know why, because all the others are fine (even the new ones).
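For anyone wanting to check that listing, it comes from a single call (assuming a Learner named learn):

```python
learn.show_training_loop()   # prints each training-loop event with the callbacks hooked into it
```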