fastai / fastai2

Temporary home for fastai v2 while it's being developed
https://dev.fast.ai
Apache License 2.0
645 stars 234 forks source link

Add to test_utils TraceCB #532

Closed tyoc213 closed 4 years ago

tyoc213 commented 4 years ago

I think it would be nice to have a mixed trace of what is possible with torch autograd profiler and the calls made from fastai2.

For this to work, other specific starting events need to be added: before_pred, before_loss, before_step.

With this TraceCB can be added to a snippet like this:

%%time
from fastai2.vision.all import *
from fastai2.test_utils import VerboseCallback, TraceCB

from fastai2.vision.all import *
path = untar_data(URLs.PETS)
files = get_image_files(path/"images")
def label_func(f): return f[0].isupper()
dls = ImageDataLoaders.from_name_func(path, files, label_func, item_tfms=Resize(224))
learn = cnn_learner(dls, resnet34, metrics=error_rate)
with torch.autograd.profiler.profile() as prof:
    learn.fine_tune(1, cbs=TraceCB())
    print('done')
print("saving...")
prof.export_chrome_trace("cuda_prof_trace.json")
print("saved")

and get some like

CPU times: user 50.5 s, sys: 3.74 s, total: 54.2 s
Wall time: 1min 18s

So it is about 40 trainning seconds trainning and the rest processing and saving, you can open it with brave://tracing/ or chrome://tracing/.

The whole result looks like

image

A zoom on a batch shows

image

where at the end of step we can see the time spend zeroing and copying

image

and for the validation we have

image

Also you can inside the with block, you can use with torch.autograd.profiler.record_function("label-z") or use as decorator as normal (maybe we can make a short name for it).

So this mixes the available tracing in pytorch with callbacks to add more information to the tracing. I think having the whole trace of the autograd makes it more easy to see where time is going, like I always was thinking "what that time after the loss was if where callbacks or what?"... but at less I have seen now that some is on zeroing (as a side point, if we added events of start and to zero it will just "aglutinate" all those calls under a name) and copying.


Also the first idea was just to use the available FunctionEvent, EventList and make it on a callback, with this you only track the steps of the training loop and dont need use with torch.autograd.profiler.profile() as prof:.

import threading
import time
from torch.autograd.profiler import FunctionEvent, EventList

class ChromeStatics(Callback):
    def __init__(self, ll):
        super().__init__()
        self.l = ll
        self.d = {}
    def __call__(self, event_name):
        tid = threading.get_ident()
        tname = threading.current_thread().name
        if event_name.startswith('before_'):
            evt = event_name.replace('before_', '')
            self.d[evt] = time.clock_gettime_ns(time.CLOCK_PROCESS_CPUTIME_ID)/1000
        elif event_name.startswith('after_'):
            evt = event_name.replace('after_', '')
            end = time.clock_gettime_ns(time.CLOCK_PROCESS_CPUTIME_ID)/1000
            if evt in self.d:
                start = self.d[evt]
                fe = FunctionEvent(tname, tid, evt, tid, start, end)
            else:
                print(f'{evt} no en {self.d}')
                start = end-500 # 1-time after_xxx
                fe = FunctionEvent(tname, tid, evt, tid, start, end)
            self.l.append(fe)
            self.d.pop(evt, None)
    def export_chrome_trace(self, fname="export_chrome_trace.json", dumps=False):
        with open(fname, mode="w+") as f:
            prof = EventList(self.l)
            prof.export_chrome_trace(f.name)
            if dumps:
                parsed = json.load(f)
                print(json.dumps(parsed, indent=4, sort_keys=True))

Also I noted that show_training_loop prints backwards after_backward and before_backward, but dont know why because all others are ok (even the new ones).

image

review-notebook-app[bot] commented 4 years ago

Check out this pull request on  ReviewNB

Review Jupyter notebook visual diffs & provide feedback on notebooks.


Powered by ReviewNB

jph00 commented 4 years ago

We're not adding new features prior to release, so I'll close this for now.

However, I'm not likely to add this in its current form, since there's a bunch of redundant callbacks there - e.g. you have self('after_backward') followed immediately by self('before_step') with no code between the two.

jph00 commented 4 years ago

Thanks for noting the training loop order issue. Fixed now.

tyoc213 commented 4 years ago

Yeah, thinking of that my rationale is as follows:

But checking things found that callbacks that take say 0.075s looks like this

image

So the time with the after event of the previous event mixes with the time of the before of the next event. But actually is true that you can put the events you want first at the end of the previous event but I think that could be not intuitive/easy to get (is more like a "trick").

Recently I seen before_loss added IIRC so it helped fix a problem and perhaps was more clear or convenient to think in terms of that construct. And newcomers can hook more directly the event they want.

So about this PR I have noted something extrange.

image image image

if I reorder cbs=[SlowBefore2(), SlowBefore(), SlowBefore1(), TraceCB(), SlowAfter1(), SlowAfter(), BackwardCB(), StepCB()] it shows a more "real" display, so yeah, this PR doesnt work as spected because you need to be carefull on order.

So the order change a little the shape of the output

image

So, I think the correct way to show is for example for loss bar by the parent of all the before_loss, then the actual loss computation then all the after_loss, but changing the order, does shuffle in a predecible way who is parent of who.

For example puting TraceCB in the middle of before and after shows this

image

So later I will see how to do it properly as a tracing functionality that indeed shows who is parent to who.

Later if I get it right will discuss this subject and try again with other PR.