fastai / course22p2

course.fast.ai 2022 part 2
https://course.fast.ai/Lessons/part2.html
Apache License 2.0

Sped up MetricsCB and ProgressCB #18

Open PiotrCzapla opened 1 year ago

PiotrCzapla commented 1 year ago

As mentioned on the forum, with minimal changes to ProgressCB and MetricsCB we can speed up training significantly by allowing the next batch to be prepared while the current one is being processed on the GPU. The speed-up is noticeable when data loading is fast, or when the model is slow enough to hide the data-loading latency.
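
To make the mechanism concrete, here is a minimal sketch of the idea (not the exact diff in this PR): logging the loss with `.item()` or a `to_cpu` call on every batch forces a GPU-to-CPU synchronization, which keeps the CPU from preparing the next batch in parallel. Accumulating per-batch losses on-device and synchronizing once per epoch removes that stall. The callback hook names and the `learn.loss` attribute below follow miniai's conventions but are assumptions about the exact API.

```python
# Sketch only, assuming miniai-style callbacks where hooks receive `learn`
# and `learn.loss` holds the current batch's loss tensor on the GPU.
import torch

class NonBlockingMetricsCB:
    def before_epoch(self, learn): self.losses = []   # per-batch loss tensors, kept on-device
    def after_batch(self, learn):
        # detach but do NOT call .item() here: that would force a GPU sync every batch
        self.losses.append(learn.loss.detach())
    def after_epoch(self, learn):
        # a single synchronization per epoch, when the mean is moved to the CPU
        print(f'mean loss: {torch.stack(self.losses).mean().item():.4f}')
```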

review-notebook-app[bot] commented 1 year ago

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

PiotrCzapla commented 1 year ago

The caching code is simple, but I kept it out of this PR since it was not strictly necessary; I could get away with

```python
dls.train = list(dls.train)
dls.valid = list(dls.valid)
```

Which is awesome, as it reinforces how flexible miniai is. The code to cache a dataset in memory is more complex, and it is not necessary if your model is large enough, but it is a game changer on MPS, where multiprocessing works poorly. I'm not sure where to place it, though.

```python
#| export
import fastcore.all as fc

def _with_features(ds):
    # Materialise the dataset as a fastcore list while keeping its `features` metadata
    setattr((l := fc.L(ds)), 'features', ds.features)
    return l

class CachedDS(dict):
    """Dict that does not print its contents, letting us inspect the dataset in Jupyter in reasonable time"""
    def __repr__(self): return "{ " + ", ".join([f'{k}: (#{len(v)})' for k, v in self.items()]) + " }"
    def __str__(self): return repr(self)

def cache_dataset_as_dict(dd):
    # Cache every split of a dataset dict in memory, keyed by split name
    return CachedDS({dsn: _with_features(ds) for dsn, ds in dd.items()})
```
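
For completeness, a hedged usage sketch: `tds` is assumed to be a Hugging Face `DatasetDict`, and `DataLoaders.from_dd` is the miniai constructor used in the course notebooks.

```python
cached = cache_dataset_as_dict(tds)                 # tds: a datasets.DatasetDict (assumed)
cached                                              # repr shows only split sizes, e.g. "{ train: (#60000), test: (#10000) }"
dls = DataLoaders.from_dd(cached, batch_size=1024)  # build DataLoaders from the in-memory copy
```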