[UCI CS Seminar] DNN Training Acceleration through Better Communication-Computation Overlap

Communication-Computation Overlap

One interesting term is the "no-sync window": the period during which an activation must stay cached, from the moment it is produced until it is consumed (updated).

Distributed training patterns: data parallel | model parallel | hybrid parallel

Problem: Compute underutilization
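
To make the problem concrete, here is a toy model of compute utilization per training iteration. It is only a sketch under simple assumptions that are not from the talk: each iteration has one compute phase and one communication phase, and `overlap_fraction` (a hypothetical knob) is the share of the shorter phase that is hidden behind the longer one.

```python
def compute_utilization(compute_s, comm_s, overlap_fraction):
    """Fraction of an iteration during which the device is computing.

    compute_s, comm_s: per-iteration compute / communication time in seconds.
    overlap_fraction: in [0, 1]; share of the shorter phase that runs
    concurrently with the longer one (assumed model, not a measurement).
    """
    if compute_s <= 0 or comm_s < 0:
        raise ValueError("compute_s must be > 0 and comm_s must be >= 0")
    if not 0.0 <= overlap_fraction <= 1.0:
        raise ValueError("overlap_fraction must be in [0, 1]")
    # Time that is hidden because both phases run at once.
    hidden = overlap_fraction * min(compute_s, comm_s)
    iteration_s = compute_s + comm_s - hidden
    return compute_s / iteration_s


if __name__ == "__main__":
    for frac in (0.0, 0.5, 1.0):
        print(frac, compute_utilization(10.0, 10.0, frac))
```

In this model utilization rises when the overlap grows, when compute time grows relative to communication, or when communication time shrinks.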

To increase pipeline overlap, we can analyze the data-flow dependencies and then prioritize / order the transfers of parameters and activations accordingly.
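
The ordering idea can be sketched as follows. This is not the algorithm from the talk or from any of the papers mentioned in these notes, just a minimal illustration: walk the ops in a dependency-respecting (topological) order and send each parameter in the order it is first needed. The function name `transfer_order` and the dictionaries `consumers` and `deps` are hypothetical.

```python
from collections import deque


def transfer_order(consumers, deps):
    """Order parameter transfers by when the computation first needs them.

    consumers: dict mapping op name -> list of parameter names the op reads.
    deps: dict mapping op name -> list of ops that must finish before it.
          Every op mentioned in deps must also be a key of consumers.
    Returns a list of parameter names, earliest-needed first.
    """
    indegree = {op: len(deps.get(op, [])) for op in consumers}
    children = {op: [] for op in consumers}
    for op, parents in deps.items():
        for parent in parents:
            children[parent].append(op)

    # Kahn's algorithm: repeatedly run ops whose dependencies are satisfied.
    ready = deque(op for op in consumers if indegree[op] == 0)
    order, seen = [], set()
    while ready:
        op = ready.popleft()
        for param in consumers[op]:
            if param not in seen:  # schedule each parameter only once
                seen.add(param)
                order.append(param)
        for child in children[op]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    return order


if __name__ == "__main__":
    consumers = {"conv1": ["w1"], "conv2": ["w2"], "fc": ["w3", "w1"]}
    deps = {"conv2": ["conv1"], "fc": ["conv2"]}
    print(transfer_order(consumers, deps))
```

Sending parameters in this order lets the first ops start as soon as their inputs arrive while later parameters are still in flight. A real scheduler would also have to weigh transfer sizes and op durations.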

Sangeetha classified current research on DNN training acceleration into 3 parts:

Better communication-computation overlap;
Increasing computation time;
Decreasing communication time;

Her TicTac [MLSys'19] targets the parameter server (PS) architecture; Caramel targets ring all-reduce.
