ganler / ResearchReading

General system research material (not limited to paper) reading notes.

[MLSys'19] Beyond Data and Model Parallelism for Deep Neural Networks #24

ganler closed this issue 4 years ago

ganler commented 4 years ago

https://mlsys.org/Conferences/2019/doc/2019/16.pdf

Yet another hardcore work from Zhihao.

ganler commented 4 years ago

Challenges

The SOAP search space: parallelization strategies span the Sample, Operator, Attribute, and Parameter dimensions.

image
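
To get a feel for how quickly this space grows, here is a minimal sketch that enumerates the ways a single operator could be split across devices. Everything in it is an assumption made for illustration: the names `factorizations` and `op_configs`, the restriction to power-of-two parallelism degrees, and the choice to ignore which concrete devices the pieces land on. It is not the paper's enumeration.

```python
def factorizations(n, k):
    """Yield every k-tuple of positive integers whose product is n."""
    if k == 1:
        yield (n,)
        return
    for d in range(1, n + 1):
        if n % d == 0:
            for rest in factorizations(n // d, k - 1):
                yield (d,) + rest


def op_configs(dims, max_devices):
    """List the ways to split one operator along its parallelizable dims.

    Hypothetical model: the total degree of parallelism is a power of two
    up to max_devices, and a config says how many pieces each dim is cut into.
    """
    configs = []
    degree = 1
    while degree <= max_devices:
        for split in factorizations(degree, len(dims)):
            configs.append(dict(zip(dims, split)))
        degree *= 2
    return configs


dims = ("sample", "attribute", "parameter")
per_op = op_configs(dims, max_devices=4)
# The Operator dimension multiplies the choices: each operator in the
# graph picks its own config independently.
num_ops = 3
print(len(per_op), len(per_op) ** num_ops)
```

Even in this toy model the count is exponential in the number of operators, before device placement is considered, which is why exhaustive search is not an option.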

Cost

Measuring distributed execution on real hardware is slow.

Two observations.

Execution simulator:

Operator Graph to Task Graph.

Delta simulation algorithm.

Incremental update: only the part of the task graph affected by a strategy change is re-simulated (less re-profiling).

Reduces search time by 2-7x, so the search takes only a few minutes.

image
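
The simulator notes above can be sketched roughly as follows. This is a simplified illustration, not the paper's algorithm: it assumes one compute task per operator, a fixed transfer cost for every cross-device edge, and no contention between tasks on the same device (a task's finish time is just the longest path to it). The function names, the `comm:` task naming, and the example costs are all hypothetical.

```python
from collections import defaultdict, deque


def build_task_graph(ops, edges, device, compute_time, xfer_time):
    """Expand an operator graph into a task graph.

    Each operator becomes one compute task; an edge between operators
    placed on different devices gets an extra communication task.
    """
    cost = {op: compute_time[op] for op in ops}
    succ, pred = defaultdict(list), defaultdict(list)
    for u, v in edges:
        if device[u] != device[v]:
            c = f"comm:{u}->{v}"
            cost[c] = xfer_time
            chain = [(u, c), (c, v)]
        else:
            chain = [(u, v)]
        for a, b in chain:
            succ[a].append(b)
            pred[b].append(a)
    return cost, succ, pred


def simulate(cost, succ, pred):
    """Full simulation: finish time of every task, in topological order."""
    indeg = {t: len(pred[t]) for t in cost}
    ready = deque(t for t in cost if indeg[t] == 0)
    finish = {}
    while ready:
        t = ready.popleft()
        finish[t] = max((finish[p] for p in pred[t]), default=0.0) + cost[t]
        for s in succ[t]:
            indeg[s] -= 1
            if indeg[s] == 0:
                ready.append(s)
    return finish


def delta_update(finish, cost, succ, pred, changed):
    """Incremental update after the cost of task `changed` was modified.

    Re-evaluates only tasks reachable from `changed`, and stops along a
    path as soon as a finish time does not move.
    Returns how many task evaluations were performed.
    """
    queue, touched = deque([changed]), 0
    while queue:
        t = queue.popleft()
        touched += 1
        new = max((finish[p] for p in pred[t]), default=0.0) + cost[t]
        if new != finish[t]:
            finish[t] = new
            queue.extend(succ[t])
    return touched


ops = ["a", "b", "c", "d"]
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
device = {"a": 0, "b": 0, "c": 1, "d": 0}
cost, succ, pred = build_task_graph(
    ops, edges, device, {"a": 1, "b": 5, "c": 1, "d": 1}, xfer_time=1
)
finish = simulate(cost, succ, pred)
cost["c"] = 2  # pretend a new strategy made operator c slower
touched = delta_update(finish, cost, succ, pred, "c")
print(touched, "of", len(cost), "tasks re-evaluated")
```

In this example the change to `c` stops propagating at `d`, because `d` is still bounded by the slower path through `b`. A faithful version would also model per-device task queues, so that a change can delay unrelated tasks sharing a device.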

What we need in a paper