ganler / ResearchReading

Reading notes on general systems research material (not limited to papers).
GNU General Public License v3.0

[HotCloud'19] Accelerating Deep Learning Inference via Freezing #8

Status: closed (ganler closed this issue 4 years ago)

ganler commented 4 years ago

https://www.usenix.org/conference/hotcloud19/presentation/kumar

ganler commented 4 years ago

I like this summary:

*(figure: summary of the paper)*

ganler commented 4 years ago

Analogy with computer systems:

Computer data => spatial locality => hardware cache. Video frames => temporal locality => inference cache?

An exact cache hit (an identical intermediate output) is unlikely

=> approximate caching?

  1. Compare the intermediate output with the outputs stored in the cache and rank the K nearest neighbors. The cached prediction is more trustworthy when:
    • More neighbors agree on the same label.
    • The neighbors are closer to the input point.
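The scoring idea above can be sketched as a small KNN lookup: rank cached intermediate outputs by distance, then weight label agreement by proximity. This is a minimal illustration of the idea, not the paper's implementation; all function and variable names are assumptions.

```python
import numpy as np

def knn_confidence(query, cached_feats, cached_labels, k=5):
    """Score a cached prediction by k-nearest-neighbor agreement.

    Confidence is higher when (a) more neighbors agree on the same
    label and (b) the agreeing neighbors are closer to the query.
    Illustrative sketch only (names are not from the paper).
    """
    # Distance from the query to every cached intermediate output.
    dists = np.linalg.norm(cached_feats - query, axis=1)
    idx = np.argsort(dists)[:k]          # indices of the k nearest
    labels, near = cached_labels[idx], dists[idx]
    majority = np.bincount(labels).argmax()
    agree = labels == majority
    # Inverse-distance weighting: nearer neighbors count for more.
    weights = 1.0 / (near + 1e-8)
    confidence = weights[agree].sum() / weights.sum()
    return int(majority), float(confidence)
```

With five cached points clustered around the query all sharing one label, the confidence is 1.0; mixed or distant neighbors pull it down.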


Parameter: for each layer, a threshold that decides YES (trust the cached prediction and stop) or NO (keep running the network).


During inference, a cache lookup is performed after every layer; a cache hit short-circuits the remaining layers and yields a faster prediction.

ganler commented 4 years ago

*(figure: evaluation results)*

The result looks good. (Too good to be true?)