ganler / ResearchReading

Reading notes on general systems research material (not limited to papers).
GNU General Public License v3.0

[HotCloud'19] Accelerating Deep Learning Inference via Freezing #8

Status: closed (ganler closed this issue 4 years ago)

ganler commented 4 years ago

https://www.usenix.org/conference/hotcloud19/presentation/kumar

ganler commented 4 years ago

I like this summary:

*(figure: summary of the paper)*

ganler commented 4 years ago

Analogy with computer systems:

Computer data => spatial locality => hardware cache. Video frames => temporal locality => inference cache?

An exact cache hit (an identical intermediate output) is unlikely

=> approximate caching?

  1. Compare the intermediate output with the outputs stored in the cache and rank the K nearest neighbors. The cached prediction is more trustworthy when:
    • More neighbors agree on the same label.
    • The neighbors are closer to the input point.
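The scoring idea above can be sketched as a small KNN lookup: rank cached intermediate outputs by distance, then weight label agreement by proximity. This is a minimal illustration of the idea, not the paper's implementation; all function and variable names are assumptions.

```python
import numpy as np

def knn_confidence(query, cached_feats, cached_labels, k=5):
    """Score a cached prediction by k-nearest-neighbor agreement.

    Confidence is higher when (a) more neighbors agree on the same
    label and (b) the agreeing neighbors are closer to the query.
    Illustrative sketch only (names are not from the paper).
    """
    # Distance from the query to every cached intermediate output.
    dists = np.linalg.norm(cached_feats - query, axis=1)
    idx = np.argsort(dists)[:k]          # indices of the k nearest
    labels, near = cached_labels[idx], dists[idx]
    majority = np.bincount(labels).argmax()
    agree = labels == majority
    # Inverse-distance weighting: nearer neighbors count for more.
    weights = 1.0 / (near + 1e-8)
    confidence = weights[agree].sum() / weights.sum()
    return int(majority), float(confidence)
```

With five cached points clustered around the query all sharing one label, the confidence is 1.0; mixed or distant neighbors pull it down.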


Parameter: for each layer, a threshold that decides YES (trust the cached prediction and stop) or NO (keep running the network).


During inference, a cache lookup is performed after every layer; a cache hit short-circuits the remaining layers and yields a faster prediction.

ganler commented 4 years ago

*(figure: evaluation results)*

The result looks good. (Too good to be true?)