BVLC / caffe

Caffe: a fast open framework for deep learning.
http://caffe.berkeleyvision.org/

Report standard deviation of the times in "caffe time" #6058

Open andrew-wja opened 6 years ago

andrew-wja commented 6 years ago

Issue summary

caffe time currently prints only the mean across N iterations for all printed statistics. In order to put error bars on graphs using numbers obtained via caffe time there is currently only one option: to execute caffe time multiple times, with -iterations=1. This yields noisy and inaccurate numbers, and gives a pessimistic view of the runtime of inference in Caffe.

It should be a relatively straightforward change to also report the standard deviation for all timings -- the data must already exist in order to compute the mean. This would allow researchers to report the variation in inference time correctly, without having to use -iterations=1.

Steps to reproduce

Execute caffe time with any trained model and observe printed timings.

Your system configuration

N/A

Noiredd commented 6 years ago

To compute the deviation we'd have to store individual measurements, while for the mean it's enough to keep cumulative sums - which is exactly what Caffe does. But I suppose it wouldn't hurt to implement a more detailed measurement output... do you think you could do that and submit a pull request?

Anyway, this (and any more advanced statistics) can already be done manually in Python with time.clock() (just remember that the first run is likely to be slower due to allocations etc.).
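A minimal sketch of the manual approach described above, assuming the workload is wrapped in a hypothetical zero-argument callable `run_once` (for example, something that performs one forward pass). The names `time_runs`, `run_once`, `iterations` and `warmup` are illustrative and not part of Caffe. It uses `time.perf_counter()` because `time.clock()` was removed in Python 3.8, and it discards warm-up runs so that first-run allocations do not skew the statistics.

```python
import statistics
import time


def time_runs(run_once, iterations=50, warmup=1):
    """Time `run_once` repeatedly; return (mean_ms, stdev_ms, samples_ms).

    `run_once` is a hypothetical zero-argument callable standing in for
    whatever is being measured (e.g. one forward pass of a network).
    """
    # Warm-up runs are executed but not timed (allocations, lazy init).
    for _ in range(warmup):
        run_once()

    samples_ms = []
    for _ in range(iterations):
        start = time.perf_counter()
        run_once()
        samples_ms.append((time.perf_counter() - start) * 1000.0)

    mean_ms = statistics.mean(samples_ms)
    # The sample standard deviation needs at least two measurements.
    stdev_ms = statistics.stdev(samples_ms) if len(samples_ms) > 1 else 0.0
    return mean_ms, stdev_ms, samples_ms
```

For instance, `time_runs(lambda: sum(range(100_000)), iterations=20)` would time a stand-in workload 20 times after one untimed warm-up call.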

IlyaOvodov commented 6 years ago

I just want to point out that there is no need to store individual measurements; a cumulative sum of the squared values is enough. Dev. = sqrt( <x^2> - <x>^2 )

Noiredd commented 6 years ago

@IlyaOvodov But to compute the squared value you need to know the mean time, which you don't know until you have all the measurements.

IlyaOvodov commented 6 years ago

@Noiredd, no. I mean the well-known way to calculate the variance in one pass. If you have a sequence {x_i} of random values, the obvious way to calculate the variance takes two passes: 1) <x> = sum(x_i)/N, 2) D = sum((x_i - <x>)^2)/N. But the same can be done in one pass (or on the fly, if the x_i arrive one by one as in this case): D = sum(x_i^2)/N - (sum(x_i)/N)^2. So it does not require storing all the times in an array.
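The one-pass formula can be sketched as a small accumulator that keeps only a count, a running sum and a running sum of squares. This is an illustration in Python, not Caffe's actual timing code, and the class name `RunningStats` is hypothetical. It computes the population variance (dividing by N), matching the formula in the comment.

```python
import math


class RunningStats:
    """One-pass mean and standard deviation from running sums (illustrative)."""

    def __init__(self):
        self.n = 0           # number of measurements seen so far
        self.total = 0.0     # running sum of x_i
        self.total_sq = 0.0  # running sum of x_i^2

    def add(self, x):
        """Fold one measurement into the running sums."""
        self.n += 1
        self.total += x
        self.total_sq += x * x

    def mean(self):
        return self.total / self.n

    def variance(self):
        # D = sum(x_i^2)/N - (sum(x_i)/N)^2, clamped at zero because
        # floating-point rounding can make the difference slightly negative.
        m = self.mean()
        return max(self.total_sq / self.n - m * m, 0.0)

    def stddev(self):
        return math.sqrt(self.variance())
```

Feeding it the values 2, 4, 4, 4, 5, 5, 7, 9 one at a time gives a mean of 5 and a standard deviation of 2. One caveat: when the mean is large compared with the spread, subtracting two nearly equal numbers loses precision. Welford's online algorithm is a common alternative in that situation, although for millisecond-scale timings the simple form is probably adequate.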

Noiredd commented 6 years ago

@IlyaOvodov Yeah, you're right. It would actually be pretty easy to implement.