chengyangfu / caffe

Caffe: a fast open framework for deep learning.
http://caffe.berkeleyvision.org/

how about the speed of the ssd16(FP16 version) running on TX1 #14

Open birdwcp opened 6 years ago

birdwcp commented 6 years ago

How fast does SSD (FP16 version) run on the TX1?

chengyangfu commented 6 years ago

It didn't accelerate much when I tried last year, and FP16 also caused problems (a slowdown) in the post-processing step (non-maximum suppression). I am reworking SSD in PyTorch now, where it should be much easier to run in FP16 mode. I don't have new numbers for the TX1, but I can provide some numbers from a V100 GPU, which contains dedicated hardware for FP16 computation.
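A minimal sketch of what switching a layer's tensors to FP16 means, using NumPy for illustration (the PyTorch rework itself is not shown in this thread; the feature-map shape below is a made-up example): the memory per element is halved, but actual compute speedups depend on hardware with native FP16 units, such as the V100.

```python
import numpy as np

# Hypothetical feature map, roughly sized like an SSD conv layer output.
feat_fp32 = np.random.rand(512, 38, 38).astype(np.float32)
feat_fp16 = feat_fp32.astype(np.float16)  # cast to half precision

# FP16 halves the memory footprint per element.
print(feat_fp32.nbytes // feat_fp16.nbytes)  # → 2

# Compute only speeds up on hardware with native FP16 support;
# elsewhere, extra casts (e.g. around NMS post-processing) can make
# things slower, as noted above. Precision loss for values in [0, 1)
# stays below about 1e-3.
max_abs_err = np.abs(feat_fp32 - feat_fp16.astype(np.float32)).max()
print(max_abs_err < 1e-3)  # → True
```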

The following numbers are FPS and do not include the NMS step. [image: benchmark table]

birdwcp commented 6 years ago

@chengyangfu Thank you for your reply and the detailed data. Maybe I should wait for NVIDIA's TensorRT, which is said to give a 2x speedup on the TX1. I am running SSD on the TX2 now; with some lightweight networks it reaches good performance (about 30 fps).

foralliance commented 6 years ago

@birdwcp @chengyangfu Hi,

There seem to be two ways of calculating FPS.

1. via the `time` tool of Caffe with batch_size = 1:

build/tools/caffe time -model=models/VGGNet/VOC0712/refinedet_vgg16_320x320/deploy.prototxt -gpu=0
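Assuming `caffe time` reports an average per-forward-pass latency in milliseconds (the actual value depends on the network and GPU), converting that latency to FPS for batch_size = 1 is a one-line calculation; the 25 ms figure below is illustrative only, not a measurement from this thread:

```python
def latency_ms_to_fps(avg_forward_ms: float) -> float:
    """Convert an average per-image forward-pass latency in ms,
    as reported by `caffe time` with batch_size = 1, into FPS."""
    return 1000.0 / avg_forward_ms

# Illustrative number only: a 25 ms forward pass means 40 FPS.
print(round(latency_ms_to_fps(25.0), 1))  # → 40.0
```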

2. get the time interval and estimate the FPS. For example, I got the following lines after running examples/ssd/score_ssd_pascal.py:

I0713 20:50:25.756376 117599 net.cpp:684] Ignoring source layer mbox_loss
I0713 20:51:32.472930 117599 solver.cpp:531] Test net output #0: detection_eval = 0.723217
I0713 20:51:32.473057 117599 solver.cpp:325] Optimization Done.

The total time spent evaluating the 4952 test images is about 67 seconds, so FPS = 4952 / 67 ≈ 74.
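The estimate above can be reproduced by parsing the timestamps of the two glog lines; a minimal sketch (the log lines and the 4952-image VOC test set size are from this thread, the helper function is illustrative):

```python
from datetime import datetime

def fps_from_log(start_line: str, end_line: str, num_images: int) -> float:
    """Estimate FPS from two glog-style lines (I<MMDD> HH:MM:SS.ffffff ...)."""
    def ts(line: str) -> datetime:
        # Field 1 of a glog line is the wall-clock time HH:MM:SS.ffffff.
        return datetime.strptime(line.split()[1], "%H:%M:%S.%f")
    elapsed = (ts(end_line) - ts(start_line)).total_seconds()
    return num_images / elapsed

start = "I0713 20:50:25.756376 117599 net.cpp:684] Ignoring source layer mbox_loss"
end = "I0713 20:51:32.472930 117599 solver.cpp:531] Test net output #0: detection_eval = 0.723217"
print(round(fps_from_log(start, end, 4952)))  # → 74
```

Note that this measures throughput over the whole evaluation loop (including data loading and detection output), which is why it differs from the per-forward-pass number that `caffe time` reports.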


Which method should be used to calculate FPS? Is there any other way?