Closed — zyz207 closed this issue 6 years ago
If that is the total run time of the program, it is OK, because it loads the 200 MB model into the process.
If that is the time of a single inference, it is not; check that TensorFlow is really using your GPU. See this example line from my log:
2017-11-17 16:42:05.460780: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0, compute capability: 6.1)
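If you don't see such a line, a quick way to check is to ask TensorFlow which devices it can see. This is a minimal sketch assuming a TF 1.x install (`device_lib` is the TF 1.x internal client API); the function name `visible_gpus` is my own, not from this repo:

```python
def visible_gpus():
    """Return names of GPU devices TensorFlow can see, or None if TF is not installed."""
    try:
        # TF 1.x-era API for enumerating local devices
        from tensorflow.python.client import device_lib
    except ImportError:
        return None
    return [d.name for d in device_lib.list_local_devices()
            if d.device_type == "GPU"]

if __name__ == "__main__":
    gpus = visible_gpus()
    if gpus is None:
        print("TensorFlow is not installed in this environment")
    elif gpus:
        print("GPU(s) visible to TensorFlow:", gpus)
    else:
        print("No GPU visible; TensorFlow will fall back to CPU")
```

An empty list here usually means a CUDA/cuDNN mismatch or a CPU-only TensorFlow build.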
I've also done some testing with a 1080 Ti on a 720x1280 video: it takes about 1 s for the model to feed forward over 4 scales and another 1 s to compute the part affinity fields. By manually reducing the resolution to 368x654, the total processing time comes down to about 0.65 s, but it is still much slower than the Caffe version.
On a second look, almost half of the time in the predict loop is spent on post-processing, i.e. resizing the heatmaps and part affinity fields. The high number of channels (19 for the heatmaps, 38 for the PAFs), plus the use of the slower bicubic interpolation algorithm, could be the cause of the slowdown. One way to improve this could be to use a CUDA version of the resize function instead, but I haven't measured the difference yet.
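As a rough illustration of why the interpolation choice matters, here is a pure-NumPy sketch (not this repo's actual code) that upsamples a stride-8 output tensor of 57 channels (19 heatmap + 38 PAF) with nearest-neighbor vs. bilinear interpolation; the sizes mirror the 368x654 case, and the timings only show the relative cost, not the repo's real numbers:

```python
import time
import numpy as np

def upsample_nearest(x, out_h, out_w):
    """Nearest-neighbor upsampling of an (H, W, C) array via index replication."""
    h, w = x.shape[:2]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return x[rows][:, cols]

def upsample_bilinear(x, out_h, out_w):
    """Bilinear upsampling of an (H, W, C) array using separable weights."""
    h, w = x.shape[:2]
    r = np.linspace(0, h - 1, out_h)
    c = np.linspace(0, w - 1, out_w)
    r0 = np.floor(r).astype(int); r1 = np.minimum(r0 + 1, h - 1)
    c0 = np.floor(c).astype(int); c1 = np.minimum(c0 + 1, w - 1)
    fr = (r - r0)[:, None, None]  # fractional row weights
    fc = (c - c0)[None, :, None]  # fractional column weights
    top = x[r0][:, c0] * (1 - fc) + x[r0][:, c1] * fc
    bot = x[r1][:, c0] * (1 - fc) + x[r1][:, c1] * fc
    return top * (1 - fr) + bot * fr

if __name__ == "__main__":
    # 368x654 input at network stride 8 -> roughly 46x82 output maps, 57 channels
    maps = np.random.rand(46, 82, 57).astype(np.float32)
    for name, fn in [("nearest", upsample_nearest), ("bilinear", upsample_bilinear)]:
        t0 = time.perf_counter()
        fn(maps, 368, 654)
        print(f"{name}: {time.perf_counter() - t0:.4f} s")
```

Bicubic needs a 4x4 neighborhood per output pixel instead of bilinear's 2x2, so it is more expensive again; repeated over 57 channels per frame, that difference adds up, which is why a cheaper interpolation or a GPU-side resize can help.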
It takes about 5 s when I run demo_image; my GPU is a 1080 Ti.