Closed · Ultron-11 · closed 6 years ago
I tested the speed of FP16 on ResNet-50. With 8 cards, FP16 is only about twice as fast as FP32; is this normal? The only change I made was replacing the DataLayer with an InputLayer.
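For reference, the DataLayer-to-InputLayer swap mentioned here usually looks like the following prototxt fragment (a sketch only; the batch size and 3x224x224 input shape are assumptions for ResNet-50, not taken from the poster's files):

```
layer {
  name: "data"
  type: "Input"
  top: "data"
  # Assumed shape: batch 64, 3-channel 224x224 images (typical ResNet-50 input)
  input_param { shape: { dim: 64 dim: 3 dim: 224 dim: 224 } }
}
```

With an InputLayer, data is fed from the host at each iteration, so input-pipeline overhead can also affect the FP16/FP32 comparison.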
@Ultron-11 I got similar numbers. I guess it's normal; it matches many other benchmarks you can find online, such as this one: http://en.community.dell.com/techcenter/high-performance-computing/b/general_hpc/archive/2017/09/27/deep-learning-on-v100.
@Ultron-11 @1duo - it actually depends on how the GPUs talk to each other: PCIe, NVLink, or NVSwitch?
@drnikolaev Thanks for your reply. My configuration is very similar to http://en.community.dell.com/techcenter/high-performance-computing/b/general_hpc/archive/2017/09/27/deep-learning-on-v100: 8 V100 GPUs connected via PCIe. I got numbers similar to those in the article above: ~2300 img/sec for FP16 and ~1200 img/sec for FP32 with ResNet-50 on ImageNet. Are these numbers expected? Can we improve them further?
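Rough arithmetic on the throughput figures quoted above (a sketch that just restates the reported numbers; it is not a new measurement):

```python
# Throughput figures quoted above for 8x V100 (PCIe), ResNet-50 on ImageNet.
fp16_imgs_per_sec = 2300
fp32_imgs_per_sec = 1200

# Observed FP16-over-FP32 speedup, close to the ~2x being asked about.
speedup = fp16_imgs_per_sec / fp32_imgs_per_sec
print(f"FP16/FP32 speedup: {speedup:.2f}x")  # roughly 1.92x
```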
@1duo could you upload your prototxt file(s) here please? Also, here is a sample run on 8xV100 on NVLink (see the last line): https://github.com/NVIDIA/caffe/blob/models/RN50-FP16-20180201/resnet50-0.16.6-idl-fp16-88ep_10526.log
@1duo @Ultron-11 could you verify https://github.com/drnikolaev/caffe/tree/caffe-0.17 release candidate?
@drnikolaev I no longer have access to V100 machines. Can't help here, sorry for the inconvenience. Thanks.
I tested performance using the FP16 type; it seems that FP16 is not faster than FP32 but actually slower. Environment:
model: examples/cifar10/train_full.sh
Please have a look at the logs I have attached. fp16.log fp32.log
Notice that the number of iterations per second for FP16 is lower than for FP32.