NVIDIA / nv-wavenet

Reference implementation of real-time autoregressive wavenet inference
BSD 3-Clause "New" or "Revised" License

question about performance #18

Closed suntao2012 closed 6 years ago

suntao2012 commented 6 years ago

I have built for Pascal/Volta with sm_60/sm_70, and it runs well on both P100 and V100. However, when I keep all parameters the same and only change the precision (fp16 vs. fp32), I get similar performance in PERSISTENT mode on both P100 and V100, yet very different performance in the other modes (SINGLE, DUAL). So my first question is: why does this happen?

Second question: the V100 has Tensor Cores. Does this source code use them?

And my last question: the V100 is the newer part, so it should be faster, yet its measured performance is close to, and in some cases slightly below, the P100's. Why?

| | GP100 | V100-PCIE |
| --- | --- | --- |
| Graphics clock | 1556 MHz | 1380 MHz |
| Memory clock | 715 MHz | 877 MHz |
| Medium-Single FP16 | 28.04 | 26.97 |
| Medium-Single FP32 | 10.69 | 11.26 |
| Medium-Dual FP16 | 32.40 | 31.02 |
| Medium-Dual FP32 | 14.68 | 15.28 |
| Medium-Persistent FP16 | 41.85 | 42.83 |
| Medium-Persistent FP32 | 37.50 | 42.02 |

BrianPharris commented 6 years ago

Regarding fp16/fp32 performance, the single- and dual-block implementations are limited by the time to stream weights into the Streaming Multiprocessor -- fp16 weights are half the size of fp32 weights. The persistent variant, on the other hand, loads all weights into the Streaming Multiprocessor registers so the weight bandwidth is not in the performance-critical path. This is discussed more fully on the blog post at https://devblogs.nvidia.com/nv-wavenet-gpu-speech-synthesis/.
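A back-of-envelope sketch of this reasoning (illustrative only; the weight count and bandwidth below are assumed round numbers, not measurements from nv-wavenet):

```python
# In SINGLE/DUAL mode, every generated sample must re-stream the full weight
# set from DRAM into the SM, so per-sample time is roughly
# weight_bytes / memory_bandwidth.  Halving the weight size (fp32 -> fp16)
# then roughly halves the streaming time.  In PERSISTENT mode the weights
# stay resident in registers, so this term drops out of the critical path.

def samples_per_sec_streaming(num_weights, bytes_per_weight, bandwidth_bytes_per_sec):
    """Throughput when each sample must re-read all weights from DRAM."""
    bytes_per_sample = num_weights * bytes_per_weight
    return bandwidth_bytes_per_sec / bytes_per_sample

# Hypothetical figures: 20M weights, 500 GB/s effective weight bandwidth.
NUM_WEIGHTS = 20_000_000
BANDWIDTH = 500e9

fp32 = samples_per_sec_streaming(NUM_WEIGHTS, 4, BANDWIDTH)
fp16 = samples_per_sec_streaming(NUM_WEIGHTS, 2, BANDWIDTH)

print(f"fp32: {fp32:,.0f} samples/s")
print(f"fp16: {fp16:,.0f} samples/s")   # ~2x fp32, matching the SINGLE/DUAL gap
print(f"speedup: {fp16 / fp32:.1f}x")
```

This simple model predicts the ~2x fp16-over-fp32 gap seen in the Single/Dual rows of the table, and predicts no precision-driven gap once weight streaming leaves the critical path, as in the Persistent rows.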

Regarding P100 vs V100: Since this is just batch=1, performance is mostly a function of core clock. Since your P100 clock is higher than V100, it is expected that it will perform better on a model which fits into both GPUs. Since V100 is larger, it can support larger models in the persistent mode. V100 will also provide higher throughput (batch size) than P100 in the single- and dual-cta modes, as it can run more blocks in parallel.
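A first-order estimate of that clock argument (a sketch only; real kernels have other limiters, and architectural differences between Pascal and Volta can offset a raw clock advantage):

```python
# For batch=1 persistent inference the kernel is largely bound by how fast a
# single SM's pipeline ticks, so a rough expected performance ratio between
# two GPUs is the ratio of their core clocks.  Clocks are from the table above.

P100_CLOCK_MHZ = 1556
V100_CLOCK_MHZ = 1380

expected_ratio = P100_CLOCK_MHZ / V100_CLOCK_MHZ
print(f"expected P100/V100 ratio from clocks alone: {expected_ratio:.2f}")  # ~1.13
```

That the measured Persistent rows show V100 at parity or slightly ahead despite its lower clock suggests per-clock architectural improvements in Volta roughly cancel the clock deficit for this workload.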