Closed: engiecat closed this issue 6 years ago
I tested on a Tesla P40 machine running Ubuntu 16.04 and found that it runs quite okay. Time measurements:
Without Coop: real 0m8.093s, user 0m5.840s, sys 0m2.316s
With Coop: real 0m8.089s, user 0m5.888s, sys 0m2.260s
The problem is that it still doesn't run on my Windows machine. It fails with "GPUassert: unspecified launch failure" at cudaDeviceSynchronize: https://github.com/NVIDIA/nv-wavenet/blob/0822dc523b0873f4d9cabd24364787dcb01377a2/pytorch/wavenet_infer.cu#L98
This may be due to the WDDM TDR feature, but it didn't recover for 20+ minutes, so there seems to be a problem.
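One way to narrow down the TDR suspicion is to ask the runtime whether the watchdog applies to the device at all. The sketch below reads two fields of cudaDeviceProp, kernelExecTimeoutEnabled and tccDriver. The struct DriverMode and the function queryDriverMode are hypothetical names made up for this illustration and are not part of nv-wavenet.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper type for this sketch (not part of nv-wavenet).
struct DriverMode {
    bool valid;            // false if the device could not be queried
    bool watchdogEnabled;  // true if a run-time limit applies to kernels
    bool tccDriver;        // true if the device runs under the TCC driver
};

// Query whether the display watchdog can interrupt long-running kernels
// on the given device, and whether the device uses the TCC driver.
DriverMode queryDriverMode(int device) {
    DriverMode mode = {false, false, false};
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) {
        cudaGetLastError();  // reset the recorded error before returning
        return mode;
    }
    mode.valid = true;
    mode.watchdogEnabled = (prop.kernelExecTimeoutEnabled != 0);
    mode.tccDriver = (prop.tccDriver != 0);
    return mode;
}
```

If watchdogEnabled is true, a long-running persistent kernel on that device may be interrupted by the watchdog, which would be consistent with an unspecified launch failure. It does not by itself explain a hang that never recovers.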
Changing cudaLaunchCooperativeKernel to cudaLaunchKernel will not affect performance, but it will prevent the CUDA driver from guaranteeing simultaneous execution of the synchronizing threads. This could lead to deadlock.
The single-block variant does not require cooperative groups.
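To make the deadlock risk concrete, here is a minimal sketch of the two launch paths together with a co-residency check. The kernel persistentKernelSketch is a hypothetical stand-in, not the nv-wavenet persistent kernel, and gridFitsOnDevice and launchSketch are names invented for this example. The idea is that blocks which wait on each other must all be resident at once. A cooperative launch refuses a grid that is too large, while a regular launch accepts it and may then hang.

```cuda
#include <cuda_runtime.h>

// Hypothetical stand-in for a persistent kernel (assumption for this sketch).
__global__ void persistentKernelSketch(int *data) {
    if (threadIdx.x == 0) data[blockIdx.x] = blockIdx.x;
}

// Returns true if numBlocks blocks of this kernel can all be resident on the
// device at the same time. Returns false if the device cannot be queried.
bool gridFitsOnDevice(int device, int numBlocks, int threadsPerBlock) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) {
        cudaGetLastError();  // reset the recorded error
        return false;
    }
    int blocksPerSm = 0;
    if (cudaOccupancyMaxActiveBlocksPerMultiprocessor(
            &blocksPerSm, persistentKernelSketch, threadsPerBlock, 0) != cudaSuccess) {
        cudaGetLastError();
        return false;
    }
    long long capacity = (long long)blocksPerSm * prop.multiProcessorCount;
    return numBlocks <= capacity;
}

// Launch the same kernel either cooperatively or as a regular launch.
cudaError_t launchSketch(int *d_data, int numBlocks, int threadsPerBlock,
                         bool cooperative) {
    void *args[] = { &d_data };
    if (cooperative) {
        // Fails with an error if the grid cannot be fully co-resident.
        return cudaLaunchCooperativeKernel((const void *)persistentKernelSketch,
                                           dim3(numBlocks), dim3(threadsPerBlock),
                                           args, 0, nullptr);
    }
    // No co-residency guarantee: blocks may be scheduled in waves.
    return cudaLaunchKernel((const void *)persistentKernelSketch,
                            dim3(numBlocks), dim3(threadsPerBlock),
                            args, 0, nullptr);
}
```

When falling back to a regular launch, calling something like gridFitsOnDevice first gives a rough indication of whether the grid can be co-resident, but it is not the guarantee that the cooperative launch provides.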
@BrianPharris That's probably why it hangs on Windows. Thank you for the information!
It works with the single-block variant! Thank you!
@BrianPharris I don't see any explicit use of thread_group or thread-block synchronization from cooperative groups in the persistent kernel. Would using that lower the synchronization cost (compared with the spin lock on global memory)?
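For reference, explicit grid-wide synchronization with cooperative groups looks roughly like the sketch below. The kernel twoPhaseSketch and the wrapper launchTwoPhaseSketch are hypothetical examples, not code from nv-wavenet, and the sketch makes no claim about whether this is cheaper than the spin lock. Note that grid.sync() requires a cooperative launch, and on CUDA 9 it also needs relocatable device code (-rdc=true).

```cuda
#include <cuda_runtime.h>
#include <cooperative_groups.h>

namespace cg = cooperative_groups;

// Hypothetical two-phase kernel: every thread writes one element, the whole
// grid synchronizes, then every thread reads an element written by another
// thread (possibly in a different block).
__global__ void twoPhaseSketch(float *buf, float *out, int n) {
    cg::grid_group grid = cg::this_grid();
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    if (i < n) buf[i] = (float)i;       // phase 1: produce
    grid.sync();                        // all blocks wait here
    if (i < n) out[i] = buf[(i + 1) % n];  // phase 2: consume a neighbor
}

// Host wrapper. grid.sync() is only valid under a cooperative launch, so the
// kernel is always launched with cudaLaunchCooperativeKernel.
cudaError_t launchTwoPhaseSketch(float *d_buf, float *d_out, int n,
                                 int threadsPerBlock) {
    if (n <= 0 || threadsPerBlock <= 0) return cudaErrorInvalidValue;
    int numBlocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    void *args[] = { &d_buf, &d_out, &n };
    return cudaLaunchCooperativeKernel((const void *)twoPhaseSketch,
                                       dim3(numBlocks), dim3(threadsPerBlock),
                                       args, 0, nullptr);
}
```

The launch returns an error instead of running if the device does not support cooperative launch or if the grid is too large to be co-resident.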
First of all, thank you for developing this repo :)
I've been trying to develop a Windows port of this repo. I managed to build the PyTorch version using MSVC, but I ran into CUDA error 71 (cudaErrorNotSupported). (I'm currently using a GTX 1060 6GB on Windows 10 with CUDA 9.0.)
The error was tracked to https://github.com/NVIDIA/nv-wavenet/blob/0822dc523b0873f4d9cabd24364787dcb01377a2/nv_wavenet_persistent.cuh#L529
Via https://devtalk.nvidia.com/default/topic/1022751/cuda-setup-and-installation/gtx-1080-does-not-support-cooperative-kernel-launch-/, I discovered that cooperative kernel launch is only available on Linux, or on Windows in TCC mode.
Would it be possible to use a non-cooperative launch instead (changing cudaLaunchCooperativeKernel to cudaLaunchKernel), and if so, how much performance would be lost?
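The platform restriction described above can also be detected at run time instead of waiting for the launch to fail. The following is a minimal sketch that queries the device attribute cudaDevAttrCooperativeLaunch. The function name supportsCooperativeLaunch is made up for this example and does not exist in nv-wavenet.

```cuda
#include <cuda_runtime.h>

// Returns true if the device reports support for cooperative kernel launches.
// Returns false if it does not, or if the query itself fails (for example,
// an invalid device index or no CUDA device present).
bool supportsCooperativeLaunch(int device) {
    int supported = 0;
    cudaError_t err = cudaDeviceGetAttribute(&supported,
                                             cudaDevAttrCooperativeLaunch,
                                             device);
    if (err != cudaSuccess) {
        cudaGetLastError();  // reset the recorded error before returning
        return false;
    }
    return supported != 0;
}
```

A Windows port could call a check like this once at startup and choose a code path that does not need cooperative launch when it returns false.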