baidu-research / warp-ctc

Fast parallel CTC.
Apache License 2.0

Failing GPU tests on CUDA 10.1 #146

Open ahbon123 opened 5 years ago

ahbon123 commented 5 years ago

I installed CUDA 10.1 for an RTX 2080 GPU under Ubuntu 19.04, and the GPU tests fail. Here is the relevant part of my CMakeLists.txt:

```cmake
# need to be at least 30 or __shfl_down in reduce won't compile
set(CUDA_NVCC_FLAGS "${CUDA_NVCC_FLAGS} -gencode arch=compute_30,code=sm_30 -O2")
set(CUDA_NVCC_FLAGS "${CUDA_NVCC_FLAGS} -gencode arch=compute_35,code=sm_35")

set(CUDA_NVCC_FLAGS "${CUDA_NVCC_FLAGS} -gencode arch=compute_50,code=sm_50")
set(CUDA_NVCC_FLAGS "${CUDA_NVCC_FLAGS} -gencode arch=compute_52,code=sm_52")

IF (CUDA_VERSION GREATER 7.6)
    set(CUDA_NVCC_FLAGS "${CUDA_NVCC_FLAGS} -gencode arch=compute_60,code=sm_60")
    set(CUDA_NVCC_FLAGS "${CUDA_NVCC_FLAGS} -gencode arch=compute_61,code=sm_61")
    set(CUDA_NVCC_FLAGS "${CUDA_NVCC_FLAGS} -gencode arch=compute_62,code=sm_62")
ENDIF()
```
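Every entry in this list uses `code=sm_XX`, so the build embeds native code only for compute capabilities 3.0 through 6.2 and no PTX. Native code does not carry across major versions, so a compute capability 7.5 card such as the RTX 2080 likely has no kernel image it can run, which would be consistent with the failing kernel launch. Instead of maintaining the list by hand, CMake's FindCUDA module can detect the installed GPU. A minimal sketch, assuming CMake 3.7 or newer and that `find_package(CUDA)` has already run; `DETECTED_ARCH_FLAGS` is an arbitrary, hypothetical variable name:

```cmake
# Sketch only (not warp-ctc's own CMakeLists.txt): ask FindCUDA to detect the
# GPU(s) visible at configure time and produce matching -gencode flags.
# Assumes CMake >= 3.7 and a prior find_package(CUDA).
cuda_select_nvcc_arch_flags(DETECTED_ARCH_FLAGS Auto)

# Print what was selected so the configure log shows which architectures
# will actually be compiled.
message(STATUS "CUDA arch flags: ${DETECTED_ARCH_FLAGS}")

# The macro returns a CMake list (-gencode;arch=...,code=...), so append it
# to CUDA_NVCC_FLAGS as list elements.
list(APPEND CUDA_NVCC_FLAGS ${DETECTED_ARCH_FLAGS})
```

`Auto` only sees the GPU(s) visible when CMake runs, so inside a container the GPU has to be exposed at configure time. If detection fails, the module should fall back to a list of common architectures, which may or may not include 7.5 depending on the CMake version.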
ahbon123 commented 5 years ago

Not yet.

```
x@x:~/warp-ctc/build$ ./test_gpu
Running GPU tests
terminate called after throwing an instance of 'std::runtime_error'
  what():  Error: compute_ctc_loss in small_test, stat = execution failed
Aborted
```

nicolaspanel commented 5 years ago

@ahbon123 I am having the same error with Ubuntu 18.04, CUDA 10.0 and an RTX 2080 Ti. Have you found a workaround?

nicolaspanel commented 5 years ago

@FortuneStar Good to know! How?

ahbon123 commented 5 years ago

I'm going to uninstall CUDA 10.1 and install CUDA 10.0 instead, though I'm not sure it will help. Do you know which CUDA version has better compatibility, 10.0 or 9.2?

nicolaspanel commented 5 years ago

@HawkAaron Any idea?

HawkAaron commented 5 years ago

Since the compute capability of the RTX 2080 Ti (like that of the RTX 2080) is 7.5, we need to add `set(CUDA_NVCC_FLAGS "${CUDA_NVCC_FLAGS} -gencode arch=compute_75,code=sm_75")` to CMakeLists.txt.

For more details, please refer to my configs.
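A minimal sketch of this fix, assuming the same FindCUDA-based CMakeLists.txt as in the first comment (`CUDA_VERSION` and `CUDA_NVCC_FLAGS` come from `find_package(CUDA)`). It adds two things beyond the one-line fix: a version guard, because nvcc releases older than CUDA 10.0 reject `compute_75`, and an optional PTX target (`code=compute_75`) that the driver can JIT-compile on GPUs newer than any listed architecture. The guard is written as `NOT ... VERSION_LESS` so it also works on CMake versions older than 3.7.

```cmake
# Sketch: target Turing (compute capability 7.5) only when nvcc supports it.
# nvcc accepts compute_75 / sm_75 starting with CUDA 10.0.
if(NOT CUDA_VERSION VERSION_LESS 10.0)
  # Native machine code (SASS) for RTX 2080 / RTX 2080 Ti
  set(CUDA_NVCC_FLAGS "${CUDA_NVCC_FLAGS} -gencode arch=compute_75,code=sm_75")
  # Optional: also embed PTX for compute_75 so the driver can JIT-compile it
  # for later GPU generations (larger binary, slower first launch)
  set(CUDA_NVCC_FLAGS "${CUDA_NVCC_FLAGS} -gencode arch=compute_75,code=compute_75")
endif()
```

After editing CMakeLists.txt, a fresh build directory avoids stale cached settings from the earlier configure run.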

nicolaspanel commented 5 years ago

Thanks a lot @HawkAaron

mzpmzk commented 5 years ago

1. I get the same error with Ubuntu 16.04, CUDA 10.0 and an RTX 2080 Ti (Error: compute_ctc_loss in small_test, stat = execution failed)!

2. I followed @HawkAaron's instructions (I just added the following code to CMakeLists.txt), but it does not work for me (I get the same error). Could you give me some suggestions? @HawkAaron

```cmake
IF (CUDA_VERSION GREATER 8.9)
    set(CUDA_NVCC_FLAGS "${CUDA_NVCC_FLAGS} -gencode arch=compute_70,code=sm_70")
ENDIF()
IF (CUDA_VERSION GREATER 9.9)
    set(CUDA_NVCC_FLAGS "${CUDA_NVCC_FLAGS} -gencode arch=compute_75,code=sm_75")
ENDIF()
```

3. Have you solved this problem yet? If so, could you share how you fixed it? @ahbon123 @nicolaspanel
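The guards in this snippet use `GREATER`, which parses both sides as numbers rather than as versions. For CUDA 9.x and 10.x that gives the same answer as a version comparison, so the guards themselves are unlikely to be the problem here; `VERSION_GREATER_EQUAL` (CMake 3.7+) states the intent directly and stays correct when a version component has two digits. A small self-contained script sketches the difference; the file name and the `1.10`/`1.9` strings are purely illustrative:

```cmake
# version_compare.cmake (hypothetical name). Run with: cmake -P version_compare.cmake
cmake_minimum_required(VERSION 3.7)  # VERSION_GREATER_EQUAL needs CMake 3.7+

foreach(v 9.2 10.0 10.1)
  # Numeric test, as in the snippet above: both sides are parsed as numbers
  if(v GREATER 9.9)
    set(numeric "yes")
  else()
    set(numeric "no")
  endif()
  # Version test: compares the dot-separated components one by one
  if(v VERSION_GREATER_EQUAL 10.0)
    set(versioned "yes")
  else()
    set(versioned "no")
  endif()
  message(STATUS "CUDA ${v}: GREATER 9.9 = ${numeric}, VERSION_GREATER_EQUAL 10.0 = ${versioned}")
endforeach()

# The two can disagree once a component has two digits (made-up strings):
if("1.10" GREATER "1.9")
  message(STATUS "numeric: 1.10 > 1.9")
else()
  # This branch runs: "1.10" is read as the number 1.1
  message(STATUS "numeric: 1.10 <= 1.9")
endif()
if("1.10" VERSION_GREATER "1.9")
  # This branch runs: minor component 10 > 9
  message(STATUS "version: 1.10 > 1.9")
endif()
```

For the three CUDA versions in the loop, both tests agree, which is why the guards above behave as intended for CUDA 10.0.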

ahbon123 commented 5 years ago

I reinstalled CUDA 10.0, and it works now.

mzpmzk commented 5 years ago

My Docker environment was probably too old (Ubuntu 14.04). I upgraded it to Ubuntu 16.04 and followed @HawkAaron's instructions, and now it works.