PaddlePaddle / Paddle

PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (PaddlePaddle core framework: high-performance single-machine and distributed training and cross-platform deployment for deep learning & machine learning)
http://www.paddlepaddle.org/
Apache License 2.0

[Compile]Error compiling with TENSORRT on CUDA 12.2 #55016

Closed · engineer1109 closed this 1 year ago

engineer1109 commented 1 year ago

Describe the Bug

Current branch: develop, commit 12a296cbcfdfb226af4374279726a5d854dfec37, configured with: cmake .. -DWITH_CUSTOM_DEVICE=ON -DWITH_GPU=ON -DWITH_TENSORRT=ON

CUDA 12.2 installed via apt; TensorRT 8.6.1.6 installed via apt (libnvinfer-dev, libnvinfer-plugin-dev).

The build produces a large number of identical compile errors:

/media/wjl/D2/github/fork/Paddle/paddle/fluid/inference/tensorrt/plugin/many_emb_layernorm_kernel.cu(83): error: no instance of function template "cuda::std::__4::plus<void>::operator()" matches the argument list
            argument types are: (cub::CUB_200101_750_NS::KeyValuePair<float, float>, cub::CUB_200101_750_NS::KeyValuePair<float, float>)
            object type is: cub::CUB_200101_750_NS::Sum
      threadData = pairSum(threadData, kvp<T>(rldval, rldval * val));
                   ^
          detected during:
            instantiation of "void paddle::inference::tensorrt::plugin::embLayerNormKernel_2<T,TPB>(int32_t, const int32_t *, const int32_t *, const float *, const float *, const T *, const T *, int32_t, int32_t, T *) [with T=float, TPB=256U]" at line 279
            instantiation of "int32_t paddle::inference::tensorrt::plugin::embSkipLayerNorm_2(cudaStream_t, int32_t, int32_t, int32_t, const int32_t *, const int32_t *, int32_t, const float *, const float *, const T *, const T *, int32_t, int32_t, T *) [with T=float]" at line 365

cub::Sum appears to be the problem.

Additional Supplementary Information

No response

ForFishes commented 1 year ago

Hello, are you using the official Docker image? Could you try compiling inside the officially released image? https://www.paddlepaddle.org.cn/documentation/docs/zh/install/docker/linux-docker.html

engineer1109 commented 1 year ago

@ForFishes No image was used. The key observation: compilation succeeds with CUDA 12.1 but fails with CUDA 12.2.

jeng1220 commented 1 year ago

nvbugs 4202615

jeng1220 commented 1 year ago

cub::Sum is `__host__ __device__ __forceinline__ T cub::Sum::operator()`. The quickest workaround is to replace the pairSum call with manual member-wise accumulation:

    // threadData = pairSum(threadData, kvp(rldval, rldval * val));
    threadData.key += rldval;
    threadData.value += rldval * val;

From the current analysis, the problem is caused by the new cub::Sum reusing ::cuda::std::plus<>: https://github.com/NVIDIA/cub/blob/main/cub/thread/thread_operators.cuh#L79

The old cub implemented Sum itself: https://github.com/NVIDIA/cub/blob/2.0.X/cub/thread/thread_operators.cuh#L97C1-L106C3

Since cub is a low-level operator library, warp_reduce, block_reduce, and device_reduce are affected as well, so the problem cannot be solved by patching a single call site.

jeng1220 commented 1 year ago

@engineer1109, the problem should now be fixed. If everything works for you, please close this issue.

jeng1220 commented 1 year ago

@engineer1109, since the problem has been fixed, I am closing this issue. If you still run into it, please reopen the issue.