sum_kernel.cu
The size of out_cols_data is only x_dim0 * x_dim1, so any access at or beyond index x_dim0 * x_dim1 is illegal. To prevent such out-of-bounds access, the loop in SumCsr3DGradCudaKernel is split into two loops.
sum_grad_kernel.cu
The length of x_crows_data is only x_dim0 * (x_dim1 + 1), so accessing x_crows_data[x_dim0 * (x_dim1 + 1)] is in fact illegal. However, x_crows_data[x_dim0 * (x_dim1 + 1)] would be 0 due to the alignment mechanism of StreamSafeAllocator.
Moreover, dx_values_data is never filled when index = x_dim0 * (x_dim1 + 1) - 1. Therefore, the last iteration of the loop can be skipped.
PR Category
Operator Mechanism
PR Types
Bug fixes