PaddlePaddle / Paddle

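PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (『飞桨』 core framework: high-performance single-machine and distributed training, and cross-platform deployment, for deep learning and machine learning)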
http://www.paddlepaddle.org/
Apache License 2.0
21.63k stars 5.44k forks

Test sparse sum op #63899

Closed lawrence910426 closed 1 week ago

lawrence910426 commented 1 week ago

PR Category

Operator Mechanism

PR Types

Bug fixes

Description

  1. sum_kernel.cu: out_cols_data holds only x_dim0 * x_dim1 elements, so any access at or beyond index x_dim0 * x_dim1 is illegal. To prevent this out-of-bounds access, the loop in SumCsr3DGradCudaKernel is split into two loops.

  2. sum_grad_kernel.cu: x_crows_data holds only x_dim0 * (x_dim1 + 1) elements, so reading x_crows_data[x_dim0 * (x_dim1 + 1)] is in fact illegal. In practice, however, x_crows_data[x_dim0 * (x_dim1 + 1)] would be 0 due to the alignment mechanism of StreamSafeAllocator, which hides the bug.

     Moreover, dx_values_data is never filled when index = x_dim0 * (x_dim1 + 1) - 1, so the last iteration of the loop can be skipped.
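The indexing argument in item 2 can be illustrated with a small host-side sketch. This is not the Paddle kernel: it assumes a batched CSR layout in which each of the x_dim0 batches contributes x_dim1 + 1 row offsets to one concatenated crows array, and that the loop body reads crows[index] and crows[index + 1]. The helper names batched_crows and visit are hypothetical.

```python
def batched_crows(row_nnz):
    """Concatenate per-batch CSR row offsets.

    row_nnz[b][r] is the number of stored values in row r of batch b.
    Each batch contributes len(row_nnz[b]) + 1 offsets, restarting at 0.
    """
    crows = []
    for batch in row_nnz:
        acc = 0
        crows.append(acc)
        for n in batch:
            acc += n
            crows.append(acc)
    return crows


def visit(crows, x_dim1, upper):
    """Walk indices [0, upper) the way the assumed loop would.

    Returns (filled, oob): the (batch, row, nnz) triples that would be
    written, and the crows positions that lie outside the array.
    """
    filled, oob = [], []
    for index in range(upper):
        if index + 1 >= len(crows):
            oob.append(index + 1)  # the read the PR calls illegal
            continue
        row = index % (x_dim1 + 1)
        if row == x_dim1:
            continue  # last offset of a batch: no row starts here
        batch = index // (x_dim1 + 1)
        filled.append((batch, row, crows[index + 1] - crows[index]))
    return filled, oob


x_dim0, x_dim1 = 2, 3
crows = batched_crows([[1, 0, 2], [0, 1, 1]])
n = x_dim0 * (x_dim1 + 1)  # length of crows

full = visit(crows, x_dim1, n)         # original loop bound
trimmed = visit(crows, x_dim1, n - 1)  # last iteration skipped
```

Under these assumptions, only the final index touches crows[n], and that index sits on a batch boundary where no row is written, so dropping it removes the out-of-bounds read without changing which rows get filled.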

paddle-bot[bot] commented 1 week ago

Your PR has been submitted. Thanks for your contribution to the open-source project! Please watch for the results of the automated CI tests. See the Paddle CI Manual for details.