Test sparse sum op

PaddlePaddle / Paddle

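PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (core framework of PaddlePaddle (飞桨): high-performance single-machine and distributed training and cross-platform deployment for deep learning and machine learning)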

Apache License 2.0

21.63k stars, 5.44k forks

PR Category

Operator Mechanism

PR Types

Bug fixes

Description

sum_kernel.cu: The size of out_cols_data is only x_dim0 * x_dim1, so it is illegal to access memory at or beyond offset x_dim0 * x_dim1. To prevent such illegal access, the loop in SumCsr3DGradCudaKernel is split into two loops.
sum_grad_kernel.cu: The length of x_crows_data is only x_dim0 * (x_dim1 + 1), so accessing x_crows_data[x_dim0 * (x_dim1 + 1)] is in fact illegal. However, x_crows_data[x_dim0 * (x_dim1 + 1)] happens to read as 0 because of the alignment mechanism of StreamSafeAllocator.

Moreover, dx_values_data is never filled when index = x_dim0 * (x_dim1 + 1) - 1. Therefore, the last iteration of the loop can be skipped.
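
The bounds argument for sum_grad_kernel.cu can be illustrated with a small host-side C++ sketch. This is not the actual kernel: the function name FillRowGradSketch, the per-batch crows layout (each batch contributes x_dim1 + 1 entries that restart at 0), and the idea that every stored value in a row receives that row's upstream gradient are all assumptions made for illustration. The point it shows is that a loop reading both x_crows[index] and x_crows[index + 1] must stop one iteration before the end of the array, and that the skipped iteration would not have written anything anyway.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical host-side analogue of a 3D CSR "sum over last axis" backward pass.
// Assumed layout: x_crows has x_dim0 * (x_dim1 + 1) entries, and each batch's
// row pointers start again from 0. dout holds one gradient per (batch, row).
std::vector<float> FillRowGradSketch(const std::vector<int64_t>& x_crows,
                                     const std::vector<float>& dout,
                                     int64_t x_dim0, int64_t x_dim1) {
  const int64_t n = x_dim0 * (x_dim1 + 1);
  assert(static_cast<int64_t>(x_crows.size()) == n);

  // Offset of each batch inside the flat values array: the closing row
  // pointer of a batch is that batch's number of stored values.
  std::vector<int64_t> batch_offset(x_dim0, 0);
  int64_t total = 0;
  for (int64_t b = 0; b < x_dim0; ++b) {
    batch_offset[b] = total;
    total += x_crows[b * (x_dim1 + 1) + x_dim1];
  }

  std::vector<float> dx_values(total, 0.0f);

  // Stop at n - 1: the body reads x_crows[index + 1], and x_crows[n]
  // lies outside the array.
  for (int64_t index = 0; index + 1 < n; ++index) {
    const int64_t b = index / (x_dim1 + 1);
    const int64_t r = index % (x_dim1 + 1);
    if (r == x_dim1) continue;  // closing pointer of a batch, not a row
    for (int64_t j = x_crows[index]; j < x_crows[index + 1]; ++j) {
      dx_values[batch_offset[b] + j] = dout[b * x_dim1 + r];
    }
  }
  return dx_values;
}
```

With x_dim0 = 2 and x_dim1 = 2, the last index (5) is the closing pointer of the second batch, so skipping it loses no writes, which matches the observation above.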

Test sparse sum op #63899
