brucefan1983 / CUDA-Programming

Sample codes for my CUDA programming book
GNU General Public License v3.0

Code error on P108 #19

Closed zhenkunl closed 1 year ago

zhenkunl commented 1 year ago

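Hello Professor Fan. Reading your book has taught me a great deal about CUDA programming. While reading, I found a problem. P108 says: "If you want to remove the constraint on the thread index inside the loop while still avoiding a read-write race, the relevant code can be rewritten as follows: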

real v = 0;
for (int offset = 16; offset > 0; offset >>= 1)
{
      v += s_y[tid + offset];
      __syncwarp();
      s_y[tid] = v;
      __syncwarp();
}

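" The problem is that initializing v to 0 is wrong. Take the thread with tid = 0: after the first iteration, the value it holds is s_y[16] rather than s_y[0] + s_y[16]. Based on my own understanding and a discussion with colleagues, the code should be changed to the following: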

real v = s_y[tid];
for (int offset = 16; offset > 0; offset >>= 1)
{
      v += s_y[tid + offset];
      __syncwarp();
      s_y[tid] = v;
      __syncwarp();
}

Only then is the result correct. To make the code easier to follow, the following form would be better still:

for (int offset = 16; offset > 0; offset >>= 1)
{
      real v = s_y[tid + offset];  // read the partner element before any thread overwrites it
      __syncwarp();
      s_y[tid] += v;
      __syncwarp();
}

I am not sure whether my understanding is correct, so please point out anything I have got wrong.
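The two initializations can be compared with a small test program. The following is a minimal sketch rather than code from the book or the repository: the kernel name `warp_sum`, the host helper `run_warp_sum`, the `init_from_shared` flag, the 64-element padded array, and the `typedef float real` are all assumptions made for illustration, and CUDA error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <cassert>

typedef float real;  // assumption: stands in for the book's `real` type

// One warp reduces the first 32 entries of a padded shared array into s_y[0].
// init_from_shared == true  uses  real v = s_y[tid];  (the proposed fix)
// init_from_shared == false uses  real v = 0;         (the code printed on P108)
__global__ void warp_sum(const real *d_in, real *d_out, bool init_from_shared)
{
    __shared__ real s_y[64];
    const int tid = threadIdx.x;   // the kernel is launched with exactly 32 threads
    s_y[tid] = d_in[tid];
    s_y[tid + 32] = 0;             // padding keeps s_y[tid + offset] in bounds
    __syncwarp();

    real v = init_from_shared ? s_y[tid] : real(0);
    for (int offset = 16; offset > 0; offset >>= 1)
    {
        v += s_y[tid + offset];    // every thread reads its partner element
        __syncwarp();              // all reads finish before any write
        s_y[tid] = v;
        __syncwarp();              // all writes finish before the next reads
    }
    if (tid == 0) *d_out = s_y[0];
}

// Hypothetical host helper: sums 32 ones on the device and returns s_y[0].
real run_warp_sum(bool init_from_shared)
{
    real h_in[32], h_out = 0;
    for (int i = 0; i < 32; ++i) h_in[i] = 1;

    real *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, sizeof(h_in));
    cudaMalloc(&d_out, sizeof(real));
    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);

    warp_sum<<<1, 32>>>(d_in, d_out, init_from_shared);

    // A blocking copy on the default stream also waits for the kernel.
    cudaMemcpy(&h_out, d_out, sizeof(real), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}
```

With 32 input values of 1, `run_warp_sum(true)` should return 32, while `run_warp_sum(false)` should return 16: with v = 0 the first iteration overwrites s_y[0..15] with s_y[16..31], so the original lower half never enters the sum.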

fever-Wong commented 1 year ago

Thank you, the email you sent me has been received and I will handle it as soon as possible. 王景博 (fever wong)

brucefan1983 commented 1 year ago

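Hello, and thank you very much for finding this error. The initialization of v is indeed wrong: before the += operation, v must first be assigned the value from shared memory. I will correct this error on the main page. This code is not part of the sample code in the repository, so no source file needs to change.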
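The effect of the initialization can be traced by hand. The derivation below is a sketch under assumed notation that does not appear in the book: $s_i$ is the value of s_y[i] before the loop, and $s_i^{(1)}$ is its value after the first iteration (offset = 16).

```latex
% Initialization v = 0 (as printed): the first iteration discards the thread's own element.
\begin{align*}
  s_i^{(1)} &= 0 + s_{i+16} = s_{i+16}, \qquad 0 \le i \le 31, \\
  \text{s\_y[0]}_{\text{final}} &= \sum_{i=0}^{15} s_i^{(1)} = \sum_{i=16}^{31} s_i .
\end{align*}

% Initialization v = s_y[tid] (corrected): the thread's own element is kept.
\begin{align*}
  s_i^{(1)} &= s_i + s_{i+16}, \qquad 0 \le i \le 31, \\
  \text{s\_y[0]}_{\text{final}} &= \sum_{i=0}^{15} s_i^{(1)} = \sum_{i=0}^{31} s_i .
\end{align*}
```

In both cases v equals s_y[tid] from the second iteration onward, so the remaining offsets 8, 4, 2, 1 simply sum the 16 values $s_0^{(1)}, \dots, s_{15}^{(1)}$ into s_y[0]. The only difference is whether the original s_y[0..15] survive the first iteration.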

zhenkunl commented 1 year ago

I am very much looking forward to more great content in the second edition.

fever-Wong commented 1 year ago

Thank you, the email you sent me has been received and I will handle it as soon as possible. 王景博 (fever wong)