Code on GitHub does not match the code shown in the book (p. 129)

liwd190019 commented 1 year ago

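On page 129 of the book, Listing 11.2 shows part of this chapter's program, kernel-kernel.cu. In addition, at the end of the second paragraph on page 130, you state: "For ease of timing, the kernel deliberately performs 10^6 additions." However, in the corresponding GitHub repository (CUDA-Programming/src/11-stream/kernel-kernel.cu), the add function performs only 10^5 additions. In other words, the code in the repository does not match the code shown in the book.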

void __global__ add(const real *d_x, const real *d_y, real *d_z)
{
    const int n = blockDim.x * blockIdx.x + threadIdx.x;
    if (n < N1)
    {
        for (int i = 0; i < 100000; ++i)
        {
            d_z[n] = d_x[n] + d_y[n];
        }
    }
}

fever-Wong commented 1 year ago

Thank you, the email you sent me has been received and I will handle it as soon as possible. 王景博 (fever wong)

brucefan1983 commented 1 year ago

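Thanks for pointing this out. I no longer remember how this mistake came about. Perhaps I typed an extra 0 when writing the book, or perhaps I originally intended to demonstrate with 1000000 additions and later switched to 100000 during testing to save time. As long as this number is not too small, it does not matter much for the discussion here.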
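For readers who want to check this themselves, here is a minimal sketch that makes the loop count a kernel argument and times a single launch with CUDA events. It is not the code from the book or the repository: the names `add_repeated`, `time_add_ms`, `n_elems` and `repeats` are hypothetical, `real` is assumed to be `double`, and the block size of 128 is an arbitrary choice.

```cuda
#include <cuda_runtime.h>

typedef double real; // assumption: the original code may define real differently

// Same loop body as the add kernel quoted in this issue, but the iteration
// count is passed in as an argument (hypothetical name: repeats).
__global__ void add_repeated(const real *d_x, const real *d_y, real *d_z,
                             const int n_elems, const int repeats)
{
    const int n = blockDim.x * blockIdx.x + threadIdx.x;
    if (n < n_elems)
    {
        for (int i = 0; i < repeats; ++i)
        {
            d_z[n] = d_x[n] + d_y[n];
        }
    }
}

// Times one launch of add_repeated with CUDA events.
// Returns the elapsed time in milliseconds, or -1.0f if any CUDA call fails
// (for example, when no GPU is available).
float time_add_ms(const int n_elems, const int repeats)
{
    const size_t bytes = sizeof(real) * n_elems;
    real *d_x = nullptr, *d_y = nullptr, *d_z = nullptr;
    cudaEvent_t start = nullptr, stop = nullptr;
    float ms = -1.0f;

    // Allocate and zero the inputs, then create the timing events.
    const bool ok =
        cudaMalloc((void **)&d_x, bytes) == cudaSuccess &&
        cudaMalloc((void **)&d_y, bytes) == cudaSuccess &&
        cudaMalloc((void **)&d_z, bytes) == cudaSuccess &&
        cudaMemset(d_x, 0, bytes) == cudaSuccess &&
        cudaMemset(d_y, 0, bytes) == cudaSuccess &&
        cudaEventCreate(&start) == cudaSuccess &&
        cudaEventCreate(&stop) == cudaSuccess;

    if (ok)
    {
        const int block_size = 128; // arbitrary choice for this sketch
        const int grid_size = (n_elems + block_size - 1) / block_size;

        cudaEventRecord(start);
        add_repeated<<<grid_size, block_size>>>(d_x, d_y, d_z, n_elems, repeats);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop); // wait until the kernel has finished

        if (cudaGetLastError() == cudaSuccess)
        {
            cudaEventElapsedTime(&ms, start, stop);
        }
    }

    // Release whatever was successfully created.
    if (start) cudaEventDestroy(start);
    if (stop) cudaEventDestroy(stop);
    cudaFree(d_x);
    cudaFree(d_y);
    cudaFree(d_z);
    return ms;
}
```

Calling `time_add_ms(1024, 100000)` and `time_add_ms(1024, 1000000)` from a host program and comparing the two results is one way to see how the choice of loop count affects the measured kernel time on a given GPU.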

brucefan1983 / CUDA-Programming

Code on GitHub does not match the code shown in the book (p. 129) #22