Open LLfbforever opened 3 months ago
What do you want to achieve? Are you going to change RELION's GPU kernels?
A transfer initiated by d_img.cpToDevice() is performed in one go. There is no further division.
No. I saw in the source code that cpToDevice() is implemented as CudaShortcuts::cpyHostToDevice(hPtr, dPtr, size, stream); and that cpyHostToDevice in turn calls cudaMemcpyAsync(d_ptr, h_ptr, size * sizeof(T), cudaMemcpyHostToDevice, stream). So is the data split during this asynchronous transmission? Or is one of the streams specified for transmitting data?
So is the data split during this asynchronous transmission?
No.
Or is one of the streams specified for transmitting data?
Yes.
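To illustrate the two answers above, here is a minimal standalone sketch, not RELION's code. The helper copyHostToDevice is a hypothetical stand-in that only mirrors the cudaMemcpyAsync call quoted earlier; the buffer names and sizes are made up. It shows the host pointer being passed by value and a single call enqueuing the whole buffer on the one stream given as the last argument.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                          \
    do {                                                                     \
        cudaError_t err_ = (call);                                           \
        if (err_ != cudaSuccess) {                                           \
            std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err_)); \
            std::exit(EXIT_FAILURE);                                         \
        }                                                                    \
    } while (0)

// Hypothetical helper (assumption): mirrors the call quoted in the thread.
// The host and device pointers are passed by value; one call enqueues the
// copy of all size * sizeof(T) bytes on the given stream.
template <typename T>
cudaError_t copyHostToDevice(const T *h_ptr, T *d_ptr, size_t size, cudaStream_t stream)
{
    return cudaMemcpyAsync(d_ptr, h_ptr, size * sizeof(T), cudaMemcpyHostToDevice, stream);
}

int main()
{
    const size_t n = 1 << 20;
    const size_t bytes = n * sizeof(float);
    float *h_img = nullptr, *h_back = nullptr, *d_img = nullptr;

    // Pinned host memory, so the asynchronous copies can run without staging.
    CHECK(cudaMallocHost((void **)&h_img, bytes));
    CHECK(cudaMallocHost((void **)&h_back, bytes));
    CHECK(cudaMalloc((void **)&d_img, bytes));
    for (size_t i = 0; i < n; ++i) h_img[i] = (float)i;

    cudaStream_t stream;
    CHECK(cudaStreamCreate(&stream));

    // Whole buffer to the device in one enqueued copy, then straight back.
    // Operations in the same stream execute in the order they were issued.
    CHECK(copyHostToDevice(h_img, d_img, n, stream));
    CHECK(cudaMemcpyAsync(h_back, d_img, bytes, cudaMemcpyDeviceToHost, stream));
    CHECK(cudaStreamSynchronize(stream));

    bool ok = true;
    for (size_t i = 0; i < n; ++i)
        if (h_back[i] != h_img[i]) ok = false;
    std::printf(ok ? "round trip ok\n" : "mismatch\n");

    CHECK(cudaStreamDestroy(stream));
    CHECK(cudaFree(d_img));
    CHECK(cudaFreeHost(h_img));
    CHECK(cudaFreeHost(h_back));
    return ok ? 0 : 1;
}
```

The stream argument only decides which queue the copy is ordered in; it does not divide the data.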
OK, thanks. I have another question. When calling this kernel function for GPU-accelerated calculations, how does this stream process different data in different streams?
cuda_kernel_diff2_CC_coarse<REF3D,DATA3D,block_sz><<<CCblocks,block_size,0,stream>>>(
    g_eulers, g_imgs_real, g_imgs_imag, g_trans_x, g_trans_y, g_trans_z,
    projector, g_corr_img, g_diff2s, translation_num, image_size, exp_local_sqrtXi2);
how does this stream process different data in different streams?
I don't remember; this code was written by others (@dkimanius or @bforsbe).
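How RELION assigns work to its streams is not answered here. As a general CUDA illustration only, the sketch below shows the usual pattern: each stream is given its own buffers, and a kernel launched with a stream as the fourth launch parameter is ordered after the earlier operations in that same stream. The kernel scale and all buffer names are hypothetical and unrelated to cuda_kernel_diff2_CC_coarse. Error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel (assumption): multiplies every element by a factor.
__global__ void scale(float *data, size_t n, float factor)
{
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main()
{
    const int numStreams = 2;
    const size_t n = 1 << 16;
    const size_t bytes = n * sizeof(float);
    const unsigned int threads = 256;
    const unsigned int blocks = (unsigned int)((n + threads - 1) / threads);

    cudaStream_t streams[numStreams];
    float *h_buf[numStreams];
    float *d_buf[numStreams];

    // Each stream gets its own host and device buffer.
    for (int s = 0; s < numStreams; ++s) {
        cudaStreamCreate(&streams[s]);
        cudaMallocHost((void **)&h_buf[s], bytes);
        cudaMalloc((void **)&d_buf[s], bytes);
        for (size_t i = 0; i < n; ++i) h_buf[s][i] = 1.0f;
    }

    // Per stream: copy in, run the kernel, copy out. Work within one stream
    // runs in issue order; work in different streams may overlap.
    for (int s = 0; s < numStreams; ++s) {
        cudaMemcpyAsync(d_buf[s], h_buf[s], bytes, cudaMemcpyHostToDevice, streams[s]);
        scale<<<blocks, threads, 0, streams[s]>>>(d_buf[s], n, (float)(s + 2));
        cudaMemcpyAsync(h_buf[s], d_buf[s], bytes, cudaMemcpyDeviceToHost, streams[s]);
    }

    for (int s = 0; s < numStreams; ++s) {
        cudaStreamSynchronize(streams[s]);
        std::printf("stream %d: first element = %.1f\n", s, h_buf[s][0]);
    }

    for (int s = 0; s < numStreams; ++s) {
        cudaFree(d_buf[s]);
        cudaFreeHost(h_buf[s]);
        cudaStreamDestroy(streams[s]);
    }
    return 0;
}
```

In this pattern the kernel itself knows nothing about streams: which data it processes is determined entirely by the pointers passed as arguments at launch.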
OK, thank you very much.
Version number: ver5.0
In RELION ver5.0, data transfers, for example in the implementation of the getFourierTransformsAndCtfs function, call d_img.cpToDevice() or cpToHost(). The cpToDevice() function is implemented in the src/acc/acc_ptr.h file as CudaShortcuts::cpyHostToDevice(hPtr, dPtr, size, stream);
How is this data transfer implemented? Is the data first divided into chunks, with each stream then transferring one chunk? And how is the hPtr pointer passed?