3dem / relion

Image-processing software for cryo-electron microscopy
https://relion.readthedocs.io/en/latest/
GNU General Public License v2.0

Data transfer aspects (transferring data from host to device and from device to host) #1104

Open LLfbforever opened 3 months ago

LLfbforever commented 3 months ago

Version number: ver5.0

In RELION ver5.0, data transfers (for example in the getFourierTransformsAndCtfs function) go through d_img.cpToDevice() or cpToHost(). cpToDevice() is implemented in src/acc/acc_ptr.h and calls CudaShortcuts::cpyHostToDevice(hPtr, dPtr, size, stream). How is this transfer implemented? Is the data first divided into chunks, with each stream transferring one chunk? And how is the hPtr pointer passed?

biochem-fan commented 3 months ago

What do you want to achieve? Are you going to change RELION's GPU kernels?

A transfer initiated by d_img.cpToDevice() is performed in one go. There is no further division.

LLfbforever commented 3 months ago

What do you want to achieve? Are you going to change RELION's GPU kernels?

No.

A transfer initiated by d_img.cpToDevice() is performed in one go. There is no further division.

I saw in the source code that cpToDevice() is implemented as CudaShortcuts::cpyHostToDevice(hPtr, dPtr, size, stream), and that cpyHostToDevice in turn calls cudaMemcpyAsync(d_ptr, h_ptr, size * sizeof(T), cudaMemcpyHostToDevice, stream). So is the data split during this asynchronous transfer? Or is one of the streams specified for transmitting data?

biochem-fan commented 3 months ago

So is the data split during this asynchronous transmission?

No.

Or is one of the streams specified for transmitting data?

Yes.
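
To make the two answers above concrete, here is a minimal CUDA sketch. It assumes nothing about RELION's internals beyond the cudaMemcpyAsync call quoted in this thread; the helper names copyHostToDevice and roundTripMatches are hypothetical. The application issues one asynchronous copy for the whole buffer, and that copy is queued on the single stream the caller passes in.

```cuda
#include <cstdio>
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical wrapper modelled on the call quoted in the thread (not RELION's code).
// One cudaMemcpyAsync covers the whole buffer; the caller chooses the stream.
template <typename T>
cudaError_t copyHostToDevice(const T* h_ptr, T* d_ptr, size_t n, cudaStream_t stream)
{
    return cudaMemcpyAsync(d_ptr, h_ptr, n * sizeof(T), cudaMemcpyHostToDevice, stream);
}

// Copies n floats to the device and back on one stream, then compares them.
bool roundTripMatches(size_t n)
{
    const size_t bytes = n * sizeof(float);
    float *h_in = nullptr, *h_out = nullptr, *d_img = nullptr;
    cudaStream_t stream = nullptr;

    // Pinned (page-locked) host buffers allow the copies to be truly asynchronous.
    bool ok = cudaStreamCreate(&stream) == cudaSuccess
           && cudaMallocHost((void**)&h_in, bytes) == cudaSuccess
           && cudaMallocHost((void**)&h_out, bytes) == cudaSuccess
           && cudaMalloc((void**)&d_img, bytes) == cudaSuccess;

    if (ok) {
        for (size_t i = 0; i < n; ++i) h_in[i] = static_cast<float>(i);

        // Both copies are queued on the same stream, so they run in issue order.
        ok = copyHostToDevice(h_in, d_img, n, stream) == cudaSuccess
          && cudaMemcpyAsync(h_out, d_img, bytes, cudaMemcpyDeviceToHost, stream) == cudaSuccess
          && cudaStreamSynchronize(stream) == cudaSuccess;  // wait before reading h_out

        for (size_t i = 0; ok && i < n; ++i) ok = (h_out[i] == h_in[i]);
    }

    if (d_img) cudaFree(d_img);
    if (h_out) cudaFreeHost(h_out);
    if (h_in) cudaFreeHost(h_in);
    if (stream) cudaStreamDestroy(stream);
    return ok;
}

int main()
{
    std::printf("round trip %s\n", roundTripMatches(1 << 20) ? "matched" : "failed");
    return 0;
}
```

The host pointer is passed as an ordinary pointer argument. The call only queues the copy, so the host buffer must stay valid and unmodified until the stream has been synchronized.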

LLfbforever commented 3 months ago

Or is one of the streams specified for transmitting data?

OK, thanks. I have another question. When this kernel is launched for GPU-accelerated calculations, how does this stream process different data in different streams?

cuda_kernel_diff2_CC_coarse<REF3D,DATA3D,block_sz>
<<<CCblocks,block_size,0,stream>>>(
g_eulers, g_imgs_real, g_imgs_imag,
g_trans_x, g_trans_y, g_trans_z,
projector, g_corr_img, g_diff2s,
translation_num, image_size, exp_local_sqrtXi2);

biochem-fan commented 3 months ago

how does this stream process different data in different streams?

I don't remember; this code was written by others (@dkimanius or @bforsbe).
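
In general CUDA terms (this is not a statement about how this particular RELION code is organised): the stream argument in <<<grid, block, sharedMem, stream>>> only selects the queue in which the launch is ordered. The data a launch works on comes entirely from its pointer arguments, so different streams handle different data only because the host passes different buffers to each launch. A minimal sketch, with a hypothetical scale kernel and a hypothetical runTwoStreams helper:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: multiplies each element by a factor.
__global__ void scale(float* data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Launches the same kernel on two streams, each with its own device buffer.
bool runTwoStreams(int n)
{
    const int nStreams = 2;
    const size_t bytes = n * sizeof(float);
    cudaStream_t streams[nStreams];
    float* h_buf[nStreams];
    float* d_buf[nStreams];

    for (int s = 0; s < nStreams; ++s) {
        if (cudaStreamCreate(&streams[s]) != cudaSuccess) return false;
        if (cudaMallocHost((void**)&h_buf[s], bytes) != cudaSuccess) return false;
        if (cudaMalloc((void**)&d_buf[s], bytes) != cudaSuccess) return false;
        for (int i = 0; i < n; ++i) h_buf[s][i] = 1.0f;
    }

    for (int s = 0; s < nStreams; ++s) {
        // Work queued on one stream runs in order; the two streams may overlap.
        cudaMemcpyAsync(d_buf[s], h_buf[s], bytes, cudaMemcpyHostToDevice, streams[s]);
        scale<<<(n + 255) / 256, 256, 0, streams[s]>>>(d_buf[s], n, float(s + 2));
        cudaMemcpyAsync(h_buf[s], d_buf[s], bytes, cudaMemcpyDeviceToHost, streams[s]);
    }

    // Wait for all queued work, then check for any error raised along the way.
    bool ok = cudaDeviceSynchronize() == cudaSuccess && cudaGetLastError() == cudaSuccess;

    // Stream 0 scaled its buffer by 2, stream 1 scaled its own buffer by 3.
    for (int s = 0; ok && s < nStreams; ++s)
        for (int i = 0; ok && i < n; ++i)
            ok = (h_buf[s][i] == float(s + 2));

    for (int s = 0; s < nStreams; ++s) {
        cudaFree(d_buf[s]);
        cudaFreeHost(h_buf[s]);
        cudaStreamDestroy(streams[s]);
    }
    return ok;
}

int main()
{
    std::printf("two streams %s\n", runTwoStreams(1 << 16) ? "ok" : "failed");
    return 0;
}
```

The kernel itself never sees the stream. It only sees d_buf[s], which is why the same kernel code can serve any number of streams.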

LLfbforever commented 3 months ago

how does this stream process different data in different streams?

I don't remember; this code was written by others (@dkimanius or @bforsbe).

OK, thank you very much.