The end goal is to replace all calls to update methods with calls to the corresponding update_on_gpu methods, and to transfer data from GPU to CPU when saving to disk is required, so that the main time loop looks like this:
void Domain::run_pic()
{
    .....
    for ( int i = current_node; i < total_time_iterations; i++ ){
        std::cout << "Time step from " << i << " to " << i+1
                  << " of " << total_time_iterations << std::endl;
        // The whole time step is computed on the GPU.
        advance_one_time_step_on_gpu();
        // Copy results back to the CPU and write them when a save is required.
        transfer_from_gpu_and_write_step_to_save();
    }
    .....
}
A single GPU should be sufficient for a start, and CUDA can be used.
The general idea is to define, in each class, pointers that hold that class's data on the GPU.
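As a rough illustration of such pointers, here is a minimal sketch. The class name Particles_sketch and all of its members are hypothetical and are not taken from the actual code base; the only point is that each host-side array gets a matching device pointer that stays null until memory is allocated on the GPU.

```cuda
#include <cstddef>
#include <vector>

// Hypothetical class, for illustration only (not from the real code base).
class Particles_sketch {
public:
    // Host-side data, as the CPU-only version of the class would hold it.
    std::vector<double> x;
    std::vector<double> vx;

    // Device-side mirrors: raw pointers into GPU memory.
    // nullptr means "nothing has been allocated on the GPU yet".
    double *d_x = nullptr;
    double *d_vx = nullptr;
    std::size_t d_size = 0;  // number of elements allocated on the GPU

    explicit Particles_sketch(std::size_t n) : x(n, 0.0), vx(n, 0.0) {}

    // True once both device arrays have been allocated.
    bool is_on_gpu() const { return d_x != nullptr && d_vx != nullptr; }
};
```

Keeping the host vectors next to the device pointers means the existing CPU code paths can stay untouched while the GPU versions are being developed.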
Then define methods that allocate memory on the GPU and transfer data to and from the GPU.
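One possible shape for these methods is sketched below, using the standard CUDA runtime calls cudaMalloc, cudaMemcpy and cudaFree. The class Particles_sketch, the helper check_cuda and the method names allocate_on_gpu, transfer_to_gpu, transfer_from_gpu and free_on_gpu are all assumptions made for this example.

```cuda
#include <cuda_runtime.h>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: turn a CUDA error code into a C++ exception.
inline void check_cuda(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        throw std::runtime_error(std::string(what) + ": " + cudaGetErrorString(err));
    }
}

// Hypothetical class, for illustration only.
class Particles_sketch {
public:
    std::vector<double> x;   // host copy
    double *d_x = nullptr;   // device copy
    std::size_t d_size = 0;  // elements currently allocated on the GPU

    explicit Particles_sketch(std::size_t n) : x(n, 0.0) {}
    ~Particles_sketch() { cudaFree(d_x); }  // cudaFree(nullptr) is a no-op
    Particles_sketch(const Particles_sketch &) = delete;
    Particles_sketch &operator=(const Particles_sketch &) = delete;

    void free_on_gpu()
    {
        cudaFree(d_x);
        d_x = nullptr;
        d_size = 0;
    }

    // (Re)allocate device memory so that it matches the host array size.
    void allocate_on_gpu()
    {
        if (d_x != nullptr && d_size == x.size()) return;  // already matches
        free_on_gpu();
        if (x.empty()) return;  // nothing to allocate
        check_cuda(cudaMalloc(reinterpret_cast<void **>(&d_x),
                              x.size() * sizeof(double)), "cudaMalloc");
        d_size = x.size();
    }

    // Host -> device copy.
    void transfer_to_gpu()
    {
        allocate_on_gpu();
        if (d_size == 0) return;
        check_cuda(cudaMemcpy(d_x, x.data(), d_size * sizeof(double),
                              cudaMemcpyHostToDevice), "cudaMemcpy to GPU");
    }

    // Device -> host copy, e.g. right before writing results to disk.
    void transfer_from_gpu()
    {
        if (d_x == nullptr) throw std::logic_error("no data on the GPU");
        x.resize(d_size);
        check_cuda(cudaMemcpy(x.data(), d_x, d_size * sizeof(double),
                              cudaMemcpyDeviceToHost), "cudaMemcpy from GPU");
    }
};
```

Copying is disabled in this sketch so that two objects never own the same device pointer; a real class might prefer move semantics or a small RAII wrapper around the pointer.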
Then duplicate all computational methods so that they run on the GPU. These duplicates are expected to call CUDA kernels.
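The sketch below shows what such a pair of methods might look like: a CPU update method and an update_on_gpu duplicate that launches a kernel. The class Particles_sketch, the kernel push_positions_kernel and the simple position push x += vx * dt are placeholders chosen for this example and do not reproduce the actual computation. The GPU method assumes that d_x and d_vx have already been allocated and filled.

```cuda
#include <cuda_runtime.h>
#include <cstddef>
#include <vector>

// Kernel: one thread per particle, x[i] += vx[i] * dt.
__global__ void push_positions_kernel(double *x, const double *vx,
                                      double dt, std::size_t n)
{
    std::size_t i = blockIdx.x * static_cast<std::size_t>(blockDim.x) + threadIdx.x;
    if (i < n) {  // the last block may have threads past the end of the array
        x[i] += vx[i] * dt;
    }
}

// Hypothetical class, for illustration only.
class Particles_sketch {
public:
    std::vector<double> x, vx;                // host data
    double *d_x = nullptr, *d_vx = nullptr;   // device data

    // Existing CPU version.
    void update(double dt)
    {
        for (std::size_t i = 0; i < x.size(); ++i) {
            x[i] += vx[i] * dt;
        }
    }

    // GPU duplicate. Assumes d_x and d_vx already hold x.size() elements.
    // Returns the launch status; the kernel itself runs asynchronously.
    cudaError_t update_on_gpu(double dt)
    {
        const std::size_t n = x.size();
        if (n == 0) return cudaSuccess;
        const unsigned int threads = 256;
        const unsigned int blocks =
            static_cast<unsigned int>((n + threads - 1) / threads);
        push_positions_kernel<<<blocks, threads>>>(d_x, d_vx, dt, n);
        return cudaGetLastError();
    }
};
```

Because the host arrays are not touched by update_on_gpu, the results become visible on the CPU only after an explicit device-to-host copy, which also waits for the kernel to finish.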