When using CUDA, one important and time-consuming task is transferring variables
from host to device and from device to host. The time spent on these transfers
should be measured relative to the total solution time. This should be done for
a series of meshes to understand how transfer time depends on problem size.
NVIDIA's profiling tools might be useful for this purpose.
We expect to find that transfer time is a critical bottleneck. If this is
confirmed, the solution algorithm should be modified accordingly.
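
The measurement described above can also be done by hand with CUDA events, which
time work on the GPU's default stream. The sketch below is a minimal example
under stated assumptions. It assumes the solver state is one array of doubles
per mesh. `dummySolverStep`, `timeOneMesh` and the mesh sizes are hypothetical
placeholders, not part of the actual code. It reports the host-to-device copy,
kernel and device-to-host copy times, and the share of the total spent on
transfers.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                                  \
  do {                                                                    \
    cudaError_t err_ = (call);                                            \
    if (err_ != cudaSuccess) {                                            \
      std::fprintf(stderr, "CUDA error %s at %s:%d\n",                    \
                   cudaGetErrorString(err_), __FILE__, __LINE__);         \
      std::exit(EXIT_FAILURE);                                            \
    }                                                                     \
  } while (0)

// Hypothetical stand-in for one solver iteration on the device.
__global__ void dummySolverStep(double* u, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) u[i] = 0.5 * u[i] + 1.0;
}

struct Timing { float h2dMs, kernelMs, d2hMs; };

// Hypothetical helper: time H2D copy, nIter kernel launches, and D2H copy
// for one "mesh" of n doubles.
Timing timeOneMesh(int n, int nIter) {
  size_t bytes = static_cast<size_t>(n) * sizeof(double);
  double* hU = nullptr;
  double* dU = nullptr;
  CUDA_CHECK(cudaMallocHost(&hU, bytes));  // page-locked (pinned) host memory
  CUDA_CHECK(cudaMalloc(&dU, bytes));
  for (int i = 0; i < n; ++i) hU[i] = 1.0;

  cudaEvent_t e0, e1, e2, e3;
  CUDA_CHECK(cudaEventCreate(&e0)); CUDA_CHECK(cudaEventCreate(&e1));
  CUDA_CHECK(cudaEventCreate(&e2)); CUDA_CHECK(cudaEventCreate(&e3));

  const int block = 256, grid = (n + block - 1) / block;
  CUDA_CHECK(cudaEventRecord(e0));
  CUDA_CHECK(cudaMemcpy(dU, hU, bytes, cudaMemcpyHostToDevice));
  CUDA_CHECK(cudaEventRecord(e1));
  for (int it = 0; it < nIter; ++it) dummySolverStep<<<grid, block>>>(dU, n);
  CUDA_CHECK(cudaGetLastError());
  CUDA_CHECK(cudaEventRecord(e2));
  CUDA_CHECK(cudaMemcpy(hU, dU, bytes, cudaMemcpyDeviceToHost));
  CUDA_CHECK(cudaEventRecord(e3));
  CUDA_CHECK(cudaEventSynchronize(e3));  // wait until all recorded work is done

  Timing t;
  CUDA_CHECK(cudaEventElapsedTime(&t.h2dMs, e0, e1));
  CUDA_CHECK(cudaEventElapsedTime(&t.kernelMs, e1, e2));
  CUDA_CHECK(cudaEventElapsedTime(&t.d2hMs, e2, e3));

  CUDA_CHECK(cudaEventDestroy(e0)); CUDA_CHECK(cudaEventDestroy(e1));
  CUDA_CHECK(cudaEventDestroy(e2)); CUDA_CHECK(cudaEventDestroy(e3));
  CUDA_CHECK(cudaFree(dU));
  CUDA_CHECK(cudaFreeHost(hU));
  return t;
}

int main() {
  timeOneMesh(1024, 1);  // warm-up: absorb one-time context/module setup costs
  const int sizes[] = {1 << 16, 1 << 18, 1 << 20, 1 << 22};  // placeholder mesh sizes
  for (int n : sizes) {
    Timing t = timeOneMesh(n, 100);
    float total = t.h2dMs + t.kernelMs + t.d2hMs;
    std::printf("n=%8d  H2D %8.3f ms  kernel %8.3f ms  D2H %8.3f ms  transfer %5.1f%%\n",
                n, t.h2dMs, t.kernelMs, t.d2hMs,
                100.0f * (t.h2dMs + t.d2hMs) / total);
  }
  return 0;
}
```

Pinned host memory is used because pageable transfers are generally slower. If
the real code uses `malloc`/`new` buffers, timing those instead gives a more
faithful picture of the current cost. A profiler gives the same per-copy
breakdown without modifying the code, so this sketch is mainly useful for
scripting the sweep over mesh sizes.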
Original issue reported on code.google.com by cuneytsert on 13 Jun 2012 at 10:32