Currently, allocating a device_vector in a .cpp file always initializes
a temporary host_vector and then performs a costly host->device copy. For
primitive types that use the default initializer T(), it is probably
sufficient, and considerably faster, to set the bytes of the array
to 0 with cudaMemset().
This presumes that T() is byte-wise 0, which may not be universally true
(check the standard). If it is not universally true, we can at least check
at runtime and dispatch to the optimized path only when it holds for the
given T.
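
A minimal host-side sketch of the runtime check and dispatch described above. This is not Thrust code: `default_is_bytewise_zero`, `fill_default`, and `StartsAtOne` are hypothetical names made up for illustration, and `std::memset` stands in for the `cudaMemset(ptr, 0, n * sizeof(T))` call that the device path would presumably use. It assumes C++11 for `std::is_trivially_copyable`.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical helper (not part of Thrust): true when a value-initialized T
// consists solely of zero bytes on this platform. Types with padding may
// report false, which only costs us the fast path.
template <typename T>
bool default_is_bytewise_zero()
{
    const T value = T();  // value-initialization, i.e. the default T()
    const unsigned char* bytes =
        reinterpret_cast<const unsigned char*>(&value);
    for (std::size_t i = 0; i < sizeof(T); ++i)
    {
        if (bytes[i] != 0) return false;
    }
    return true;
}

// Hypothetical dispatch: zero the raw bytes when that is equivalent to
// filling with T(), otherwise fall back to element-wise assignment.
template <typename T>
void fill_default(T* data, std::size_t n)
{
    if (std::is_trivially_copyable<T>::value && default_is_bytewise_zero<T>())
    {
        // Fast path. On the device this would be the cudaMemset() call.
        std::memset(static_cast<void*>(data), 0, n * sizeof(T));
    }
    else
    {
        // Slow path, analogous to the existing host_vector + copy route.
        for (std::size_t i = 0; i < n; ++i) data[i] = T();
    }
}

// Example type whose T() is not byte-wise zero, so it takes the slow path.
struct StartsAtOne
{
    int v;
    StartsAtOne() : v(1) {}
};
```

With these definitions, `fill_default` on an `int` array takes the memset path, while an array of `StartsAtOne` is filled element by element. The check inspects a single value-initialized object, so its cost is independent of the array length.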
Related thread on thrust-users:
http://groups.google.com/group/thrust-users/browse_thread/thread/d292b1146895ee2a
Original issue reported on code.google.com by wnbell on 15 May 2010 at 4:06