Pin host memory when copying data to or from IndexHostCopy. This is currently slightly slower than using pageable memory because of the high cost of calling cudaHostRegister() and cudaHostUnregister(), but it will bring big performance improvements once we start overlapping those copies with computation. Pageable memory would prevent overlapping communication and computation.
In the future we probably also won't be using cudaHostRegister() and cudaHostUnregister(), which means that IndexHostMemoryPinner is likely to change significantly.
Merged all arrays in IndexHostCopy into one array. Since we'll likely use a pool allocator, a single allocation should reduce fragmentation.
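
To illustrate the single-array idea, here is a minimal sketch of how several sub-arrays could be laid out inside one allocation. It is not the code from this PR: `MergedLayout`, `compute_layout`, `align_up` and the 256-byte default alignment are hypothetical names and values chosen for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: round `value` up to the next multiple of `alignment`.
// `alignment` must be a power of two.
inline std::size_t align_up(std::size_t value, std::size_t alignment) {
    return (value + alignment - 1) & ~(alignment - 1);
}

// Hypothetical description of where each sub-array lives in the merged buffer.
struct MergedLayout {
    std::vector<std::size_t> offsets; // byte offset of each sub-array
    std::size_t total_bytes = 0;      // size of the single allocation
};

// Compute byte offsets so that all sub-arrays fit into one allocation,
// with every sub-array starting on an `alignment`-byte boundary.
// The 256-byte default is an assumption, not a value taken from the PR.
inline MergedLayout compute_layout(const std::vector<std::size_t>& sizes_in_bytes,
                                   std::size_t alignment = 256) {
    MergedLayout layout;
    std::size_t cursor = 0;
    for (std::size_t size : sizes_in_bytes) {
        cursor = align_up(cursor, alignment); // pad up to the next aligned offset
        layout.offsets.push_back(cursor);
        cursor += size;
    }
    layout.total_bytes = cursor;
    return layout;
}
```

With a layout like this, the caller would request `total_bytes` from the allocator once and address each sub-array as `base + offsets[i]`, so the pool sees one allocation instead of several.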
Also made a few smaller changes to the way streams and allocators are handled in IndexGPU.
Part of #318