NVIDIA-Genomics-Research / GenomeWorks

SDK for GPU accelerated genome assembly and analysis
https://clara-parabricks.github.io/GenomeWorks/
Apache License 2.0

[cudamapper] Use pinned memory and single array in IndexHostCopy #481

Closed mimaric closed 4 years ago

mimaric commented 4 years ago

Pin host memory when copying data to or from IndexHostCopy. This is currently slightly slower than using pageable memory because of the high cost of calling cudaHostRegister() and cudaHostUnregister(), but it will bring big performance improvements once we start overlapping those copies with the computation of overlaps. Pageable memory would prevent that overlapping of communication and computation. In the future we probably won't be using cudaHostRegister() and cudaHostUnregister() either, which means that IndexHostMemoryPinner is likely to change significantly.
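For readers unfamiliar with the pattern, here is a minimal sketch of a scoped pin/unpin guard. It is not the actual IndexHostMemoryPinner: `ScopedHostPin` and `demo_pin_copy_unpin` are hypothetical names, and the CUDA calls are replaced by injected callables so the sketch compiles without the CUDA toolkit. In real code the callables would wrap cudaHostRegister(ptr, bytes, cudaHostRegisterDefault) and cudaHostUnregister(ptr), and the copy would be a cudaMemcpyAsync on a stream.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical RAII guard: pins a host range on construction and
// unpins it on destruction, so the range is pinned exactly for the scope.
class ScopedHostPin
{
public:
    using PinFn   = std::function<void(void*, std::size_t)>;
    using UnpinFn = std::function<void(void*)>;

    ScopedHostPin(void* ptr, std::size_t bytes, PinFn pin_fn, UnpinFn unpin_fn)
        : ptr_(ptr)
        , unpin_fn_(std::move(unpin_fn))
    {
        // Assumption: in CUDA code this would call cudaHostRegister().
        pin_fn(ptr_, bytes);
    }

    ~ScopedHostPin()
    {
        // Assumption: in CUDA code this would call cudaHostUnregister().
        unpin_fn_(ptr_);
    }

    ScopedHostPin(const ScopedHostPin&) = delete;
    ScopedHostPin& operator=(const ScopedHostPin&) = delete;

private:
    void* ptr_;
    UnpinFn unpin_fn_;
};

// Records the order of events to show that unpinning happens only
// after the copy has finished.
inline std::vector<std::string> demo_pin_copy_unpin()
{
    std::vector<std::string> log;
    std::vector<int> host(1024, 7);
    std::vector<int> staging(1024, 0);
    {
        ScopedHostPin pin(host.data(), host.size() * sizeof(int),
                          [&log](void*, std::size_t) { log.push_back("pin"); },
                          [&log](void*) { log.push_back("unpin"); });
        // Stand-in for an asynchronous host-to-device copy plus stream sync.
        std::copy(host.begin(), host.end(), staging.begin());
        log.push_back("copy");
    }
    return log;
}
```

With an asynchronous copy the guard must outlive the transfer, so the stream has to be synchronized before the scope ends.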

Merged all arrays in IndexHostCopy into one array. Since we'll likely use a pool allocator, this should reduce fragmentation.
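As a rough illustration of the single-array idea, the sketch below packs two arrays of different element types into one allocation and records an aligned offset for each. It is not the layout used in IndexHostCopy: `PackedArrays`, `pack`, `read_element`, and `align_up` are hypothetical names, and the two element types are chosen only to show the padding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Round offset up to the next multiple of alignment.
inline std::size_t align_up(std::size_t offset, std::size_t alignment)
{
    return (offset + alignment - 1) / alignment * alignment;
}

// Hypothetical container: one allocation holding two logical arrays.
struct PackedArrays
{
    std::vector<unsigned char> storage; // the single underlying buffer
    std::size_t first_offset  = 0;      // byte offset of the 32-bit array
    std::size_t second_offset = 0;      // byte offset of the 64-bit array
};

// Copy both arrays into one buffer; the second array starts at an
// offset aligned for its element type.
inline PackedArrays pack(const std::vector<std::uint32_t>& first,
                         const std::vector<std::uint64_t>& second)
{
    PackedArrays packed;
    const std::size_t first_bytes  = first.size() * sizeof(std::uint32_t);
    const std::size_t second_bytes = second.size() * sizeof(std::uint64_t);

    packed.first_offset  = 0;
    packed.second_offset = align_up(first_bytes, alignof(std::uint64_t));
    packed.storage.resize(packed.second_offset + second_bytes);

    if (first_bytes != 0)
    {
        std::memcpy(packed.storage.data() + packed.first_offset, first.data(), first_bytes);
    }
    if (second_bytes != 0)
    {
        std::memcpy(packed.storage.data() + packed.second_offset, second.data(), second_bytes);
    }
    return packed;
}

// Read element `index` of the array that starts at `array_offset`.
// memcpy is used to avoid aliasing problems with the byte buffer.
template <typename T>
T read_element(const PackedArrays& packed, std::size_t array_offset, std::size_t index)
{
    T value;
    std::memcpy(&value, packed.storage.data() + array_offset + index * sizeof(T), sizeof(T));
    return value;
}
```

One buffer means one allocation request to the allocator and one range to pin or copy, instead of one per array.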

Also made a few smaller changes to the way streams and allocators are handled in IndexGPU.

Part of #318

mimaric commented 4 years ago

Included in #523