ahehn-nv closed this issue 4 years ago.
I agree about a)
You're also right about b): since both the H2D copy and `SketchElementImpl::generate_sketch_elements` run on the same stream, there is no need to sync in between.
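Here is a minimal sketch of why no sync is needed between the two steps, using plain CUDA runtime calls. Work enqueued on one stream executes in issue order, so a kernel launched on the same stream as a `cudaMemcpyAsync` only starts after the copy has finished. The kernel `double_elements` and the function `double_on_device` are hypothetical stand-ins, not the actual `IndexGPU` or `SketchElementImpl` code.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical stand-in for a kernel such as generate_sketch_elements:
// writes 2 * in[i] to out[i].
__global__ void double_elements(const int* in, int* out, int n)
{
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 2 * in[i];
}

// Copies in_h to the device, runs the kernel, copies the result into out_h.
cudaError_t double_on_device(const std::vector<int>& in_h, std::vector<int>& out_h)
{
    out_h.assign(in_h.size(), 0);
    if (in_h.empty()) return cudaSuccess;
    const int n = static_cast<int>(in_h.size());
    const size_t bytes = in_h.size() * sizeof(int);

    cudaStream_t stream;
    cudaError_t err = cudaStreamCreate(&stream);
    if (err != cudaSuccess) return err;

    int* in_d = nullptr;
    int* out_d = nullptr;
    err = cudaMalloc(&in_d, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&out_d, bytes);
    if (err == cudaSuccess) {
        // H2D copy enqueued on `stream`.
        err = cudaMemcpyAsync(in_d, in_h.data(), bytes, cudaMemcpyHostToDevice, stream);
    }
    if (err == cudaSuccess) {
        // Same stream: the kernel cannot start before the copy above has
        // completed, so no cudaStreamSynchronize is needed in between.
        const int block = 256;
        double_elements<<<(n + block - 1) / block, block, 0, stream>>>(in_d, out_d, n);
        err = cudaGetLastError(); // catches launch-configuration errors
    }
    if (err == cudaSuccess) {
        err = cudaMemcpyAsync(out_h.data(), out_d, bytes, cudaMemcpyDeviceToHost, stream);
    }

    // The only sync: the host is about to read out_h and free the buffers.
    const cudaError_t sync_err = cudaStreamSynchronize(stream);
    if (err == cudaSuccess) err = sync_err;
    cudaFree(in_d);  // no-op for nullptr
    cudaFree(out_d); // no-op for nullptr
    cudaStreamDestroy(stream);
    return err;
}
```

A sync is only required where the host actually depends on the GPU having finished. In this sketch that point is reading `out_h` and releasing the device buffers.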
In fact, I think I was too aggressive with deallocating host memory in the indexer. No host memory is allocated after this point, so that memory could instead be deallocated at the end of the function. In that case, however, a sync would be needed at the end of the function (i.e. before deallocating `merged_basepairs_d`).
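Here is a rough sketch of the deferred-deallocation idea. It assumes the host buffer is pinned memory from `cudaMallocHost` that serves as the source of an asynchronous copy. That is an assumption, because the comment does not say how the indexer allocates it. Freeing such a buffer early would require a sync right after the copy. Keeping it until the end lets a single `cudaStreamSynchronize` cover both it and the device buffer. `merged_d` is a hypothetical stand-in for `merged_basepairs_d`, and `round_trip_deferred_free` is illustrative only.

```cuda
#include <cuda_runtime.h>
#include <cstring>
#include <vector>

// Hypothetical helper: uploads `values` through a pinned staging buffer into
// a device buffer, then reads the device buffer back into `result`.
// `merged_d` stands in for a device buffer such as merged_basepairs_d.
cudaError_t round_trip_deferred_free(const std::vector<char>& values, std::vector<char>& result)
{
    const size_t bytes = values.size();
    result.assign(bytes, 0);
    if (bytes == 0) return cudaSuccess;

    cudaStream_t stream;
    cudaError_t err = cudaStreamCreate(&stream);
    if (err != cudaSuccess) return err;

    char* staging_h = nullptr; // pinned host memory
    char* merged_d = nullptr;  // device memory
    err = cudaMallocHost(&staging_h, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&merged_d, bytes);
    if (err == cudaSuccess) {
        std::memcpy(staging_h, values.data(), bytes);
        // Asynchronous with respect to the host: the copy may still be reading
        // staging_h after this call returns. Freeing staging_h right here
        // would require a cudaStreamSynchronize(stream) first.
        err = cudaMemcpyAsync(merged_d, staging_h, bytes, cudaMemcpyHostToDevice, stream);
    }
    // ... further work on `stream` that uses merged_d could be enqueued here ...
    if (err == cudaSuccess) {
        err = cudaMemcpyAsync(result.data(), merged_d, bytes, cudaMemcpyDeviceToHost, stream);
    }

    // Deferred cleanup: one sync at the end covers every buffer the stream
    // may still be using, host and device alike.
    const cudaError_t sync_err = cudaStreamSynchronize(stream);
    if (err == cudaSuccess) err = sync_err;
    cudaFree(merged_d); // no-op for nullptr
    if (staging_h != nullptr) cudaFreeHost(staging_h);
    cudaStreamDestroy(stream);
    return err;
}
```

The trade-off is holding the host buffer slightly longer in exchange for removing an intermediate sync point.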
Anyway, there are a lot of inefficiencies in `IndexGPU` and `SketchElementImpl`. For now I'll only address your comments a) and b) and leave further optimizations for later.
While reviewing a PR, I noticed the following:

a) `cudaStreamSynchronize()` is missing a `GW_CU_CHECK_ERR` in https://github.com/clara-parabricks/GenomeWorks/blob/d715ab18b9a704726350613b6bb248a741b0d9f3/cudamapper/src/index_gpu.cuh#L781

b) I think the block around the mentioned `cudaStreamSynchronize()`:

could be changed to

which could potentially allow for a bit more overlap. @mimaric ?
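For point a), here is a sketch of what checking the return value of `cudaStreamSynchronize()` gains. The call can also report errors from earlier asynchronous work on the stream, and an unchecked call silently discards them. `CHECK_CUDA` is a hypothetical macro written for this example. It is not the definition of `GW_CU_CHECK_ERR`, whose exact behavior (for example, whether it throws or aborts) is not shown here.

```cuda
#include <cuda_runtime.h>
#include <stdexcept>
#include <string>

// Hypothetical error-checking macro illustrating the idea behind a wrapper
// such as GW_CU_CHECK_ERR; it is NOT GenomeWorks' definition.
#define CHECK_CUDA(call)                                                  \
    do {                                                                  \
        const cudaError_t status_ = (call);                               \
        if (status_ != cudaSuccess) {                                     \
            throw std::runtime_error(std::string(#call) + " failed at " + \
                                     __FILE__ + ":" +                     \
                                     std::to_string(__LINE__) + ": " +    \
                                     cudaGetErrorString(status_));        \
        }                                                                 \
    } while (0)

// Waits for all work on `stream`, turning any reported error into an exception.
void finish_stream(cudaStream_t stream)
{
    // Unchecked form: `cudaStreamSynchronize(stream);` would drop the
    // returned cudaError_t, including errors from earlier kernel launches.
    CHECK_CUDA(cudaStreamSynchronize(stream));
}
```

The fix suggested in a) amounts to wrapping the existing call in `GW_CU_CHECK_ERR(...)` in the same way.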