[cudamapper] Minimizer uses thrust::inclusive_scan

Minimizer::generate_sketch_elements() now uses thrust::inclusive_scan() to determine the positions of each read's minimizers on the device, instead of computing them on the host. The number of reads is small, and the total number of minimizers still has to be copied to the host, so the performance gain is modest, but the code becomes significantly cleaner.
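
For readers unfamiliar with the scan step, here is a minimal host-side sketch of the idea, not the code from this PR. It assumes the scanned input is the number of minimizers found in each read; the helper name minimizer_end_positions is hypothetical. It uses std::partial_sum, which computes the same inclusive prefix sum sequentially that thrust::inclusive_scan() computes on the device.

```cpp
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical helper, for illustration only.
// Input:  counts[i] = number of minimizers found in read i (assumed input).
// Output: ends[i]   = counts[0] + ... + counts[i] (inclusive prefix sum).
// Read i's minimizers then occupy the half-open range
// [ends[i] - counts[i], ends[i]) in one flat output array.
std::vector<std::uint32_t> minimizer_end_positions(const std::vector<std::uint32_t>& counts)
{
    std::vector<std::uint32_t> ends(counts.size());
    // std::partial_sum is a sequential inclusive scan; on the device the
    // equivalent call would be thrust::inclusive_scan(first, last, result).
    std::partial_sum(counts.begin(), counts.end(), ends.begin());
    return ends;
}
```

With counts {3, 0, 2, 4} the result is {3, 3, 5, 9}. The last element is the total number of minimizers, which is the single value that still has to reach the host in order to size the output.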

This PR also removes cudaStreamSynchronize() from the end of Minimizer::generate_sketch_elements(). Note, however, that freeing device_buffers still calls cudaStreamSynchronize() for now.

NVIDIA-Genomics-Research/GenomeWorks#545