Closed: carterbox closed this 10 months ago
Hello @carterbox! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
There are currently no PEP 8 issues detected in this Pull Request. Cheers! :beers:
Purpose
Overlap computation and memory transfer better.
Approach
Use contiguous pinned memory and CUDA events to utilize streams more efficiently by writing a new stream_and_modify function.
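For readers unfamiliar with the pattern, the following is a minimal sketch of how pinned memory, CUDA events, and separate streams can be combined with CuPy. It is not the code from this PR: the name stream_and_modify_sketch, its signature, the helpers chunk_bounds and pinned_copy, and the choice of three streams with rotating device buffers are all assumptions made for illustration. It also assumes a non-empty array with at least one dimension and a func that modifies its CuPy argument in place.

```python
import numpy as np


def chunk_bounds(n, chunk_size):
    """Yield (start, stop) pairs that cover range(n) in order."""
    for start in range(0, n, chunk_size):
        yield start, min(start + chunk_size, n)


def pinned_copy(array):
    """Copy a NumPy array into contiguous page-locked (pinned) host memory."""
    import cupy as cp  # imported lazily so the module loads without a GPU

    mem = cp.cuda.alloc_pinned_memory(array.nbytes)
    pinned = np.frombuffer(mem, array.dtype, array.size).reshape(array.shape)
    pinned[...] = array
    return pinned


def stream_and_modify_sketch(data, func, chunk_size=64, num_buffers=3):
    """Hypothetical sketch: apply an in-place func to data chunk by chunk.

    Uploads, kernels, and downloads run on three separate streams. Events
    order the work on each device buffer, so chunk k+1 can be uploaded
    while chunk k is being computed and chunk k-1 is being downloaded.
    Returns a pinned NumPy array holding the modified data.
    """
    import cupy as cp

    host = pinned_copy(data)
    upload = cp.cuda.Stream(non_blocking=True)
    compute = cp.cuda.Stream(non_blocking=True)
    download = cp.cuda.Stream(non_blocking=True)

    # A small ring of device buffers that the chunks rotate through.
    shape = (chunk_size,) + host.shape[1:]
    buffers = [cp.empty(shape, host.dtype) for _ in range(num_buffers)]
    uploaded = [cp.cuda.Event() for _ in range(num_buffers)]
    computed = [cp.cuda.Event() for _ in range(num_buffers)]
    downloaded = [cp.cuda.Event() for _ in range(num_buffers)]

    for k, (lo, hi) in enumerate(chunk_bounds(host.shape[0], chunk_size)):
        s = k % num_buffers
        chunk = host[lo:hi]           # contiguous slice of pinned memory
        dev = buffers[s][: hi - lo]   # matching view of the device buffer

        # Do not overwrite the buffer until its last result was copied out.
        # Waiting on an event that was never recorded is a no-op in CUDA.
        upload.wait_event(downloaded[s])
        dev.data.copy_from_host_async(chunk.ctypes.data, chunk.nbytes, upload)
        uploaded[s].record(upload)

        # Compute only after the upload of this chunk has finished.
        compute.wait_event(uploaded[s])
        with compute:
            func(dev)
        computed[s].record(compute)

        # Download only after the kernel for this chunk has finished.
        download.wait_event(computed[s])
        dev.data.copy_to_host_async(chunk.ctypes.data, chunk.nbytes, download)
        downloaded[s].record(download)

    download.synchronize()  # every download depends on its upload and kernel
    return host
```

With a GPU and CuPy installed, a call might look like `stream_and_modify_sketch(np.ones((1000, 16), dtype='float32'), lambda x: cp.multiply(x, 2, out=x))` after `import cupy as cp`. Pinned memory matters here because asynchronous host/device copies can only overlap with kernels when the host side is page-locked; the events replace full stream synchronization so that each stream waits only for the specific buffer it needs.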
Pre-Merge Checklists
Submitter
Reviewer