Purpose
Port the RPIE and LSTSQ implementations to the new CUDA stream management API, which requires a contiguous GPU memory layout for batches and uses CUDA events to coordinate memory transfers.
Approach
The variable probe update functions are made more explicit, and batch indexes are replaced with lo:hi ranges.
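For reviewers, here is a minimal sketch of why lo:hi ranges fit the contiguous-memory requirement. It uses NumPy on the host rather than the actual GPU code, and `batch_ranges` is a hypothetical helper written for illustration, not a function from this PR. The point is only that a slice of a contiguous array is a view into the same memory, while selecting a batch with an index array makes a copy.

```python
import numpy as np


def batch_ranges(n, num_batch):
    """Split range(n) into num_batch contiguous (lo, hi) pairs.

    Hypothetical helper for illustration; not part of this PR.
    """
    edges = [(i * n) // num_batch for i in range(num_batch + 1)]
    return list(zip(edges[:-1], edges[1:]))


# Stand-in for per-position data stored contiguously (10 positions, 4 values).
data = np.arange(10 * 4, dtype=np.float32).reshape(10, 4)

# Index-array style: fancy indexing gathers the rows into a new array.
indices = np.array([0, 1, 2])
copied = data[indices]

# Range style: a lo:hi slice is a contiguous view into the original buffer.
ranges = batch_ranges(len(data), 3)
lo, hi = ranges[0]
view = data[lo:hi]

print(ranges)
print(np.shares_memory(copied, data), np.shares_memory(view, data))
```

Because the slice needs no gather step, each batch can be handed to a transfer or kernel as a single block of memory, which is presumably what the stream API relies on.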
Pre-Merge Checklists
Submitter
[ ] Write a helpfully descriptive pull request title.
[ ] Organize changes into logically grouped commits with descriptive commit messages.
[ ] Document all new functions.
[ ] Click 'details' on the readthedocs check to view the updated docs.
[ ] Write tests for new functions or explain why they are not needed.
[ ] Address any complaints from pep8speaks.
Reviewer
[ ] Actually read all of the code.
[ ] Run the new code yourself; the included tests should make this easy.
[ ] Write a summary of the changes as you understand them.