Purpose
Load batched variables (scan positions, diffraction patterns, eigen probe weights) into contiguous blocks of memory according to their minibatch instead of reading them from memory in the order that they were collected. This enables more efficient data movement to and from the GPU because only contiguous blocks of pinned memory can be transferred to the GPU asynchronously.
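A minimal sketch of why this reordering helps, using NumPy and made-up batch indices (the array shapes and the `batches` list are illustrative, not the library's data): once every array is permuted into minibatch order, each batch is a contiguous slice that can sit in a single block of pinned host memory and be copied to the GPU asynchronously, with no gather step on the hot path.

```python
import numpy as np

# Batches as computed by some batching method, indexing the collection order.
batches = [np.array([4, 0, 2]), np.array([1, 3])]

positions = np.random.rand(5, 2)      # scan positions, collection order
patterns = np.random.rand(5, 16, 16)  # diffraction patterns, collection order

# Permute everything once so that batch members are adjacent in memory.
order = np.concatenate(batches)
positions = np.ascontiguousarray(positions[order])
patterns = np.ascontiguousarray(patterns[order])

# After reordering, each batch is a consecutive range of indices.
sizes = np.array([len(b) for b in batches])
ends = np.cumsum(sizes)
starts = ends - sizes
for lo, hi in zip(starts, ends):
    batch_positions = positions[lo:hi]  # contiguous view; no fancy indexing
```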
Approach
Create a new splitting method that determines which positions are assigned to which GPU by simply sorting the positions along one axis. Then immediately apply the user-requested batching method to the divided positions, instead of loading them onto the GPU first. Once all the batches have been determined and converted into indices into the originally-ordered dataset, the batches can be loaded into contiguous blocks of GPU memory. The batch indices then become ranges of consecutive integers, as sketched below.
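A hedged sketch of the splitting idea; the function name `split_by_sort` is hypothetical and stands in for whatever the new method is called. It assigns positions to GPUs by sorting along one axis and cutting the sorted order into chunks:

```python
import numpy as np

def split_by_sort(positions, num_gpu, axis=0):
    """Return per-GPU index arrays by sorting positions along one axis."""
    order = np.argsort(positions[:, axis])
    return np.array_split(order, num_gpu)

positions = np.random.rand(10, 2)
per_gpu = split_by_sort(positions, num_gpu=2)
# Each GPU then applies the user-requested batching method to its own
# subset; after the final reordering, batch i of a device array is just
# the consecutive slice starts[i]:ends[i].
```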
I have also refactored the clustering methods so that they behave uniformly: when the number of clusters is larger than the population being divided, some empty batches are now returned instead of raising an error complaining that there are too many batches. Also, when the number of batches does not divide the population evenly, batches are expected to be ordered from largest to smallest. A sketch of this contract follows.
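A small illustration of the uniform contract (not the refactored implementation itself); `split_evenly` is a hypothetical name, and NumPy's `array_split` happens to already satisfy both rules:

```python
import numpy as np

def split_evenly(population, num_cluster):
    """Split range(population) into num_cluster batches, largest first."""
    return np.array_split(np.arange(population), num_cluster)

# Uneven division: larger batches come first.
print([len(b) for b in split_evenly(7, 3)])  # [3, 2, 2]

# More clusters than members: trailing batches are empty, no error raised.
print([len(b) for b in split_evenly(2, 4)])  # [1, 1, 0, 0]
```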
Pre-Merge Checklists
Submitter
[ ] Write a helpfully descriptive pull request title.
[ ] Organize changes into logically grouped commits with descriptive commit messages.
[ ] Document all new functions.
[ ] Click 'details' on the readthedocs check to view the updated docs.
[ ] Write tests for new functions or explain why they are not needed.
[ ] Address any complaints from pep8speaks.
Reviewer
[ ] Actually read all of the code.
[ ] Run the new code yourself; the included tests should make this easy.
[ ] Write a summary of the changes as you understand them.