[FEA] Add batching option to `filter_cells` and `filter_genes` in `rapids_scanpy_funcs`

The cuSPARSE API uses 32-bit integers to specify the size of its workspaces in GPU memory, and the SciPy/CuPy sparse APIs use them to specify the size of the underlying matrices. As a result, very large datasets run into problems when filtering cells and genes. We can get around this constraint in two ways: chunk the data across multiple GPUs using Dask, or batch the filters on a single GPU.
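To make the single-GPU batching option concrete, here is a minimal sketch of a row-batched cell filter. It is not the existing `rapids_scanpy_funcs` implementation: the function name `filter_cells_batched` and the `batch_size` parameter are hypothetical, and it uses SciPy on the CPU as a stand-in so that it runs anywhere. The assumption is that the same pattern would carry over to a CuPy sparse matrix, with each batch kept small enough to stay under the 32-bit size limits.

```python
import numpy as np
import scipy.sparse as sp


def filter_cells_batched(X, min_genes, max_genes, batch_size=50_000):
    """Hypothetical helper: keep the rows (cells) whose number of expressed
    genes lies in [min_genes, max_genes], working on batch_size rows at a time."""
    X = sp.csr_matrix(X)
    kept = []
    for start in range(0, X.shape[0], batch_size):
        # Row slicing a CSR matrix returns a copy, so the batch can be modified.
        batch = X[start:start + batch_size]
        # Drop explicitly stored zeros so that stored entries == expressed genes.
        batch.eliminate_zeros()
        # In CSR, consecutive indptr differences give the entries per row.
        genes_per_cell = np.diff(batch.indptr)
        mask = (genes_per_cell >= min_genes) & (genes_per_cell <= max_genes)
        if mask.any():
            kept.append(batch[mask])
    if not kept:
        # No cell passed the filter: return an empty matrix with the same columns.
        return X[:0]
    return sp.vstack(kept, format="csr")
```

Filtering cells is independent per row, so each batch can be filtered and discarded before the next one is loaded. Filtering genes is a little different: the per-gene counts would have to be accumulated across all row batches before any column can be dropped.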

We should do this specifically for the 1M cells notebook so that we can remove the `on_device` argument.

NVIDIA-Genomics-Research/rapids-single-cell-examples, issue #53