NVIDIA-Genomics-Research / rapids-single-cell-examples

Examples of single-cell genomic analysis accelerated with RAPIDS
Apache License 2.0
323 stars, 68 forks

Batched preprocessing for 1M cell notebook... #75

Closed (rilango closed 3 years ago)

rilango commented 3 years ago

To avoid failures when processing more than 1 million cells, most of the processing before HVG filtering is now batched, using Dask. Helper functions have been added to utils to support this.

Apart from this, a new verb has been added to the launch script to make development easier.
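For readers unfamiliar with the idea, here is a minimal sketch of why batching works for this kind of preprocessing. It is not the code in this PR. The PR uses Dask on the GPU, whereas this sketch uses plain Python lists so it runs without extra dependencies. The function names (`iter_batches`, `normalize_batch`, `batched_gene_stats`), the normalize-then-`log1p` step, and `target_sum` are all assumptions made for illustration. The point is that per-cell operations are independent across batches, and per-gene statistics can be accumulated batch by batch, so the full matrix never has to be transformed in one piece.

```python
import math
from typing import Iterator, List, Sequence, Tuple

Row = List[float]  # one cell: counts per gene


def iter_batches(rows: Sequence[Row], batch_size: int) -> Iterator[Sequence[Row]]:
    """Yield consecutive slices of at most batch_size rows (cells)."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def normalize_batch(batch: Sequence[Row], target_sum: float = 1e4) -> List[Row]:
    """Scale each cell to target_sum total counts, then apply log1p.

    Each row depends only on itself, so batches can be processed independently.
    """
    out = []
    for row in batch:
        total = sum(row) or 1.0  # avoid dividing by zero for empty cells
        out.append([math.log1p(v / total * target_sum) for v in row])
    return out


def batched_gene_stats(rows: Sequence[Row], batch_size: int) -> Tuple[List[float], List[float]]:
    """Per-gene mean and population variance, accumulated one batch at a time."""
    n_genes = len(rows[0])
    sums = [0.0] * n_genes
    sq_sums = [0.0] * n_genes
    n_cells = 0
    for batch in iter_batches(rows, batch_size):
        for row in normalize_batch(batch):
            n_cells += 1
            for j, v in enumerate(row):
                sums[j] += v
                sq_sums[j] += v * v
    means = [s / n_cells for s in sums]
    # Sum-of-squares formula: simple, but less numerically stable than Welford's method.
    variances = [q / n_cells - m * m for q, m in zip(sq_sums, means)]
    return means, variances


# Toy count matrix: 5 cells x 3 genes (made-up values).
counts = [
    [1.0, 0.0, 3.0],
    [0.0, 2.0, 2.0],
    [5.0, 1.0, 0.0],
    [0.0, 0.0, 4.0],
    [2.0, 2.0, 2.0],
]

batched = batched_gene_stats(counts, batch_size=2)
unbatched = batched_gene_stats(counts, batch_size=len(counts))
```

Here `batched` and `unbatched` hold the same per-gene means and variances, which is what makes it safe to split the work before HVG selection.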

avantikalal commented 3 years ago

In cell 13, I don't see why conversion to anndata is necessary, since the HVG calculation is now being done outside anndata.

rilango commented 3 years ago

> In cell 13, I don't see why conversion to anndata is necessary, since the HVG calculation is now being done outside anndata.

Yes. I did not analyze cell 14 closely. We can remove cell 13 and migrate cell 14 to use cupy instead. I will send a patch soon.

rilango commented 3 years ago

> In cell 13, I don't see why conversion to anndata is necessary, since the HVG calculation is now being done outside anndata.

Resolved.

avantikalal commented 3 years ago

Looks good. I think we need some explanation of the changes inside the notebook. Could you add a few lines above cell 2 explaining the use of Dask here?

rilango commented 3 years ago

> Looks good. I think we need some explanation of the changes inside the notebook. Could you add a few lines above cell 2 explaining the use of Dask here?

Done. Please check the documentation under 'Load and Prepare Data'.