Closed: cjnolet closed this issue 3 years ago
@cjnolet
In the last update to 21.06, I found that StandardScaler() works well for 70K cells but is, for some reason, very slow on 1.3M cells, so I used cupy directly in the 1.3M cell notebook (cell 10 here: https://github.com/clara-parabricks/rapids-single-cell-examples/blob/master/notebooks/1M_brain_gpu_analysis_uvm.ipynb). If this is still the case (and it seems to be), we should continue to use cupy instead of StandardScaler() in the 1.3M cell notebook.
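For reference, the cupy replacement amounts to doing the mean-centering and unit-variance scaling with array ops directly. A minimal sketch, using numpy as a stand-in (cupy mirrors the numpy API, so on GPU you would swap the import for `import cupy as np`); the clipping threshold follows scanpy's usual convention and is an assumption here, not taken from the notebook:

```python
# Sketch of scaling done with array ops instead of StandardScaler().
# numpy stands in for cupy here; the APIs match, so on a GPU you can
# replace this import with `import cupy as np`.
import numpy as np

def scale(X, max_value=10.0):
    """Mean-center each column, scale to unit variance, then clip."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0          # avoid division by zero on constant columns
    X = (X - mean) / std
    return np.clip(X, -max_value, max_value)

X = np.random.rand(100, 5).astype(np.float32)
Xs = scale(X)
```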
Is there a reason to continue using utils.pca in the 1.3M cell notebook if the full PCA can be run?
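The "full PCA" here is the exact decomposition rather than the approximate helper: center the matrix, take its SVD, and project onto the top components. A CPU sketch of that idea (numpy again standing in for cupy; cuml's PCA wraps this pattern behind a scikit-learn-style `fit_transform`):

```python
# Minimal full-PCA sketch via exact SVD. cupy's linalg API matches
# numpy's, so the same code runs on GPU with `import cupy as np`.
import numpy as np

def full_pca(X, n_components=50):
    Xc = X - X.mean(axis=0)                       # center the data
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T               # project onto top components

X = np.random.rand(200, 60)
emb = full_pca(X, n_components=50)
```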
Please update the Dockerfile too.
@avantikalal, while updating the Dockerfile I noticed that AtacWorks has some hard dependencies (for example, on scikit-learn version 0.21.3 here) which cause messages like this during the Docker build:
```
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
dask-cudf 21.8.2 requires cupy-cuda110, which is not installed.
cudf 21.8.2 requires cupy-cuda112, which is not installed.
umap-learn 0.5.1 requires scikit-learn>=0.22, but you have scikit-learn 0.21.3 which is incompatible.
dask-ml 1.9.0 requires scikit-learn>=0.23, but you have scikit-learn 0.21.3 which is incompatible.
dask-cuda 21.8.0 requires numba>=0.53.1, but you have numba 0.52.0 which is incompatible.
cudf 21.8.2 requires numba>=0.53.1, but you have numba 0.52.0 which is incompatible.
atacworks 0.3.4 requires numpy~=1.19.4, but you have numpy 1.21.2 which is incompatible.
atacworks 0.3.4 requires setuptools~=51.1.1, but you have setuptools 57.4.0 which is incompatible.
```
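The atacworks pins use pip's "compatible release" operator: `numpy~=1.19.4` is shorthand for `numpy>=1.19.4,<1.20.0`, which is why the installed numpy 1.21.2 conflicts. A small illustrative check of that semantics (`compatible` is a hypothetical helper written for this sketch, not a pip API):

```python
# Illustrative check of pip's "compatible release" operator (~=).
# `numpy~=1.19.4` means `numpy>=1.19.4,<1.20.0`, so numpy 1.21.2
# falls outside the allowed range. Not a pip API, just a sketch.

def parse(v):
    return tuple(int(p) for p in v.split("."))

def compatible(installed, pin):
    """True if `installed` satisfies `~=pin` (last component may float)."""
    lo = parse(pin)
    hi = lo[:-2] + (lo[-2] + 1,)   # bump the second-to-last component
    return lo <= parse(installed) < hi

print(compatible("1.21.2", "1.19.4"))   # False: outside ~=1.19.4
print(compatible("1.19.5", "1.19.4"))   # True: within ~=1.19.4
```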
Are all of those explicit version pins necessary for AtacWorks to function properly?
@cjnolet
I doubt they are absolutely necessary. You can check by running the Example 5 notebook: if it runs normally in the Docker container, there is no problem.
@avantikalal, I can't seem to find a stable configuration in the Dockerfile for RAPIDS 21.08 that works well given the hard requirements in the atacworks package. I've built the docker container using the RAPIDS 21.08-cuda11.0 container (no other changes to the Dockerfile) and the notebook won't even get past the imports:
```
ImportError: numpy.core.multiarray failed to import
```
I tried changing some of the other dependencies but got strange numba errors from cudf. I removed the versions from the requirements.txt in the AtacWorks repository and the notebooks executed successfully. I have the Dockerfile cloning the AtacWorks repository and doing a pip install in place. Should I submit a PR to AtacWorks, or should we just depend on my fork until the next release of the atacworks package?
The current changes in this branch work, btw, so we could also merge these for now and update the AtacWorks git repository in the Dockerfile once the changes are merged.
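A sketch of what the "clone and pip install in place" Dockerfile change looks like; the fork URL below is a placeholder, not the actual repository used in this branch:

```dockerfile
# Clone AtacWorks and install from the checkout so pip uses the
# repository's (relaxed) requirements instead of the pinned PyPI release.
# The fork URL below is a placeholder.
RUN git clone https://github.com/<fork>/AtacWorks.git /opt/AtacWorks \
    && pip install /opt/AtacWorks
```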
We have dropped official support for CUDA 10.x versions in RAPIDS, so I've dropped those conda environment files. I will also add new files for CUDA 11.1 and 11.2.
Notable changes since last supported version:
- StandardScaler, which will perform the mean centering and normalize to unit variance

I'm also finishing up a blog on HDBSCAN which showcases our lung notebook. I can also add HDBSCAN to that notebook in a follow-on PR.