KrishnaswamyLab / phateR

PHATE dimensionality reduction method implemented in R
GNU General Public License v2.0

phateR crash issues #69

Closed fparyani closed 1 year ago

fparyani commented 1 year ago

Describe the bug

I am running the phate function with donor correction on a counts matrix that contains 34,017 cells and 21,820 genes. The whole pipeline works fine and the phate function begins to run, but then it suddenly crashes.

Actual behavior

The warning and error from phate:

/home/fp2409/.local/lib/python3.8/site-packages/graphtools/graphs.py:108: UserWarning: Cannot set knn (5) to be greater than n_samples - 2 (-1). Setting knn=-1
  warnings.warn(
      Calculating KNN search...
/home/fp2409/.local/lib/python3.8/site-packages/graphtools/graphs.py:236: UserWarning: Metric euclidean not valid for `sklearn.neighbors.BallTree`. Graph instantiation may be slower than normal.
  warnings.warn(
    Calculated subgraphs in 2.01 seconds.
  Calculated graph and diffusion operator in 73.14 seconds.
Calculated PHATE in 73.15 seconds.
Error in py_call_impl(callable, dots$args, dots$keywords) : 
  sklearn.utils._param_validation.InvalidParameterError: The 'n_neighbors' parameter of NearestNeighbors must be an int in the range [1, inf) or None. Got 0 instead.
In addition: Warning message:
In asMethod(object) :
  sparse->dense coercion: allocating vector of size 5.5 GiB

System information:

Output of phate.__version__:

Please run phate.__version__ and paste the results here.

You can do this with `python -c 'import phate; print(phate.__version__)'`

Input: python3 -c 'import phate; print(phate.__version__)'
Output: 1.0.11

Output of pd.show_versions():

```
Please run pd.show_versions() and paste the results here.
You can do this with `python -c 'import pandas as pd; pd.show_versions()'`

python3 -c 'import pandas as pd; pd.show_versions()'

INSTALLED VERSIONS
------------------
commit           : 66e3805b8cabe977f40c05259cc3fcf7ead5687d
python           : 3.7.3.final.0
python-bits      : 64
OS               : Linux
OS-release       : 4.19.0-23-amd64
Version          : #1 SMP Debian 4.19.269-1 (2022-12-20)
machine          : x86_64
processor        :
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 1.3.5
numpy            : 1.21.6
pytz             : 2023.3
dateutil         : 2.7.3
pip              : 23.1.2
setuptools       : 40.8.0
Cython           : None
pytest           : None
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : None
IPython          : 5.8.0
pandas_datareader: None
bs4              : None
bottleneck       : None
fsspec           : None
fastparquet      : None
gcsfs            : None
matplotlib       : 3.5.3
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : None
pyxlsb           : None
s3fs             : None
scipy            : 1.7.3
sqlalchemy       : None
tables           : None
tabulate         : None
xarray           : None
xlrd             : None
xlwt             : None
numba            : None
```

Output of sessionInfo():

```
Please run sessionInfo() and paste the results here.
You can do this with `R -e 'library(phateR); sessionInfo()'`

R version 4.2.2 (2022-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 10 (buster)

Matrix products: default
BLAS:   /mnt/mfs/cluster/bin/R-4.2.2.10/lib/libRblas.so
LAPACK: /mnt/mfs/cluster/bin/R-4.2.2.10/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] phateR_1.0.7 Matrix_1.6-0

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.10      magrittr_2.0.3   tidyselect_1.2.0 munsell_0.5.0
 [5] colorspace_2.1-0 lattice_0.21-8   R6_2.5.1         rlang_1.1.1
 [9] fastmap_1.1.1    fansi_1.0.4      dplyr_1.1.2      grid_4.2.2
[13] gtable_0.3.3     png_0.1-8        utf8_1.2.3       cli_3.6.1
[17] tibble_3.2.1     lifecycle_1.0.3  ggplot2_3.4.2    vctrs_0.6.2
[21] memoise_2.0.1    glue_1.6.2       cachem_1.0.8     compiler_4.2.2
[25] pillar_1.9.0     generics_0.1.3   scales_1.2.1     reticulate_1.28
[29] jsonlite_1.8.7   pkgconfig_2.0.3
```

Output of reticulate::py_discover_config(required_module = "phate"):

```
Please run `reticulate::py_discover_config(required_module = "phate")` and paste the results here.
You can do this with `R -e 'reticulate::py_discover_config(required_module = "phate")'`

python:         /mnt/mfs/hgrcgrid/homes/fp2409/.local/share/r-miniconda/envs/r-reticulate/bin/python
libpython:      /mnt/mfs/hgrcgrid/homes/fp2409/.local/share/r-miniconda/envs/r-reticulate/lib/libpython3.8.so
pythonhome:     /mnt/mfs/hgrcgrid/homes/fp2409/.local/share/r-miniconda/envs/r-reticulate:/mnt/mfs/hgrcgrid/homes/fp2409/.local/share/r-miniconda/envs/r-reticulate
version:        3.8.17 | packaged by conda-forge | (default, Jun 16 2023, 07:06:00) [GCC 11.4.0]
numpy:          /mnt/mfs/hgrcgrid/homes/fp2409/.local/lib/python3.8/site-packages/numpy
numpy_version:  1.24.4
```

Output of phateR::check_pyphate_version():

```
Please run `phateR::check_pyphate_version()` and paste the results here.
You can do this with `R -e 'phateR::check_pyphate_version()'`

TRUE
```

Additional context

I do find it strange that phate does not appear in the output of reticulate::py_discover_config(required_module = "phate"). However, I have been running phate on another count matrix of similar size (45,809 cells and 21,820 genes), so I know this is neither a memory issue nor a problem with the phate versions. I do, however, need to run those first two lines of code after importing the packages in R for phate to work. I also checked whether any samples had all-zero counts, but this was not the case. I did just realize that some genes (~3,000) may have no counts detected in any sample, but this was also true of the other object, and phate ran perfectly on it. Would greatly appreciate your help on this wonderful package!

library(phateR)
library(magrittr)

# Need to run this
reticulate::py_discover_config(required_module="phate")
reticulate::import("phate")

micro_seurat_mf_cts <- readRDS("micro_seurat_mf_cts.rds")
micro_seurat_mf_meta <- readRDS("micro_seurat_mf_meta.rds")

mat <- library.size.normalize(t(micro_seurat_mf_cts), verbose = TRUE)
mat <- sqrt(mat)
donor <- micro_seurat_mf_meta$projid %>% as.factor() %>% as.numeric()
mat_phate <- phate(as.matrix(mat), sample_idx = donor)


scottgigante commented 1 year ago

What is dim(mat)? Judging by the warning message in your logs, it looks like n_samples is 1:

Cannot set knn (5) to be greater than n_samples - 2 (-1). Setting knn=-1


fparyani commented 1 year ago

dim(mat) returns 34,017 21,820. It seems perfectly fine.

scottgigante commented 1 year ago

Rereading your code, it might be the number of samples per donor. Can you run dplyr::count(micro_seurat_mf_meta, projid)? If there are any donors with fewer than 3 samples, you might have to remove them.
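
For readers without dplyr, the same per-donor check can be sketched in base R. This is only an illustration: `meta` is a made-up stand-in for micro_seurat_mf_meta, and the threshold of 3 is taken from the suggestion above rather than from the phate documentation.

```r
# Hypothetical stand-in for micro_seurat_mf_meta; only the projid column matters.
meta <- data.frame(projid = c("A", "A", "A", "B", "C", "C", "C", "C"))

# Number of cells (rows) per donor; roughly what dplyr::count(meta, projid) reports.
cells_per_donor <- table(meta$projid)

# Donors with fewer than 3 cells, i.e. the ones suspected of triggering the knn warning.
small_donors <- names(cells_per_donor)[as.vector(cells_per_donor) < 3]
print(small_donors)
```

An empty result would mean every donor has at least 3 cells, in which case the cause is probably something else.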


fparyani commented 1 year ago

Hey Scott, I found a donor that had 1 sample. This fixed the issue. I appreciate the help!
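
For anyone hitting the same error, a minimal sketch of the fix follows: drop the cells that belong to donors with too few cells before calling phate. The helper name `keep_cells_from_large_donors`, the toy data, and the default threshold of 3 are assumptions made for illustration; they are not part of phateR.

```r
# Hypothetical helper: TRUE for cells whose donor has at least `min_cells` cells.
keep_cells_from_large_donors <- function(donor, min_cells = 3) {
  counts <- table(donor)
  ok <- names(counts)[as.vector(counts) >= min_cells]
  as.character(donor) %in% ok
}

# Toy data: 7 cells x 2 genes, where donor 2 contributes a single cell.
donor <- c(1, 1, 1, 2, 3, 3, 3)
mat <- matrix(seq_len(7 * 2), nrow = 7)

keep <- keep_cells_from_large_donors(donor)
mat_filtered <- mat[keep, , drop = FALSE]   # rows are cells, as in the issue
donor_filtered <- donor[keep]

# With the real data the call would then be (not run here):
# mat_phate <- phate(as.matrix(mat_filtered), sample_idx = donor_filtered)
```

The matrix and the donor vector are filtered with the same logical index so that the rows of the matrix and the entries of sample_idx stay aligned.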