aertslab / pySCENIC

pySCENIC is a lightning-fast python implementation of the SCENIC pipeline (Single-Cell rEgulatory Network Inference and Clustering) which enables biologists to infer transcription factors, gene regulatory networks and cell types from single-cell RNA-seq data.
http://scenic.aertslab.org
GNU General Public License v3.0

[BUG] run in pyscenic grn #559

Open zhangdong360 opened 4 months ago

zhangdong360 commented 4 months ago

Describe the bug: When I run pySCENIC, I often encounter disturbing warnings. I checked whether the problem might be related to this issue: https://github.com/aertslab/pySCENIC/issues/482, but I am not using port 8787. On the other hand, I rarely encounter this warning on the HPC where I have RStudio Server installed, and I don't think that has anything to do with it. I suspect the problem lies with dask, but I am not well versed in it. On top of that, there is no output, so I cannot judge whether I need to re-run the program, and as mentioned above, re-running will most likely hit the warning again. I also tried arboreto_with_multiprocessing.py (invocation sketched below), but it was too inefficient: in a test on a small sample it was nearly twice as slow as pySCENIC for the same number of CPU cores, which I don't think is acceptable for a large dataset. Running this program has taken too much of my energy; I have had to downsample my data, but I don't think that is a long-term solution.
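For reference, a sketch of how arboreto_with_multiprocessing.py was invoked (it is assumed here, following the pySCENIC documentation, that the script accepts the same arguments as pyscenic grn; file names are placeholders):

arboreto_with_multiprocessing.py \
    HSC_SEV.loom \
    allTFs_mm.txt \
    --method grnboost2 \
    --output step_1_fibo_grn.tsv \
    --num_workers 16 \
    --seed 21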

(scanpy) [zhangdong_2@jupyterlab-md-npeer5o1 pySCENIC]$ cat step1.out 

2024-07-02 15:30:29,526 - pyscenic.cli.pyscenic - INFO - Loading expression matrix.

2024-07-02 15:32:35,579 - pyscenic.cli.pyscenic - INFO - Inferring regulatory networks.
Numba: Attempted to fork from a non-main thread, the TBB library may be in an invalid state in the child process.
[the same Numba warning is repeated 18 times in the log]
2024-07-02 15:34:13,518 - distributed.worker - WARNING - Could not find data: {'ndarray-a612cf0abd06497fa68e9db39636fedb': ['tcp://127.0.0.1:46641', 'tcp://127.0.0.1:43849', 'tcp://127.0.0.1:44125', 'tcp://127.0.0.1:44847', 'tcp://127.0.0.1:37827']} on workers: [] (who_has: {'ndarray-a612cf0abd06497fa68e9db39636fedb': ['tcp://127.0.0.1:46641', 'tcp://127.0.0.1:43849', 'tcp://127.0.0.1:44125', 'tcp://127.0.0.1:44847', 'tcp://127.0.0.1:37827']})
2024-07-02 15:34:13,518 - distributed.worker - WARNING - Could not find data: {'ndarray-a612cf0abd06497fa68e9db39636fedb': ['tcp://127.0.0.1:46641', 'tcp://127.0.0.1:43849', 'tcp://127.0.0.1:44125', 'tcp://127.0.0.1:44847', 'tcp://127.0.0.1:37827']} on workers: [] (who_has: {'ndarray-a612cf0abd06497fa68e9db39636fedb': ['tcp://127.0.0.1:46641', 'tcp://127.0.0.1:43849', 'tcp://127.0.0.1:44125', 'tcp://127.0.0.1:44847', 'tcp://127.0.0.1:37827']})
2024-07-02 15:34:13,519 - distributed.scheduler - WARNING - Worker tcp://127.0.0.1:33897 failed to acquire keys: {'ndarray-a612cf0abd06497fa68e9db39636fedb': ('tcp://127.0.0.1:46641', 'tcp://127.0.0.1:43849', 'tcp://127.0.0.1:44125', 'tcp://127.0.0.1:44847', 'tcp://127.0.0.1:37827')}
2024-07-02 15:34:13,520 - distributed.scheduler - WARNING - Worker tcp://127.0.0.1:33279 failed to acquire keys: {'ndarray-a612cf0abd06497fa68e9db39636fedb': ('tcp://127.0.0.1:46641', 'tcp://127.0.0.1:43849', 'tcp://127.0.0.1:44125', 'tcp://127.0.0.1:44847', 'tcp://127.0.0.1:37827')}
2024-07-02 15:34:13,530 - distributed.worker - WARNING - Could not find data: {'ndarray-a612cf0abd06497fa68e9db39636fedb': ['tcp://127.0.0.1:46641', 'tcp://127.0.0.1:43849', 'tcp://127.0.0.1:44125', 'tcp://127.0.0.1:44847', 'tcp://127.0.0.1:37827']} on workers: [] (who_has: {'ndarray-a612cf0abd06497fa68e9db39636fedb': ['tcp://127.0.0.1:46641', 'tcp://127.0.0.1:43849', 'tcp://127.0.0.1:44125', 'tcp://127.0.0.1:44847', 'tcp://127.0.0.1:37827']})
2024-07-02 15:34:13,531 - distributed.scheduler - WARNING - Worker tcp://127.0.0.1:33285 failed to acquire keys: {'ndarray-a612cf0abd06497fa68e9db39636fedb': ('tcp://127.0.0.1:46641', 'tcp://127.0.0.1:43849', 'tcp://127.0.0.1:44125', 'tcp://127.0.0.1:44847', 'tcp://127.0.0.1:37827')}
2024-07-02 15:34:28,077 - distributed.worker - WARNING - Could not find data: {'ndarray-a612cf0abd06497fa68e9db39636fedb': ['tcp://127.0.0.1:32843', 'tcp://127.0.0.1:32855', 'tcp://127.0.0.1:42003', 'tcp://127.0.0.1:45323', 'tcp://127.0.0.1:46849']} on workers: [] (who_has: {'ndarray-a612cf0abd06497fa68e9db39636fedb': ['tcp://127.0.0.1:32843', 'tcp://127.0.0.1:32855', 'tcp://127.0.0.1:42003', 'tcp://127.0.0.1:45323', 'tcp://127.0.0.1:46849']})
2024-07-02 15:34:28,079 - distributed.scheduler - WARNING - Worker tcp://127.0.0.1:46641 failed to acquire keys: {'ndarray-a612cf0abd06497fa68e9db39636fedb': ('tcp://127.0.0.1:32843', 'tcp://127.0.0.1:32855', 'tcp://127.0.0.1:42003', 'tcp://127.0.0.1:45323', 'tcp://127.0.0.1:46849')}
2024-07-02 15:34:28,086 - distributed.worker - WARNING - Could not find data: {'ndarray-a612cf0abd06497fa68e9db39636fedb': ['tcp://127.0.0.1:32843', 'tcp://127.0.0.1:32855', 'tcp://127.0.0.1:42003', 'tcp://127.0.0.1:45323', 'tcp://127.0.0.1:46849']} on workers: [] (who_has: {'ndarray-a612cf0abd06497fa68e9db39636fedb': ['tcp://127.0.0.1:32843', 'tcp://127.0.0.1:32855', 'tcp://127.0.0.1:42003', 'tcp://127.0.0.1:45323', 'tcp://127.0.0.1:46849']})
2024-07-02 15:34:28,088 - distributed.scheduler - WARNING - Worker tcp://127.0.0.1:33285 failed to acquire keys: {'ndarray-a612cf0abd06497fa68e9db39636fedb': ('tcp://127.0.0.1:32843', 'tcp://127.0.0.1:32855', 'tcp://127.0.0.1:42003', 'tcp://127.0.0.1:45323', 'tcp://127.0.0.1:46849')}
2024-07-02 15:37:24,939 - distributed.worker - WARNING - Could not find data: {'ndarray-a612cf0abd06497fa68e9db39636fedb': ['tcp://127.0.0.1:32843', 'tcp://127.0.0.1:46641', 'tcp://127.0.0.1:42003', 'tcp://127.0.0.1:44847', 'tcp://127.0.0.1:46849']} on workers: [] (who_has: {'ndarray-a612cf0abd06497fa68e9db39636fedb': ['tcp://127.0.0.1:32843', 'tcp://127.0.0.1:46641', 'tcp://127.0.0.1:42003', 'tcp://127.0.0.1:44847', 'tcp://127.0.0.1:46849']})
2024-07-02 15:37:24,942 - distributed.worker - WARNING - Could not find data: {'ndarray-a612cf0abd06497fa68e9db39636fedb': ['tcp://127.0.0.1:32843', 'tcp://127.0.0.1:46641', 'tcp://127.0.0.1:42003', 'tcp://127.0.0.1:44847', 'tcp://127.0.0.1:46849']} on workers: [] (who_has: {'ndarray-a612cf0abd06497fa68e9db39636fedb': ['tcp://127.0.0.1:32843', 'tcp://127.0.0.1:46641', 'tcp://127.0.0.1:42003', 'tcp://127.0.0.1:44847', 'tcp://127.0.0.1:46849']})
2024-07-02 15:37:24,943 - distributed.worker - WARNING - Could not find data: {'ndarray-a612cf0abd06497fa68e9db39636fedb': ['tcp://127.0.0.1:32843', 'tcp://127.0.0.1:46641', 'tcp://127.0.0.1:42003', 'tcp://127.0.0.1:44847', 'tcp://127.0.0.1:46849']} on workers: [] (who_has: {'ndarray-a612cf0abd06497fa68e9db39636fedb': ['tcp://127.0.0.1:32843', 'tcp://127.0.0.1:46641', 'tcp://127.0.0.1:42003', 'tcp://127.0.0.1:44847', 'tcp://127.0.0.1:46849']})
2024-07-02 15:37:24,972 - distributed.scheduler - WARNING - Worker tcp://127.0.0.1:44125 failed to acquire keys: {'ndarray-a612cf0abd06497fa68e9db39636fedb': ('tcp://127.0.0.1:32843', 'tcp://127.0.0.1:46641', 'tcp://127.0.0.1:42003', 'tcp://127.0.0.1:44847', 'tcp://127.0.0.1:46849')}
2024-07-02 15:37:24,973 - distributed.scheduler - WARNING - Worker tcp://127.0.0.1:43849 failed to acquire keys: {'ndarray-a612cf0abd06497fa68e9db39636fedb': ('tcp://127.0.0.1:32843', 'tcp://127.0.0.1:46641', 'tcp://127.0.0.1:42003', 'tcp://127.0.0.1:44847', 'tcp://127.0.0.1:46849')}
2024-07-02 15:37:24,973 - distributed.scheduler - WARNING - Worker tcp://127.0.0.1:42931 failed to acquire keys: {'ndarray-a612cf0abd06497fa68e9db39636fedb': ('tcp://127.0.0.1:32843', 'tcp://127.0.0.1:46641', 'tcp://127.0.0.1:42003', 'tcp://127.0.0.1:44847', 'tcp://127.0.0.1:46849')}
2024-07-02 15:37:40,544 - distributed.worker - WARNING - Could not find data: {'ndarray-a612cf0abd06497fa68e9db39636fedb': ['tcp://127.0.0.1:45323', 'tcp://127.0.0.1:44201', 'tcp://127.0.0.1:33279', 'tcp://127.0.0.1:35067', 'tcp://127.0.0.1:33285']} on workers: [] (who_has: {'ndarray-a612cf0abd06497fa68e9db39636fedb': ['tcp://127.0.0.1:45323', 'tcp://127.0.0.1:44201', 'tcp://127.0.0.1:33279', 'tcp://127.0.0.1:35067', 'tcp://127.0.0.1:33285']})
2024-07-02 15:37:40,545 - distributed.scheduler - WARNING - Worker tcp://127.0.0.1:42931 failed to acquire keys: {'ndarray-a612cf0abd06497fa68e9db39636fedb': ('tcp://127.0.0.1:45323', 'tcp://127.0.0.1:44201', 'tcp://127.0.0.1:33279', 'tcp://127.0.0.1:35067', 'tcp://127.0.0.1:33285')}
2024-07-02 15:37:40,562 - distributed.worker - WARNING - Could not find data: {'ndarray-a612cf0abd06497fa68e9db39636fedb': ['tcp://127.0.0.1:45323', 'tcp://127.0.0.1:44201', 'tcp://127.0.0.1:33279', 'tcp://127.0.0.1:35067', 'tcp://127.0.0.1:33285']} on workers: [] (who_has: {'ndarray-a612cf0abd06497fa68e9db39636fedb': ['tcp://127.0.0.1:45323', 'tcp://127.0.0.1:44201', 'tcp://127.0.0.1:33279', 'tcp://127.0.0.1:35067', 'tcp://127.0.0.1:33285']})
2024-07-02 15:37:40,564 - distributed.worker - WARNING - Could not find data: {'ndarray-a612cf0abd06497fa68e9db39636fedb': ['tcp://127.0.0.1:45323', 'tcp://127.0.0.1:44201', 'tcp://127.0.0.1:33279', 'tcp://127.0.0.1:35067', 'tcp://127.0.0.1:33285']} on workers: [] (who_has: {'ndarray-a612cf0abd06497fa68e9db39636fedb': ['tcp://127.0.0.1:45323', 'tcp://127.0.0.1:44201', 'tcp://127.0.0.1:33279', 'tcp://127.0.0.1:35067', 'tcp://127.0.0.1:33285']})
2024-07-02 15:37:40,564 - distributed.scheduler - WARNING - Worker tcp://127.0.0.1:44125 failed to acquire keys: {'ndarray-a612cf0abd06497fa68e9db39636fedb': ('tcp://127.0.0.1:45323', 'tcp://127.0.0.1:44201', 'tcp://127.0.0.1:33279', 'tcp://127.0.0.1:35067', 'tcp://127.0.0.1:33285')}
2024-07-02 15:37:40,565 - distributed.scheduler - WARNING - Worker tcp://127.0.0.1:37827 failed to acquire keys: {'ndarray-a612cf0abd06497fa68e9db39636fedb': ('tcp://127.0.0.1:45323', 'tcp://127.0.0.1:44201', 'tcp://127.0.0.1:33279', 'tcp://127.0.0.1:35067', 'tcp://127.0.0.1:33285')}
2024-07-02 15:39:52,768 - distributed.worker - WARNING - Could not find data: {'ndarray-a612cf0abd06497fa68e9db39636fedb': ['tcp://127.0.0.1:46641', 'tcp://127.0.0.1:33285', 'tcp://127.0.0.1:37827', 'tcp://127.0.0.1:46849', 'tcp://127.0.0.1:42931']} on workers: [] (who_has: {'ndarray-a612cf0abd06497fa68e9db39636fedb': ['tcp://127.0.0.1:46641', 'tcp://127.0.0.1:33285', 'tcp://127.0.0.1:37827', 'tcp://127.0.0.1:46849', 'tcp://127.0.0.1:42931']})
2024-07-02 15:39:52,816 - distributed.scheduler - WARNING - Worker tcp://127.0.0.1:32855 failed to acquire keys: {'ndarray-a612cf0abd06497fa68e9db39636fedb': ('tcp://127.0.0.1:46641', 'tcp://127.0.0.1:33285', 'tcp://127.0.0.1:37827', 'tcp://127.0.0.1:46849', 'tcp://127.0.0.1:42931')}
2024-07-02 15:40:37,290 - distributed.worker - WARNING - Could not find data: {'ndarray-a612cf0abd06497fa68e9db39636fedb': ['tcp://127.0.0.1:35067', 'tcp://127.0.0.1:33897', 'tcp://127.0.0.1:33279', 'tcp://127.0.0.1:46641', 'tcp://127.0.0.1:44847', 'tcp://127.0.0.1:46849']} on workers: [] (who_has: {'ndarray-a612cf0abd06497fa68e9db39636fedb': ['tcp://127.0.0.1:35067', 'tcp://127.0.0.1:33897', 'tcp://127.0.0.1:33279', 'tcp://127.0.0.1:46641', 'tcp://127.0.0.1:44847', 'tcp://127.0.0.1:46849']})
2024-07-02 15:40:37,291 - distributed.scheduler - WARNING - Worker tcp://127.0.0.1:44201 failed to acquire keys: {'ndarray-a612cf0abd06497fa68e9db39636fedb': ('tcp://127.0.0.1:35067', 'tcp://127.0.0.1:33897', 'tcp://127.0.0.1:33279', 'tcp://127.0.0.1:46641', 'tcp://127.0.0.1:44847', 'tcp://127.0.0.1:46849')}

Expected behavior: I could not find a clear way to reproduce it, but it often appears after a run.

zhangdong360 commented 4 months ago

My code:

#!/bin/bash
#SBATCH -o output/pyscenic_hsc_sev.out
#SBATCH -e output/pyscenic_hsc_sev.err
#SBATCH --partition=compute
#SBATCH -J scenic_HSC_SEV
#SBATCH --nodes=1               
#SBATCH -n 30
# This is for fastp protocol

#conda activate scanpy
# human
#f_db_names="/share/home/zhangd/tools/database/cistarget/cisTarget_databases/homo_sapiens/hg38/refseq_r80/mc_v10_clust/gene_based/hg38_500bp_up_100bp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather"
#f_motif_path="/share/home/zhangd/tools/database/cistarget/Motif2TF/motifs-v10nr_clust-nr.hgnc-m0.001-o0.0.tbl"
#f_tf_list="/share/home/zhangd/project/python_project/pySCENIC/allTFs_hg38.txt"
# mouse
f_db_names="/home/zhangdong_2/database/cistarget/cisTarget_databases/mus_musculus/mm10/refseq_r80/mc_v10_clust/mm10_500bp_up_100bp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather"
f_motif_path="/home/zhangdong_2/database/cistarget/Motif2TF/motifs-v10nr_clust-nr.mgi-m0.001-o0.0.tbl"
f_tf_list="/home/zhangdong_2/database/cistarget/TF_lists/allTFs_mm.txt"
# data input

dir_result="/home/zhangdong_2/project/pySCENIC/03_result/HSC_SEV/"
input_loom="/home/zhangdong_2/project/pySCENIC/01_data/HSC_SEV.loom"

# step1
echo "Step 1 pyscenic grn start"
nohup pyscenic grn ${input_loom}  ${f_tf_list} \
              --seed 21 \
              --num_workers 16 \
              --method grnboost2 \
              --output ${dir_result}/step_1_fibo_grn.tsv >step1.out 2>&1 &
echo "Step 1 pyscenic grn finish"
echo "Step 2 pyscenic ctx start"
nohup pyscenic ctx ${dir_result}/step_1_fibo_grn.tsv  \
     ${f_db_names} \
     --annotations_fname ${f_motif_path} \
     --expression_mtx_fname ${input_loom} \
     --output ${dir_result}/step_2_reg.csv \
     --mask_dropouts \
     --num_workers 16 >step2.out 2>&1 &
echo "Step 2 pyscenic ctx finish"
echo "Step 3 pyscenic aucell start"
pyscenic aucell \
    ${input_loom} \
    ${dir_result}/step_2_reg.csv \
    --seed 21 \
    --output ${dir_result}/step_3_aucell.csv \
    --num_workers 16 >step_3.out 2>&1 &
echo "All finish"

I tried both submitting to compute nodes via Slurm and running directly in a local bash session; the result is the same. I also ran this code on another platform and the warnings were almost identical, all pointing to ndarray-a612cf0abd06497fa68e9db39636fedb. Maybe that helps to locate the problem?

zhangdong360 commented 4 months ago

NOTE: It is worth mentioning that different datasets on different platforms have produced the same warning, while a downsampled subset of my data runs successfully. This has puzzled me for a long time.

ghuls commented 4 months ago

Can you try with the Docker/Podman/Singularity/Apptainer images instead? https://pyscenic.readthedocs.io/en/latest/installation.html#docker-podman-and-singularity-apptainer-images
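For example, a sketch of the Singularity/Apptainer route (the image name and tag are assumed from the linked installation page, and the paths are placeholders):

# Build a Singularity/Apptainer image from the published Docker image:
singularity build aertslab-pyscenic-0.12.1.sif docker://aertslab/pyscenic:0.12.1

# Run the GRN step inside the container, binding the data directory:
singularity exec -B /data:/data aertslab-pyscenic-0.12.1.sif \
    pyscenic grn /data/HSC_SEV.loom /data/allTFs_mm.txt \
        --method grnboost2 \
        --num_workers 16 \
        --output /data/step_1_fibo_grn.tsv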

JustMoveOnnn commented 4 months ago

Exactly the same issue here.

zhangdong360 commented 4 months ago

I rechecked the environment. The problem was probably that my conda environment had been copied directly from another Linux platform, so the Python interpreter path referenced in the first line of the installed package scripts needed to be updated. After I set up a fresh conda environment and ran it again, everything went back to normal. Until I fixed this, the small sample dataset still ran fine, which made me overlook possible problems with the environment configuration. Note that running only pip install pyscenic will run into dependency version problems, so I can share how I configured my conda environment. You need to run:

pip install numpy==1.22.4
pip install numexpr==2.8.4
pip install distributed==2023.12.1
pip install dask-expr==0.5.3
pip install dask==2023.12.1

And now you can run pyscenic successfully! By the way, you can open the dask dashboard to monitor the progress and memory usage of the task. By default it is served on the local IP at port 8787; if port 8787 is occupied, the new port that dask uses will be reported in the output.
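A quick sanity check before launching pyscenic might look like this (a sketch; it only confirms that the pinned versions are the ones actually imported and whether the default dashboard port is free):

# Confirm the pinned package versions resolve in the active environment:
python -c "import numpy, numexpr, dask, distributed; print(numpy.__version__, numexpr.__version__, dask.__version__, distributed.__version__)"

# Check whether port 8787 (the default dask dashboard port) is already in use:
ss -ltn | grep 8787 || echo "port 8787 is free"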

zhangdong360 commented 4 months ago

Can you try with the Docker/Podman/Singularity/Apptainer images instead? https://pyscenic.readthedocs.io/en/latest/installation.html#docker-podman-and-singularity-apptainer-images

Sorry, I don't have root access on our server and it's hard to use docker.

zhangdong360 commented 4 months ago

Maybe you can try reinstalling the environment as I did? My conda environment:

# packages in environment at /share/home/zhangd/.conda/envs/pyscenic:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
_openmp_mutex             5.1                       1_gnu    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
aiohttp                   3.9.5                    pypi_0    pypi
aiosignal                 1.3.1                    pypi_0    pypi
arboreto                  0.1.6                    pypi_0    pypi
async-timeout             4.0.3                    pypi_0    pypi
attrs                     23.2.0                   pypi_0    pypi
bokeh                     3.4.2                    pypi_0    pypi
boltons                   24.0.0                   pypi_0    pypi
ca-certificates           2024.3.11            h06a4308_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
certifi                   2024.7.4                 pypi_0    pypi
charset-normalizer        3.3.2                    pypi_0    pypi
click                     8.1.7                    pypi_0    pypi
cloudpickle               3.0.0                    pypi_0    pypi
contourpy                 1.2.1                    pypi_0    pypi
ctxcore                   0.2.0                    pypi_0    pypi
cytoolz                   0.12.3                   pypi_0    pypi
dask                      2023.12.1                pypi_0    pypi
dask-expr                 0.5.3                    pypi_0    pypi
dill                      0.3.8                    pypi_0    pypi
distributed               2023.12.1                pypi_0    pypi
frozendict                2.4.4                    pypi_0    pypi
frozenlist                1.4.1                    pypi_0    pypi
fsspec                    2024.6.1                 pypi_0    pypi
h5py                      3.11.0                   pypi_0    pypi
idna                      3.7                      pypi_0    pypi
importlib-metadata        8.0.0                    pypi_0    pypi
interlap                  0.2.7                    pypi_0    pypi
jinja2                    3.1.4                    pypi_0    pypi
joblib                    1.4.2                    pypi_0    pypi
ld_impl_linux-64          2.38                 h1181459_1    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libffi                    3.3                  he6710b0_2    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libgcc-ng                 11.2.0               h1234567_1    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libgomp                   11.2.0               h1234567_1    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libstdcxx-ng              11.2.0               h1234567_1    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
llvmlite                  0.43.0                   pypi_0    pypi
locket                    1.0.0                    pypi_0    pypi
loompy                    3.0.7                    pypi_0    pypi
lz4                       4.3.3                    pypi_0    pypi
markupsafe                2.1.5                    pypi_0    pypi
msgpack                   1.0.8                    pypi_0    pypi
multidict                 6.0.5                    pypi_0    pypi
multiprocessing-on-dill   3.5.0a4                  pypi_0    pypi
ncurses                   6.4                  h6a678d5_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
networkx                  3.2.1                    pypi_0    pypi
numba                     0.60.0                   pypi_0    pypi
numexpr                   2.8.4                    pypi_0    pypi
numpy                     1.22.4                   pypi_0    pypi
numpy-groupies            0.11.1                   pypi_0    pypi
openssl                   1.1.1w               h7f8727e_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
packaging                 24.1                     pypi_0    pypi
pandas                    2.2.2                    pypi_0    pypi
partd                     1.4.2                    pypi_0    pypi
pillow                    10.4.0                   pypi_0    pypi
pip                       24.0             py39h06a4308_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
psutil                    6.0.0                    pypi_0    pypi
pyarrow                   16.1.0                   pypi_0    pypi
pyarrow-hotfix            0.6                      pypi_0    pypi
pynndescent               0.5.13                   pypi_0    pypi
pyscenic                  0.12.1                   pypi_0    pypi
python                    3.9.12               h12debd9_1    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
python-dateutil           2.9.0.post0              pypi_0    pypi
pytz                      2024.1                   pypi_0    pypi
pyyaml                    6.0.1                    pypi_0    pypi
readline                  8.2                  h5eee18b_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
requests                  2.32.3                   pypi_0    pypi
scikit-learn              1.5.1                    pypi_0    pypi
scipy                     1.13.1                   pypi_0    pypi
setuptools                69.5.1           py39h06a4308_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
six                       1.16.0                   pypi_0    pypi
sortedcontainers          2.4.0                    pypi_0    pypi
sqlite                    3.45.3               h5eee18b_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
tblib                     3.0.0                    pypi_0    pypi
threadpoolctl             3.5.0                    pypi_0    pypi
tk                        8.6.14               h39e8969_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
toolz                     0.12.1                   pypi_0    pypi
tornado                   6.4.1                    pypi_0    pypi
tqdm                      4.66.4                   pypi_0    pypi
tzdata                    2024.1                   pypi_0    pypi
umap-learn                0.5.6                    pypi_0    pypi
urllib3                   2.2.2                    pypi_0    pypi
wheel                     0.43.0           py39h06a4308_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
xyzservices               2024.6.0                 pypi_0    pypi
xz                        5.4.6                h5eee18b_1    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
yarl                      1.9.4                    pypi_0    pypi
zict                      3.0.0                    pypi_0    pypi
zipp                      3.19.2                   pypi_0    pypi
zlib                      1.2.13               h5eee18b_1    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
ghuls commented 3 months ago

Can you try with the Docker/Podman/Singularity/Apptainer images instead? https://pyscenic.readthedocs.io/en/latest/installation.html#docker-podman-and-singularity-apptainer-images

Sorry, I don't have root access on our server and it's hard to use docker.

@zhangdong360 Could you try this? I recently found dockerc, which makes it possible to create a standalone binary from a Docker image and does not require root access to run the resulting binary. If it works well, it would be a good alternative to apptainer/singularity/docker/podman for HPC systems that don't have any of them installed.

# Download the pySCENIC binary made with dockerc from the pySCENIC Docker image:
wget https://resources.aertslab.org/cistarget/pyscenic_0.12.1

# Make the pySCENIC binary executable:
chmod a+x pyscenic_0.12.1

# Start bash inside the pySCENIC executable and mount the local data path in the container,
# as with normal Docker:
#   https://pyscenic.readthedocs.io/en/latest/installation.html#docker-podman-and-singularity-apptainer-images
/pyscenic_0.12.1 -v /data:/data -c 'import os; os.environ["COLUMNS"] = "80"; os.system("bash")'

Inside this bash, you should be able to run pySCENIC now:

$ /pyscenic_0.12.1 -v /data:/data -c 'import os; os.environ["COLUMNS"] = "80"; os.system("bash")'
unknown argument ignored: lazytime
root@umoci-default:/# pyscenic
usage: pyscenic [-h] {grn,add_cor,ctx,aucell} ...

Single-Cell rEgulatory Network Inference and Clustering
(0.12.1+0.gce41b61.dirty)

positional arguments:
  {grn,add_cor,ctx,aucell}
                        sub-command help
    grn                 Derive co-expression modules from expression matrix.
    add_cor             [Optional] Add Pearson correlations based on TF-gene
                        expression to the network adjacencies output from the
                        GRN step, and output these to a new adjacencies file.
                        This will normally be done during the "ctx" step.
    ctx                 Find enriched motifs for a gene signature and
                        optionally prune targets from this signature based on
                        cis-regulatory cues.
    aucell              Quantify activity of gene signatures across single
                        cells.

options:
  -h, --help            show this help message and exit

Arguments can be read from file using a @args.txt construct. For more
information on loom file format see http://loompy.org . For more information
on gmt file format see https://software.broadinstitute.org/cancer/software/gse
a/wiki/index.php/Data_formats .
Flu09 commented 1 month ago

@ghuls

[mo@lm02-16 test_pyscenic]$ /pyscenic_0.12.1 -v /data:/data -c 'import os; os.environ["COLUMNS"] = "80"; os.system("bash")'
bash: /pyscenic_0.12.1: No such file or directory
[mo@lm02-16 test_pyscenic]$ ls
pyscenic_0.12.1

It did not work for me. Using the Singularity image did not work either: it stops at "creating dask graph" and keeps running forever.

sherinesaber commented 1 month ago

Hello @ghuls, I converted the Docker image into a Singularity image. Dask used to work fine (I tested on a 100k-cell loom file), but now it only works with a few thousand cells (at most 10k) and a low number of workers (say 20). Like @Flu09 mentioned, it prints "creating Dask graph" and does not proceed. I am working on an HPC, so I can request a high number of workers (at most 40) and 3 TB of RAM.

lidoctor commented 1 month ago

The log contains multiple warnings from the Numba library, specifically about forking from a non-main thread while the TBB threading layer is in use; this warning is not necessarily fatal to your run, but it means the threading layer may be in an invalid state in the child processes. The distributed.scheduler is Dask's scheduler, responsible for coordinating distributed computing tasks, allocating worker nodes to execute tasks, and managing data distribution. Dask has a certain level of fault tolerance and will attempt to reassign tasks and data to other nodes, but this depends on data availability and cluster configuration.

You can try reducing the level of parallelism (by lowering --num_workers) to ease the load on the nodes.
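For example, based on the script above, the GRN step could be relaunched with a lower worker count (a sketch; the value 8 is only illustrative):

pyscenic grn ${input_loom} ${f_tf_list} \
    --seed 21 \
    --num_workers 8 \
    --method grnboost2 \
    --output ${dir_result}/step_1_fibo_grn.tsv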