Open · zhangdong360 opened this issue 4 months ago
My code:
#!/bin/bash
#SBATCH -o output/pyscenic_hsc_sev.out
#SBATCH -e output/pyscenic_hsc_sev.err
#SBATCH --partition=compute
#SBATCH -J scenic_HSC_SEV
#SBATCH --nodes=1
#SBATCH -n 30
#conda activate scanpy
# human
#f_db_names="/share/home/zhangd/tools/database/cistarget/cisTarget_databases/homo_sapiens/hg38/refseq_r80/mc_v10_clust/gene_based/hg38_500bp_up_100bp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather"
#f_motif_path="/share/home/zhangd/tools/database/cistarget/Motif2TF/motifs-v10nr_clust-nr.hgnc-m0.001-o0.0.tbl"
#f_tf_list="/share/home/zhangd/project/python_project/pySCENIC/allTFs_hg38.txt"
# mouse
f_db_names="/home/zhangdong_2/database/cistarget/cisTarget_databases/mus_musculus/mm10/refseq_r80/mc_v10_clust/mm10_500bp_up_100bp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather"
f_motif_path="/home/zhangdong_2/database/cistarget/Motif2TF/motifs-v10nr_clust-nr.mgi-m0.001-o0.0.tbl"
f_tf_list="/home/zhangdong_2/database/cistarget/TF_lists/allTFs_mm.txt"
# data input
dir_result="/home/zhangdong_2/project/pySCENIC/03_result/HSC_SEV/"
input_loom="/home/zhangdong_2/project/pySCENIC/01_data/HSC_SEV.loom"
# step1
echo "Step 1 pyscenic grn start"
# Run in the foreground: the ctx step below needs this step's output file.
pyscenic grn ${input_loom} ${f_tf_list} \
--seed 21 \
--num_workers 16 \
--method grnboost2 \
--output ${dir_result}/step_1_fibo_grn.tsv >step1.out 2>&1
echo "Step 1 pyscenic grn finish"
echo "Step 2 pyscenic ctx start"
pyscenic ctx ${dir_result}/step_1_fibo_grn.tsv \
${f_db_names} \
--annotations_fname ${f_motif_path} \
--expression_mtx_fname ${input_loom} \
--output ${dir_result}/step_2_reg.csv \
--mask_dropouts \
--num_workers 16 >step2.out 2>&1
echo "Step 2 pyscenic ctx finish"
echo "Step 3 pyscenic aucell start"
pyscenic aucell \
${input_loom} \
${dir_result}/step_2_reg.csv \
--seed 21 \
--output ${dir_result}/step_3_aucell.csv \
--num_workers 16 >step_3.out 2>&1
echo "All finish"
I tried submitting this to the compute nodes via Slurm and also running it directly in a local bash session; the result is the same. I also ran this code on another platform and found the warnings were almost identical, all pointing to the same
ndarray - a612cf0abd06497fa68e9db39636fedb
. Maybe this helps to locate the problem?
NOTE: It is worth mentioning that different data on different platforms produce the same warning… and my small sample dataset runs successfully. This has puzzled me for a long time.
Can you try with the Docker/Podman/Singularity/Apptainer images instead? https://pyscenic.readthedocs.io/en/latest/installation.html#docker-podman-and-singularity-apptainer-images
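If a container runtime is available, a typical invocation looks like this (a minimal sketch; the image tag follows the linked docs, and the data paths are placeholders):
# Pull the published Docker image once and convert it to a SIF file.
apptainer pull pyscenic_0.12.1.sif docker://aertslab/pyscenic:0.12.1
# Run a pipeline step inside the container, binding the data directory.
apptainer exec -B /data:/data pyscenic_0.12.1.sif \
    pyscenic grn /data/HSC_SEV.loom /data/allTFs_mm.txt \
    --num_workers 16 --output /data/step_1_grn.tsv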
I have exactly the same issue as the original post.
I rechecked the environment. The problem was probably that my conda environment had been copied directly from another Linux platform: the shebang (the first line of each installed Python entry-point script) still pointed to the old interpreter location and needed to be updated. After I set up a fresh conda environment and ran it again, everything went back to normal.
Until I fixed this, the small sample dataset still ran fine, which made me overlook possible problems with the environment configuration.
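A quick way to check for this (a minimal sketch; assumes pyscenic is on your PATH):
# Show which pyscenic is found and the interpreter its shebang points to.
which pyscenic
head -n 1 "$(which pyscenic)"
# If that interpreter path no longer exists, the environment was likely
# copied from another machine and should be rebuilt rather than patched.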
Running only pip install pyscenic
will run into version-dependency problems. Here I can share how I configured my conda environment. You need to run:
pip install numpy==1.22.4
pip install numexpr==2.8.4
pip install distributed==2023.12.1
pip install dask-expr==0.5.3
pip install dask==2023.12.1
And now you can run pyscenic successfully!
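Putting it together (a minimal sketch; the environment name is arbitrary, and Python 3.9 / pyscenic 0.12.1 are taken from the environment listing below):
# Build a fresh environment instead of copying one between machines.
conda create -n pyscenic python=3.9 -y
conda activate pyscenic
pip install pyscenic==0.12.1
# Pin the dask/numpy stack to the versions listed above.
pip install numpy==1.22.4 numexpr==2.8.4 distributed==2023.12.1 \
    dask-expr==0.5.3 dask==2023.12.1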
By the way, we can open the dask dashboard to monitor the progress and memory usage of the task. By default it is served on the local IP at port 8787. If port 8787 is occupied, dask prints the port it chose instead in its output.
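If the job runs on a remote compute node, an SSH tunnel is one way to reach the dashboard (a sketch; the user and host names are placeholders):
# Forward local port 8787 to the dashboard on the compute node,
# then open http://localhost:8787 in a local browser.
ssh -N -L 8787:localhost:8787 your_user@compute-node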
Can you try with the Docker/Podman/Singularity/Apptainer images instead? https://pyscenic.readthedocs.io/en/latest/installation.html#docker-podman-and-singularity-apptainer-images
Sorry, I don't have root access on our server and it's hard to use docker.
I have exactly the same issue as the original post.
Maybe you can try reinstalling the environment as I did? Here is my conda environment:
# packages in environment at /share/home/zhangd/.conda/envs/pyscenic:
#
# Name Version Build Channel
_libgcc_mutex 0.1 main https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
_openmp_mutex 5.1 1_gnu https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
aiohttp 3.9.5 pypi_0 pypi
aiosignal 1.3.1 pypi_0 pypi
arboreto 0.1.6 pypi_0 pypi
async-timeout 4.0.3 pypi_0 pypi
attrs 23.2.0 pypi_0 pypi
bokeh 3.4.2 pypi_0 pypi
boltons 24.0.0 pypi_0 pypi
ca-certificates 2024.3.11 h06a4308_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
certifi 2024.7.4 pypi_0 pypi
charset-normalizer 3.3.2 pypi_0 pypi
click 8.1.7 pypi_0 pypi
cloudpickle 3.0.0 pypi_0 pypi
contourpy 1.2.1 pypi_0 pypi
ctxcore 0.2.0 pypi_0 pypi
cytoolz 0.12.3 pypi_0 pypi
dask 2023.12.1 pypi_0 pypi
dask-expr 0.5.3 pypi_0 pypi
dill 0.3.8 pypi_0 pypi
distributed 2023.12.1 pypi_0 pypi
frozendict 2.4.4 pypi_0 pypi
frozenlist 1.4.1 pypi_0 pypi
fsspec 2024.6.1 pypi_0 pypi
h5py 3.11.0 pypi_0 pypi
idna 3.7 pypi_0 pypi
importlib-metadata 8.0.0 pypi_0 pypi
interlap 0.2.7 pypi_0 pypi
jinja2 3.1.4 pypi_0 pypi
joblib 1.4.2 pypi_0 pypi
ld_impl_linux-64 2.38 h1181459_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libffi 3.3 he6710b0_2 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libgcc-ng 11.2.0 h1234567_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libgomp 11.2.0 h1234567_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libstdcxx-ng 11.2.0 h1234567_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
llvmlite 0.43.0 pypi_0 pypi
locket 1.0.0 pypi_0 pypi
loompy 3.0.7 pypi_0 pypi
lz4 4.3.3 pypi_0 pypi
markupsafe 2.1.5 pypi_0 pypi
msgpack 1.0.8 pypi_0 pypi
multidict 6.0.5 pypi_0 pypi
multiprocessing-on-dill 3.5.0a4 pypi_0 pypi
ncurses 6.4 h6a678d5_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
networkx 3.2.1 pypi_0 pypi
numba 0.60.0 pypi_0 pypi
numexpr 2.8.4 pypi_0 pypi
numpy 1.22.4 pypi_0 pypi
numpy-groupies 0.11.1 pypi_0 pypi
openssl 1.1.1w h7f8727e_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
packaging 24.1 pypi_0 pypi
pandas 2.2.2 pypi_0 pypi
partd 1.4.2 pypi_0 pypi
pillow 10.4.0 pypi_0 pypi
pip 24.0 py39h06a4308_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
psutil 6.0.0 pypi_0 pypi
pyarrow 16.1.0 pypi_0 pypi
pyarrow-hotfix 0.6 pypi_0 pypi
pynndescent 0.5.13 pypi_0 pypi
pyscenic 0.12.1 pypi_0 pypi
python 3.9.12 h12debd9_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
python-dateutil 2.9.0.post0 pypi_0 pypi
pytz 2024.1 pypi_0 pypi
pyyaml 6.0.1 pypi_0 pypi
readline 8.2 h5eee18b_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
requests 2.32.3 pypi_0 pypi
scikit-learn 1.5.1 pypi_0 pypi
scipy 1.13.1 pypi_0 pypi
setuptools 69.5.1 py39h06a4308_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
six 1.16.0 pypi_0 pypi
sortedcontainers 2.4.0 pypi_0 pypi
sqlite 3.45.3 h5eee18b_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
tblib 3.0.0 pypi_0 pypi
threadpoolctl 3.5.0 pypi_0 pypi
tk 8.6.14 h39e8969_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
toolz 0.12.1 pypi_0 pypi
tornado 6.4.1 pypi_0 pypi
tqdm 4.66.4 pypi_0 pypi
tzdata 2024.1 pypi_0 pypi
umap-learn 0.5.6 pypi_0 pypi
urllib3 2.2.2 pypi_0 pypi
wheel 0.43.0 py39h06a4308_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
xyzservices 2024.6.0 pypi_0 pypi
xz 5.4.6 h5eee18b_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
yarl 1.9.4 pypi_0 pypi
zict 3.0.0 pypi_0 pypi
zipp 3.19.2 pypi_0 pypi
zlib 1.2.13 h5eee18b_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
@zhangdong360 Could you try this? I recently found dockerc,
which allows creating a binary from a Docker image and does not require root access to run the final binary. If it works well, it would be a good alternative to Apptainer/Singularity/Docker/Podman for HPC systems that don't have any of them installed.
# Download pyscenic binary made with dockerc from pyscenic docker image.
wget https://resources.aertslab.org/cistarget/pyscenic_0.12.1
# Make pySCENIC binary executable
chmod a+x pyscenic_0.12.1
# Start bash in pySCENIC executable and mount local data path in container as in normal Docker:
# https://pyscenic.readthedocs.io/en/latest/installation.html#docker-podman-and-singularity-apptainer-images
/pyscenic_0.12.1 -v /data:/data -c 'import os; os.environ["COLUMNS"] = "80"; os.system("bash")'
Inside this bash, you should be able to run pySCENIC now:
$ /pyscenic_0.12.1 -v /data:/data -c 'import os; os.environ["COLUMNS"] = "80"; os.system("bash")'
unknown argument ignored: lazytime
root@umoci-default:/# pyscenic
usage: pyscenic [-h] {grn,add_cor,ctx,aucell} ...
Single-Cell rEgulatory Network Inference and Clustering
(0.12.1+0.gce41b61.dirty)
positional arguments:
{grn,add_cor,ctx,aucell}
sub-command help
grn Derive co-expression modules from expression matrix.
add_cor [Optional] Add Pearson correlations based on TF-gene
expression to the network adjacencies output from the
GRN step, and output these to a new adjacencies file.
This will normally be done during the "ctx" step.
ctx Find enriched motifs for a gene signature and
optionally prune targets from this signature based on
cis-regulatory cues.
aucell Quantify activity of gene signatures across single
cells.
options:
-h, --help show this help message and exit
Arguments can be read from file using a @args.txt construct. For more
information on loom file format see http://loompy.org . For more information
on gmt file format see https://software.broadinstitute.org/cancer/software/gse
a/wiki/index.php/Data_formats .
@ghuls
[mo@lm02-16 test_pyscenic]$ /pyscenic_0.12.1 -v /data:/data -c 'import os; os.environ["COLUMNS"] = "80"; os.system("bash")'
bash: /pyscenic_0.12.1: No such file or directory
[mo@lm02-16 test_pyscenic]$ ls
pyscenic_0.12.1
It did not work for me. Using the Singularity image did not work either: it stops at "creating dask graph" and keeps running forever.
Hello @ghuls, I converted the Docker image into a Singularity image. Dask used to work fine (I tested on a 100k-cell loom file), but now it works only with a few thousand cells (at most 10k, and a low number of workers, say 20). As Flu90 mentioned, it prints "creating dask graph" and does not proceed. I am working on an HPC system, so I can request a high number of workers (at most 40) and 3 TB of RAM.
The log contains multiple warnings from the Numba library, specifically about the usage of the nopython argument. These warnings are not critical to the execution of your code, though they may affect performance in future versions. distributed.scheduler is Dask's scheduler; it coordinates distributed computing tasks, allocates worker nodes to execute them, and manages data distribution. Dask has a certain level of fault tolerance and will attempt to reassign tasks and data to other nodes, but this depends on data availability and cluster configuration.
You can try reducing the level of parallelism (by lowering --num_workers
) to ease the load on the nodes, as in the sketch below.
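For example, the GRN step with reduced parallelism (a sketch reusing the variables from the script above):
# Fewer workers lowers peak memory per node and scheduler pressure.
pyscenic grn ${input_loom} ${f_tf_list} \
    --method grnboost2 \
    --num_workers 8 \
    --output ${dir_result}/step_1_fibo_grn.tsv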
Describe the bug
When I run pySCENIC, I often encounter disturbing warnings. I checked whether the problem is related to this issue: https://github.com/aertslab/pySCENIC/issues/482 , but I'm not using port 8787. On the other hand, I don't often encounter this warning on the HPC where I have an RStudio server installed, so I don't think it is related to that. I think the problem might be with dask, but I'm not well versed in it. The lack of output also makes it impossible to judge whether I need to re-run the program, and as mentioned above, re-running will most likely hit the warning again. In addition, I have tried arboreto_with_multiprocessing.py, but it was too inefficient: in my tests on small samples it was nearly twice as slow as pySCENIC with the same number of CPU cores, which I don't think is acceptable on a large sample. Running this program has cost me too much energy; I have had to subsample my data to a size that works, but I don't think that is a long-term solution.
Expected behavior
I didn't find a clear way to reproduce it, but the warning often appears after a run.
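For reference, arboreto_with_multiprocessing.py ships with pySCENIC and is invoked much like the grn step (a sketch using the inputs defined in the script above; check --help for the exact options):
# Multiprocessing-based GRN inference that bypasses dask entirely;
# slower per core, but avoids the dask scheduler warnings.
arboreto_with_multiprocessing.py \
    ${input_loom} \
    ${f_tf_list} \
    --method grnboost2 \
    --num_workers 16 \
    --output ${dir_result}/step_1_fibo_grn.tsv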