Describe the bug
Hi all,
I keep getting the error 'OverflowError: cannot serialize a bytes object larger than 4 GiB' when I run:
arboreto_with_multiprocessing.py sample.loom $tfs --method grnboost2 --output adj.sample.tsv --num_workers 40 --seed 777
Steps to reproduce the behavior
1. Command run when the error occurred:
cd /home/data/t220416/Melanoma/3_pyscenic/result_AM
cat >change.py
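import os
import loompy as lp
import numpy as np
import scanpy as sc

# Print the working directory and its contents to confirm the input file is there.
print(os.getcwd())
print(os.listdir(os.getcwd()))

# Read the expression matrix; rows become obs (cells), columns become var (genes).
x = sc.read_csv("for.scenic.data.csv")
# loom stores genes as rows and cells as columns, hence the transpose below.
row_attrs = {"Gene": np.array(x.var_names)}
col_attrs = {"CellID": np.array(x.obs_names)}
lp.create("sample.loom", x.X.transpose(), row_attrs, col_attrs)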
2. Error encountered:
nohup: ignoring input
/home/data/t220416/Melanoma/3_pyscenic/0_data/index_genome/cisTarget_databases/hg38/hg38_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather
/home/data/t220416/Melanoma/3_pyscenic/0_data/index_genome/cisTarget_databases/hg38/hs_hgnc_tfs.txt
/home/data/t220416/Melanoma/3_pyscenic/0_data/index_genome/cisTarget_databases/hg38/motifs-v10nr_clust-nr.hgnc-m0.001-o0.0.tbl
Loaded expression matrix of 54196 cells and 41361 genes in 48.110148906707764 seconds...
Loaded 1839 TFs...
starting grnboost2 using 40 processes...
0%| | 0/41361 [00:00<?, ?it/s]
0%| | 0/41361 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/data/t220416/miniconda3/envs/pyscenic/bin/arboreto_with_multiprocessing.py", line 198, in <module>
    main()
  File "/home/data/t220416/miniconda3/envs/pyscenic/bin/arboreto_with_multiprocessing.py", line 184, in main
    total=len(gene_names),
  File "/home/data/t220416/miniconda3/envs/pyscenic/lib/python3.7/site-packages/tqdm/std.py", line 1195, in __iter__
    for obj in iterable:
  File "/home/data/t220416/miniconda3/envs/pyscenic/lib/python3.7/multiprocessing/pool.py", line 748, in next
    raise value
  File "/home/data/t220416/miniconda3/envs/pyscenic/lib/python3.7/multiprocessing/pool.py", line 431, in _handle_tasks
    put(task)
  File "/home/data/t220416/miniconda3/envs/pyscenic/lib/python3.7/multiprocessing/connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/home/data/t220416/miniconda3/envs/pyscenic/lib/python3.7/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
OverflowError: cannot serialize a bytes object larger than 4 GiB
usage: pyscenic ctx [-h] [-o OUTPUT] [-n] [--chunk_size CHUNK_SIZE]
[--mode {custom_multiprocessing,dask_multiprocessing,dask_cluster}]
[-a] [-t] [--rank_threshold RANK_THRESHOLD]
[--auc_threshold AUC_THRESHOLD]
[--nes_threshold NES_THRESHOLD]
[--min_orthologous_identity MIN_ORTHOLOGOUS_IDENTITY]
[--max_similarity_fdr MAX_SIMILARITY_FDR]
--annotations_fname ANNOTATIONS_FNAME
[--num_workers NUM_WORKERS]
[--client_or_address CLIENT_OR_ADDRESS]
[--thresholds THRESHOLDS [THRESHOLDS ...]]
[--top_n_targets TOP_N_TARGETS [TOP_N_TARGETS ...]]
[--top_n_regulators TOP_N_REGULATORS [TOP_N_REGULATORS ...]]
[--min_genes MIN_GENES]
[--expression_mtx_fname EXPRESSION_MTX_FNAME]
[--mask_dropouts] [--cell_id_attribute CELL_ID_ATTRIBUTE]
[--gene_attribute GENE_ATTRIBUTE] [--sparse]
module_fname database_fname [database_fname ...]
pyscenic ctx: error: argument module_fname: can't open 'adj.sample.tsv': [Errno 2] No such file or directory: 'adj.sample.tsv'
usage: pyscenic aucell [-h] [-o OUTPUT] [-t] [-w] [--num_workers NUM_WORKERS]
[--seed SEED] [--rank_threshold RANK_THRESHOLD]
[--auc_threshold AUC_THRESHOLD]
[--nes_threshold NES_THRESHOLD]
[--cell_id_attribute CELL_ID_ATTRIBUTE]
[--gene_attribute GENE_ATTRIBUTE] [--sparse]
expression_mtx_fname signatures_fname
pyscenic aucell: error: argument signatures_fname: can't open 'reg.csv': [Errno 2] No such file or directory: 'reg.csv'
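For scale, here is a back-of-the-envelope check of the matrix size reported in the log above. It is only a sketch: it assumes the expression matrix is held densely and pickled in one piece when tasks are sent to the worker processes (not confirmed from the script source), and the two dtypes are guesses. The helper name `dense_size_bytes` is made up for illustration. The traceback paths show Python 3.7, whose default pickle protocol is 3; protocols below 4 cannot serialize a bytes object larger than 4 GiB, and protocol 4 only became the default in Python 3.8.

```python
import pickle

# Dimensions taken from the log line
# "Loaded expression matrix of 54196 cells and 41361 genes".
n_cells, n_genes = 54196, 41361
FOUR_GIB = 4 * 1024**3  # limit named in the OverflowError message


def dense_size_bytes(rows, cols, bytes_per_value):
    """Size of a dense rows x cols array (hypothetical helper)."""
    return rows * cols * bytes_per_value


# The dtype actually used is an assumption; both common float widths are shown.
for name, width in (("float32", 4), ("float64", 8)):
    size = dense_size_bytes(n_cells, n_genes, width)
    print(f"{name}: {size / 1024**3:.2f} GiB, exceeds 4 GiB: {size > FOUR_GIB}")

# Default and highest pickle protocol of the interpreter running this snippet.
print("default protocol:", pickle.DEFAULT_PROTOCOL)
print("highest protocol:", pickle.HIGHEST_PROTOCOL)
```

Under either dtype guess the dense matrix is well over 4 GiB, which would be consistent with the error if the whole matrix ends up in a single pickled task.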
**Please complete the following information:**
- pySCENIC version: 0.12.1
- Installation method: pip
- Run environment: CLI
- OS: Ubuntu
- Package versions: see the attached environment.txt
The remaining commands, run after creating change.py:
python change.py
cat >scenic.bash
dir=/home/data/t220416/Melanoma/3_pyscenic/0_data/index_genome/cisTarget_databases/hg38
tfs=$dir/hs_hgnc_tfs.txt
feather=$dir/hg38_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather
tbl=$dir/motifs-v10nr_clust-nr.hgnc-m0.001-o0.0.tbl
input_loom=./sample.loom
ls $tfs $feather $tbl
arboreto_with_multiprocessing.py sample.loom $tfs --method grnboost2 --output adj.sample.tsv --num_workers 40 --seed 777
pyscenic ctx adj.sample.tsv $feather --annotations_fname $tbl --expression_mtx_fname $input_loom --mode "dask_multiprocessing" --output reg.csv --num_workers 20 --mask_dropouts
pyscenic aucell $input_loom reg.csv --output out_SCENIC.loom --num_workers 16
nohup bash scenic.bash 1>pySCENIC.log 2>&1 &
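The `pyscenic ctx` and `pyscenic aucell` usage errors in the log look like knock-on effects: the script carries on after the GRN step fails, so `adj.sample.tsv` and `reg.csv` never exist. A minimal sketch of a stop-on-first-failure runner is below. The function name `run_steps` and the placeholder commands are hypothetical, standing in for the three pipeline stages rather than reproducing them.

```python
import subprocess
import sys


def run_steps(steps):
    """Run commands in order and stop at the first non-zero exit code."""
    completed = []
    for cmd in steps:
        result = subprocess.run(cmd)
        if result.returncode != 0:
            print(f"step failed, skipping the rest: {cmd}", file=sys.stderr)
            break
        completed.append(cmd)
    return completed


# Placeholder commands (not the real pipeline): the second one fails on purpose.
demo_steps = [
    [sys.executable, "-c", "print('step 1 ok')"],
    [sys.executable, "-c", "import sys; sys.exit(1)"],
    [sys.executable, "-c", "print('step 3 should not run')"],
]
done = run_steps(demo_steps)
print(f"{len(done)} of {len(demo_steps)} steps completed")
```

With a runner like this, only the first error would appear in the log, which makes the root cause easier to spot.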
environment.txt