aertslab / pySCENIC

pySCENIC is a lightning-fast python implementation of the SCENIC pipeline (Single-Cell rEgulatory Network Inference and Clustering) which enables biologists to infer transcription factors, gene regulatory networks and cell types from single-cell RNA-seq data.
http://scenic.aertslab.org
GNU General Public License v3.0

[BUG]error: 'OverflowError: cannot serialize a bytes object larger than 4 GiB'[ERROR] #503

Open liliyuan001 opened 12 months ago

liliyuan001 commented 12 months ago

Describe the bug

Hi all, I keep getting this error: 'OverflowError: cannot serialize a bytes object larger than 4 GiB' when I run:

    arboreto_with_multiprocessing.py sample.loom $tfs --method grnboost2 --output adj.sample.tsv --num_workers 40 --seed 777

Steps to reproduce the behavior

  1. Command run when the error occurred:
    
    cd /home/data/t220416/Melanoma/3_pyscenic/result_AM
    cat >change.py
    import os,sys
    os.getcwd()
    os.listdir(os.getcwd())
    import loompy as lp;
    import numpy as np;
    import scanpy as sc;
    x=sc.read_csv("for.scenic.data.csv");
    row_attrs = {"Gene": np.array(x.var_names),};
    col_attrs = {"CellID": np.array(x.obs_names)};
    lp.create("sample.loom",x.X.transpose(),row_attrs,col_attrs);

    python change.py

    cat >scenic.bash
    dir=/home/data/t220416/Melanoma/3_pyscenic/0_data/index_genome/cisTarget_databases/hg38
    tfs=$dir/hs_hgnc_tfs.txt
    feather=$dir/hg38_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather
    tbl=$dir/motifs-v10nr_clust-nr.hgnc-m0.001-o0.0.tbl
    input_loom=./sample.loom
    ls $tfs $feather $tbl

    arboreto_with_multiprocessing.py sample.loom $tfs --method grnboost2 --output adj.sample.tsv --num_workers 40 --seed 777

    pyscenic ctx adj.sample.tsv $feather --annotations_fname $tbl --expression_mtx_fname $input_loom --mode "dask_multiprocessing" --output reg.csv --num_workers 20 --mask_dropouts

    pyscenic aucell $input_loom reg.csv --output out_SCENIC.loom --num_workers 16

    nohup bash scenic.bash 1>pySCENIC.log 2>&1 &


2. Error encountered:

    nohup: ignoring input
    /home/data/t220416/Melanoma/3_pyscenic/0_data/index_genome/cisTarget_databases/hg38/hg38_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather
    /home/data/t220416/Melanoma/3_pyscenic/0_data/index_genome/cisTarget_databases/hg38/hs_hgnc_tfs.txt
    /home/data/t220416/Melanoma/3_pyscenic/0_data/index_genome/cisTarget_databases/hg38/motifs-v10nr_clust-nr.hgnc-m0.001-o0.0.tbl
    Loaded expression matrix of 54196 cells and 41361 genes in 48.110148906707764 seconds...
    Loaded 1839 TFs...
    starting grnboost2 using 40 processes...

      0%|          | 0/41361 [00:00<?, ?it/s]
      0%|          | 0/41361 [00:00<?, ?it/s]
    Traceback (most recent call last):
      File "/home/data/t220416/miniconda3/envs/pyscenic/bin/arboreto_with_multiprocessing.py", line 198, in <module>
        main()
      File "/home/data/t220416/miniconda3/envs/pyscenic/bin/arboreto_with_multiprocessing.py", line 184, in main
        total=len(gene_names),
      File "/home/data/t220416/miniconda3/envs/pyscenic/lib/python3.7/site-packages/tqdm/std.py", line 1195, in __iter__
        for obj in iterable:
      File "/home/data/t220416/miniconda3/envs/pyscenic/lib/python3.7/multiprocessing/pool.py", line 748, in next
        raise value
      File "/home/data/t220416/miniconda3/envs/pyscenic/lib/python3.7/multiprocessing/pool.py", line 431, in _handle_tasks
        put(task)
      File "/home/data/t220416/miniconda3/envs/pyscenic/lib/python3.7/multiprocessing/connection.py", line 206, in send
        self._send_bytes(_ForkingPickler.dumps(obj))
      File "/home/data/t220416/miniconda3/envs/pyscenic/lib/python3.7/multiprocessing/reduction.py", line 51, in dumps
        cls(buf, protocol).dump(obj)
    OverflowError: cannot serialize a bytes object larger than 4 GiB

    usage: pyscenic ctx [-h] [-o OUTPUT] [-n] [--chunk_size CHUNK_SIZE] [--mode {custom_multiprocessing,dask_multiprocessing,dask_cluster}] [-a] [-t] [--rank_threshold RANK_THRESHOLD] [--auc_threshold AUC_THRESHOLD] [--nes_threshold NES_THRESHOLD] [--min_orthologous_identity MIN_ORTHOLOGOUS_IDENTITY] [--max_similarity_fdr MAX_SIMILARITY_FDR] --annotations_fname ANNOTATIONS_FNAME [--num_workers NUM_WORKERS] [--client_or_address CLIENT_OR_ADDRESS] [--thresholds THRESHOLDS [THRESHOLDS ...]] [--top_n_targets TOP_N_TARGETS [TOP_N_TARGETS ...]] [--top_n_regulators TOP_N_REGULATORS [TOP_N_REGULATORS ...]] [--min_genes MIN_GENES] [--expression_mtx_fname EXPRESSION_MTX_FNAME] [--mask_dropouts] [--cell_id_attribute CELL_ID_ATTRIBUTE] [--gene_attribute GENE_ATTRIBUTE] [--sparse] module_fname database_fname [database_fname ...]
    pyscenic ctx: error: argument module_fname: can't open 'adj.sample.tsv': [Errno 2] No such file or directory: 'adj.sample.tsv'

    usage: pyscenic aucell [-h] [-o OUTPUT] [-t] [-w] [--num_workers NUM_WORKERS] [--seed SEED] [--rank_threshold RANK_THRESHOLD] [--auc_threshold AUC_THRESHOLD] [--nes_threshold NES_THRESHOLD] [--cell_id_attribute CELL_ID_ATTRIBUTE] [--gene_attribute GENE_ATTRIBUTE] [--sparse] expression_mtx_fname signatures_fname
    pyscenic aucell: error: argument signatures_fname: can't open 'reg.csv': [Errno 2] No such file or directory: 'reg.csv'
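
For context, here is a rough size estimate for the matrix reported in the log (54196 cells and 41361 genes). It is only a sketch. The dtype of the loaded matrix is not shown in the report, so both float32 and float64 are tried as assumptions. It also assumes that the full dense matrix is what gets pickled and sent to the worker processes, which the traceback suggests but does not prove. The helper names `dense_nbytes` and `exceeds_limit` are made up for illustration. Pickle protocols below 4 cannot store a single bytes object larger than 4 GiB, and the default protocol is 3 on Python 3.7 (the version in the traceback paths) and 4 from Python 3.8 onwards.

```python
import pickle

# Dimensions taken from the log: "54196 cells and 41361 genes".
N_CELLS = 54196
N_GENES = 41361

# Size limit named in the OverflowError message.
FOUR_GIB = 4 * 1024**3


def dense_nbytes(n_rows, n_cols, itemsize):
    """Bytes needed for a dense n_rows x n_cols array of the given item size."""
    return n_rows * n_cols * itemsize


def exceeds_limit(nbytes, limit=FOUR_GIB):
    """True if a buffer of nbytes is larger than the limit."""
    return nbytes > limit


# The dtype is an assumption; the report does not state it.
for dtype_name, itemsize in (("float32", 4), ("float64", 8)):
    nbytes = dense_nbytes(N_CELLS, N_GENES, itemsize)
    print(
        f"{dtype_name}: {nbytes / 1024**3:.2f} GiB, "
        f"over 4 GiB: {exceeds_limit(nbytes)}"
    )

# Default pickle protocol of the interpreter running this snippet.
print("default pickle protocol:", pickle.DEFAULT_PROTOCOL)
```

Under either dtype assumption the dense matrix is well above 4 GiB, which would be consistent with the error. Whether reducing the number of genes or using a newer Python version avoids it in this pipeline is not confirmed in this thread.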


**Please complete the following information:**
- pySCENIC version: [0.12.1]
- Installation method: [Pip]
- Run environment: [CLI]
- OS: [Ubuntu]
- Package versions: see the attached environment.txt

YeyeUnknow commented 6 months ago

Hi, have you fixed this error? I ran into the same problem.