OverflowError: cannot serialize a bytes object larger than 4 GiB[Error]

Describe the bug

Hi all, I keep getting this error: 'OverflowError: cannot serialize a bytes object larger than 4 GiB' when I run:

arboreto_with_multiprocessing.py \
    sample.loom \
    $tfs \
    --method grnboost2 \
    --output adj.sample.tsv \
    --num_workers 40 \
    --seed 777

Do you have any suggestions on how to fix this error?

Note that most errors are due to the input from the user, and therefore should be treated as questions in the Discussions. Please only report them as bugs if you are quite certain that they are not behaving as expected.

Steps to reproduce the behavior

Command run when the error occurred:
conda activate pyscenic
cd /home/data/t220416/Melanoma/3_pyscenic/result_AM
cat >change.py
import os,sys
os.getcwd()
os.listdir(os.getcwd())
import loompy as lp
import numpy as np
import scanpy as sc
x=sc.read_csv("for.scenic.data.csv")
row_attrs = {"Gene": np.array(x.var_names),}
col_attrs = {"CellID": np.array(x.obs_names)}
lp.create("sample.loom",x.X.transpose(),row_attrs,col_attrs)

python change.py

cat >scenic.bash

dir=/home/data/t220416/Melanoma/3_pyscenic/0_data/index_genome/cisTarget_databases/hg38

tfs=$dir/hs_hgnc_tfs.txt
feather=$dir/hg38_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather
tbl=$dir/motifs-v10nr_clust-nr.hgnc-m0.001-o0.0.tbl

input_loom=./sample.loom
ls $tfs $feather $tbl

arboreto_with_multiprocessing.py \
    sample.loom \
    $tfs \
    --method grnboost2 \
    --output adj.sample.tsv \
    --num_workers 40 \
    --seed 777

pyscenic ctx \
    adj.sample.tsv $feather \
    --annotations_fname $tbl \
    --expression_mtx_fname $input_loom \
    --mode "dask_multiprocessing" \
    --output reg.csv \
    --num_workers 20 \
    --mask_dropouts

pyscenic aucell \
    $input_loom \
    reg.csv \
    --output out_SCENIC.loom \
    --num_workers 16

nohup bash scenic.bash 1>pySCENIC.log 2>&1 &

Error encountered:

nohup: ignoring input
/home/data/t220416/Melanoma/3_pyscenic/0_data/index_genome/cisTarget_databases/hg38/hg38_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather
/home/data/t220416/Melanoma/3_pyscenic/0_data/index_genome/cisTarget_databases/hg38/hs_hgnc_tfs.txt
/home/data/t220416/Melanoma/3_pyscenic/0_data/index_genome/cisTarget_databases/hg38/motifs-v10nr_clust-nr.hgnc-m0.001-o0.0.tbl
Loaded expression matrix of 54196 cells and 41361 genes in 48.110148906707764 seconds...
Loaded 1839 TFs...
starting grnboost2 using 40 processes...

0%|          | 0/41361 [00:00<?, ?it/s]
0%|          | 0/41361 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/home/data/t220416/miniconda3/envs/pyscenic/bin/arboreto_with_multiprocessing.py", line 198, in <module>
main()
File "/home/data/t220416/miniconda3/envs/pyscenic/bin/arboreto_with_multiprocessing.py", line 184, in main
total=len(gene_names),
File "/home/data/t220416/miniconda3/envs/pyscenic/lib/python3.7/site-packages/tqdm/std.py", line 1195, in __iter__
for obj in iterable:
File "/home/data/t220416/miniconda3/envs/pyscenic/lib/python3.7/multiprocessing/pool.py", line 748, in next
raise value
File "/home/data/t220416/miniconda3/envs/pyscenic/lib/python3.7/multiprocessing/pool.py", line 431, in _handle_tasks
put(task)
File "/home/data/t220416/miniconda3/envs/pyscenic/lib/python3.7/multiprocessing/connection.py", line 206, in send
self._send_bytes(_ForkingPickler.dumps(obj))
File "/home/data/t220416/miniconda3/envs/pyscenic/lib/python3.7/multiprocessing/reduction.py", line 51, in dumps
cls(buf, protocol).dump(obj)
OverflowError: cannot serialize a bytes object larger than 4 GiB
usage: pyscenic ctx [-h] [-o OUTPUT] [-n] [--chunk_size CHUNK_SIZE]
                [--mode {custom_multiprocessing,dask_multiprocessing,dask_cluster}]
                [-a] [-t] [--rank_threshold RANK_THRESHOLD]
                [--auc_threshold AUC_THRESHOLD]
                [--nes_threshold NES_THRESHOLD]
                [--min_orthologous_identity MIN_ORTHOLOGOUS_IDENTITY]
                [--max_similarity_fdr MAX_SIMILARITY_FDR]
                --annotations_fname ANNOTATIONS_FNAME
                [--num_workers NUM_WORKERS]
                [--client_or_address CLIENT_OR_ADDRESS]
                [--thresholds THRESHOLDS [THRESHOLDS ...]]
                [--top_n_targets TOP_N_TARGETS [TOP_N_TARGETS ...]]
                [--top_n_regulators TOP_N_REGULATORS [TOP_N_REGULATORS ...]]
                [--min_genes MIN_GENES]
                [--expression_mtx_fname EXPRESSION_MTX_FNAME]
                [--mask_dropouts] [--cell_id_attribute CELL_ID_ATTRIBUTE]
                [--gene_attribute GENE_ATTRIBUTE] [--sparse]
                module_fname database_fname [database_fname ...]
pyscenic ctx: error: argument module_fname: can't open 'adj.sample.tsv': [Errno 2] No such file or directory: 'adj.sample.tsv'
usage: pyscenic aucell [-h] [-o OUTPUT] [-t] [-w] [--num_workers NUM_WORKERS]
                   [--seed SEED] [--rank_threshold RANK_THRESHOLD]
                   [--auc_threshold AUC_THRESHOLD]
                   [--nes_threshold NES_THRESHOLD]
                   [--cell_id_attribute CELL_ID_ATTRIBUTE]
                   [--gene_attribute GENE_ATTRIBUTE] [--sparse]
                   expression_mtx_fname signatures_fname
pyscenic aucell: error: argument signatures_fname: can't open 'reg.csv': [Errno 2] No such file or directory: 'reg.csv'
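
For context, a rough size estimate suggests why pickling may fail here. The log reports 54196 cells and 41361 genes, and the traceback ends in `_ForkingPickler.dumps` under Python 3.7, whose default pickle protocol (3) cannot store a single bytes object larger than 4 GiB. The sketch below assumes the expression matrix is dense and is sent to a worker as one buffer; the helper names and the 4- and 8-byte element sizes are illustrative assumptions, not taken from the arboreto script.

```python
# Rough estimate only: assumes a dense matrix pickled as one contiguous buffer.
# Pickle protocol 3 stores the length of a bytes object in a 4-byte field.
PROTOCOL3_MAX_BYTES = 0xFFFFFFFF  # 4 GiB minus one byte


def dense_matrix_nbytes(n_rows, n_cols, itemsize):
    """Size in bytes of a dense n_rows x n_cols array with the given element size."""
    return n_rows * n_cols * itemsize


def fits_in_protocol3(nbytes):
    """True if a single buffer of this size stays within the protocol 3 limit."""
    return nbytes <= PROTOCOL3_MAX_BYTES


if __name__ == "__main__":
    # Dimensions taken from the log line "Loaded expression matrix of ...".
    n_cells, n_genes = 54196, 41361
    for label, itemsize in (("float32", 4), ("float64", 8)):
        nbytes = dense_matrix_nbytes(n_cells, n_genes, itemsize)
        print(f"{label}: {nbytes / 1024**3:.2f} GiB, "
              f"fits in protocol 3: {fits_in_protocol3(nbytes)}")
```

Under these assumptions the matrix is well above the limit for either element size. If that is indeed the cause, filtering genes or cells before creating the loom file, or running in an environment with Python 3.8 or later (where the default pickle protocol is 4), may avoid this particular error.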

Expected behavior A clear and concise description of what you expected to happen.

Please complete the following information:

pySCENIC version: [0.12.1]
Installation method: [pip]
Run environment: [CLI]
OS: [Ubuntu]
Package versions: [obtain using pip freeze, conda list, or skip this if using Docker/Singularity]:
```
[environment.txt](https://github.com/aertslab/pySCENIC/files/12647829/environment.txt)
```

OverflowError: cannot serialize a bytes object larger than 4 GiB[Error] #502