Error when running SnakeMake

gilgolan73 commented 2 months ago

Describe the bug Hello, I'm trying to run SCENIC+ using SnakeMake on a Linux machine (CentOS 9), on the tutorial dataset. I ran the scATAC-seq preprocessing in Python with pycisTopic, following the tutorial: https://pycistopic.readthedocs.io/en/latest/notebooks/human_cerebellum.html. Then I ran the scRNA-seq preprocessing in Python, following the tutorial: https://scenicplus.readthedocs.io/en/latest/human_cerebellum_scRNA_pp.html#Preprocessing-the-scRNA-seq-data. I'm using the default config.yaml file for SnakeMake and only changed the location of the input data. 1-2 minutes after starting SnakeMake (as in the tutorial: https://scenicplus.readthedocs.io/en/latest/human_cerebellum.html#Running-SCENIC+), I receive an error (see below), which I believe is related to the cell names differing between the scATAC-seq and scRNA-seq datasets. Please let me know how to solve this issue. Thank you!

To Reproduce "snakemake --cores 20"

Error output "Assuming unrestricted shared filesystem usage for local execution.
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 20
Rules claiming more threads will be scaled down.
Job stats:
job                            count
AUCell_direct                      1
AUCell_extended                    1
all                                1
download_genome_annotations        1
eGRN_direct                        1
eGRN_extended                      1
get_search_space                   1
motif_enrichment_cistarget         1
motif_enrichment_dem               1
prepare_GEX_ACC_multiome           1
prepare_menr                       1
region_to_gene                     1
scplus_mudata                      1
tf_to_gene                         1
total                             14

Select jobs to execute... Execute 2 jobs...

[Tue Sep 17 16:12:50 2024] localrule prepare_GEX_ACC_multiome: input: /home/gilgolan/bioinfo_analysis/scenicplus/tutorial2/outs/cistopic_obj.pkl, /home/gilgolan/bioinfo_analysis/scenicplus/tutorial2/scRNAseq/adata.h5ad output: ACC_GEX.h5mu jobid: 2 reason: Missing output files: ACC_GEX.h5mu resources: tmpdir=/tmp

[Tue Sep 17 16:12:50 2024] localrule download_genome_annotations: output: genome_annotation.tsv, chromsizes.tsv jobid: 8 reason: Missing output files: chromsizes.tsv, genome_annotation.tsv resources: tmpdir=/tmp

2024-09-17 16:12:52,879 SCENIC+ INFO Reading cisTopic object.
2024-09-17 16:12:53,289 SCENIC+ INFO Reading gene expression AnnData.
Traceback (most recent call last):
  File "/home/gilgolan/.local/bin/scenicplus", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/cli/scenicplus.py", line 1137, in main
    args.func(args)
  File "/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/cli/scenicplus.py", line 46, in command_prepare_GEX_ACC
    prepare_GEX_ACC(
  File "/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/cli/commands.py", line 96, in prepare_GEX_ACC
    mdata = process_multiome_data(
            ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/data_wrangling/adata_cistopic_wrangling.py", line 73, in process_multiome_data
    raise Exception(
Exception: No cells found which are present in both assays, check input and consider using bc_transform_func!
[Tue Sep 17 16:12:53 2024]
Error in rule prepare_GEX_ACC_multiome:
    jobid: 2
    input: /home/gilgolan/bioinfo_analysis/scenicplus/tutorial2/outs/cistopic_obj.pkl, /home/gilgolan/bioinfo_analysis/scenicplus/tutorial2/scRNAseq/adata.h5ad
    output: ACC_GEX.h5mu
    shell:

        scenicplus prepare_data prepare_GEX_ACC                 --cisTopic_obj_fname /home/gilgolan/bioinfo_analysis/scenicplus/tutorial2/outs/cistopic_obj.pkl                 --GEX_anndata_fname /home/gilgolan/bioinfo_analysis/scenicplus/tutorial2/scRNAseq/adata.h5ad                 --out_file ACC_GEX.h5mu                 --bc_transform_func "lambda x: f'{x}'"

    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

2024-09-17 16:13:38,876 Download gene annotation INFO Using genome: GRCh38.p14 2024-09-17 16:13:38,878 Download gene annotation INFO Found corresponding genome Id 51 on NCBI 2024-09-17 16:13:39,381 Download gene annotation INFO Found corresponding assembly Id 11968211 on NCBI 2024-09-17 16:13:39,884 Download gene annotation INFO Downloading assembly information from: ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.40_GRCh38.p14/GCF_000001405.40_GRCh38.p14_assembly_report.txt

Unhandeled exception occured <urlopen error [Errno 110] Connection timed out> Returning gene annotation without subestting for assembled chromosomesand converting to UCSC style. Please make sure that the chromosome namesin the returned object match with the chromosome names in the scplus_obj.Chromosome sizes will not be returned 2024-09-17 16:15:50,496 SCENIC+ INFO Chrosomome sizes was not found, please provide this information manually. 2024-09-17 16:15:50,497 SCENIC+ INFO Saving genome annotation to: genome_annotation.tsv Waiting at most 5 seconds for missing files. MissingOutputException in rule download_genome_annotations in file /home/gilgolan/bioinfo_analysis/scenicplus/tutorial2/scplus_pipeline/Snakemake/workflow/Snakefile, line 221: Job 8 completed successfully, but some output files are missing. Missing files after 5 seconds. This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait: chromsizes.tsv Removing output files of failed job download_genome_annotations since they might be corrupted: genome_annotation.tsv Shutting down, this might take some time. Exiting because a job execution failed. Look above for error message Complete log: .snakemake/log/2024-09-17T161250.453181.snakemake.log WorkflowError: At least one job did not complete successfully."

Expected behavior I expect SnakeMake to run successfully on the tutorial pre-processed dataset.

Version (please complete the following information): Python: 3.11.9 SCENIC+: 1.0a1 pyscenic 0.12.1+8.gd2309fe

Additional context When I look at the cell names in the adata object (scRNAseq) and cistopic object (scATACseq) they are different, also I have a different number of cells: "adata.obs Out[216]:
                   VSN_cell_type  ...  pct_counts_mt
CCCTCATAGACACTTA-1            GC  ...       0.083148
GCCATTACACCTGCCT-1          ASTP  ...       0.132626
ATTGCAGGTTGTGACA-1           MGL  ...       1.447368
CTGTTGGAGGCATTAC-1         MOL_B  ...       0.095579
CGAATCTAGCTTAGCG-1         MOL_B  ...       0.061851
...                          ...  ...            ...
TACCTTAGTTACTAGG-1         MOL_B  ...       0.058480
GTAGCTGTCATTACAG-1       AST_CER  ...       0.188088
AGGCAGGTCGCGACAC-1         MOL_A  ...       0.044703
GCATTGCCAAGACTCC-1         MOL_B  ...       0.145530
ACATTAGTCCGCAAGC-1       AST_CER  ...       0.068552
[2313 rows x 13 columns]

cistopic_obj.cell_data Out[218]:
                                       cisTopic_nr_frag  ...  pycisTopic_cca_Seurat_cell_type
CACCTCAGTTGTAAAC-1-10x_multiome_brain             18300  ...                              AST
TGACTCCTCATCCACC-1-10x_multiome_brain            100055  ...                               BG
TTTCTCACATAAACCT-1-10x_multiome_brain             32192  ...                               GP
GTCCTCCCACACAATT-1-10x_multiome_brain             88443  ...                               BG
CTCCGTCCAGTTTGTG-1-10x_multiome_brain            131110  ...                             ENDO
...                                                 ...  ...                              ...
GCAGGTTGTCCAAATG-1-10x_multiome_brain              2770  ...                              MOL
AAGCTCCCAGCACCAT-1-10x_multiome_brain              2180  ...                              MOL
CAGAATCTCCTCATGC-1-10x_multiome_brain              1744  ...                              MOL
TAGCCGGGTAACAGGG-1-10x_multiome_brain              2674  ...                         INH_SNCG
GTGCGCAGTGCTTAGA-1-10x_multiome_brain              5859  ...                               GP
[2845 rows x 42 columns]"

ruicatxiao commented 1 month ago

Running into this exact error as well. Curious to see if there is any potential fix

kennethho04 commented 1 month ago

I got the same error in rule prepare_GEX_ACC_multiome and I was able to resolve it by setting bc_transform_func under params_data_preparation in config.yaml to "\"lambda x: f'{x}-10x_multiome_brain'\"". You can see that in the tutorial page when they show the config.yaml pipeline.
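To see why the suffix matters, here is a minimal sketch in plain Python. It assumes, consistent with the fix above, that bc_transform_func is applied to each scRNA-seq barcode before the two assays are intersected. The helper name count_shared is hypothetical, and the first two ATAC barcodes are made up for illustration by appending the suffix to RNA barcodes printed in this issue.

```python
# Sketch only: assumes bc_transform_func is applied to the scRNA-seq barcodes
# before they are intersected with the cisTopic (scATAC-seq) cell names.

def count_shared(rna_barcodes, atac_barcodes, transform):
    """Count RNA barcodes that, after `transform`, also occur in the ATAC set."""
    return len({transform(bc) for bc in rna_barcodes} & set(atac_barcodes))

# RNA barcodes as printed by adata.obs in this issue.
rna_barcodes = ["CCCTCATAGACACTTA-1", "GCCATTACACCTGCCT-1", "ATTGCAGGTTGTGACA-1"]

# ATAC barcodes carry the sample suffix. The first two are hypothetical
# (RNA barcodes plus suffix); the third is from cistopic_obj.cell_data above.
atac_barcodes = [
    "CCCTCATAGACACTTA-1-10x_multiome_brain",
    "GCCATTACACCTGCCT-1-10x_multiome_brain",
    "CACCTCAGTTGTAAAC-1-10x_multiome_brain",
]

identity = lambda x: f"{x}"                         # default seen in the failing shell command
with_suffix = lambda x: f"{x}-10x_multiome_brain"   # value suggested in this comment

shared_default = count_shared(rna_barcodes, atac_barcodes, identity)
shared_suffix = count_shared(rna_barcodes, atac_barcodes, with_suffix)
print(shared_default, shared_suffix)
```

With the identity transform nothing matches, which is the situation the "No cells found which are present in both assays" exception reports. Running the same check on the full obs_names and cell_data index before starting the pipeline is a cheap way to validate the lambda.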

Not sure about the error with download_genome_annotations though. I ran into a similar problem with download_genome_annotations myself while going through the tutorial and am still unable to resolve it.

SeppeDeWinter commented 1 month ago

Hi @ruicatxiao and @gilgolan73

Related to the barcode error, indeed make use of the bc_transform_func as mentioned by @kennethho04 (let me know if you need help with this).

Related to the chromsizes issue, looks like the pipeline was not able to download these files automatically. You can download them from: https://hgdownload.cse.ucsc.edu/goldenpath/hg38/bigZips/hg38.chrom.sizes

All the best,

Seppe
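If the file is downloaded by hand, it still has to be turned into the chromsizes.tsv that the get_search_space rule takes as input. The sketch below converts text in the UCSC chrom.sizes layout (name, tab, size) into a tab-separated table. The Chromosome/Start/End header is an assumption about what the pipeline expects, not something confirmed in this thread, so compare it against a chromsizes.tsv produced by a working run. The function names are hypothetical.

```python
import csv
import io

def chrom_sizes_to_rows(text):
    """Parse UCSC chrom.sizes text into (chromosome, start, end) tuples."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        name, size = line.split("\t")[:2]
        rows.append((name, 0, int(size)))
    return rows

def rows_to_tsv(rows):
    """Render rows as TSV. The header names are an ASSUMED layout."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["Chromosome", "Start", "End"])
    writer.writerows(rows)
    return buf.getvalue()

# Two lines in the chrom.sizes layout, used here as sample input.
sample = "chr1\t248956422\nchrM\t16569\n"
tsv = rows_to_tsv(chrom_sizes_to_rows(sample))
print(tsv)
```

In practice the text would come from the downloaded hg38.chrom.sizes file and the result would be written to chromsizes.tsv in the Snakemake working directory.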

gilgolan73 commented 1 month ago

Hello @SeppeDeWinter, indeed this solution (changing bc_transform_func) solved the issue. For the chromsizes issue, changing data_wrangling/gene_search_space.py as mentioned in https://github.com/aertslab/scenicplus/issues/357 solved it.

However, I now encounter another issue when running Snakemake. The issue is related to the DARs. I checked, and none of the DAR region set bed files are empty (as suggested in https://github.com/aertslab/scenicplus/issues/183). Thank you for the help. Gil

attached is the error: "(scenicplus) [gilgolan@localhost Snakemake]$ snakemake --cores 20
Assuming unrestricted shared filesystem usage for local execution.
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 20
Rules claiming more threads will be scaled down.
Job stats:
job                            count
AUCell_direct                      1
AUCell_extended                    1
all                                1
download_genome_annotations        1
eGRN_direct                        1
eGRN_extended                      1
get_search_space                   1
motif_enrichment_cistarget         1
motif_enrichment_dem               1
prepare_GEX_ACC_multiome           1
prepare_menr                       1
region_to_gene                     1
scplus_mudata                      1
tf_to_gene                         1
total                             14

Select jobs to execute... Execute 2 jobs...

[Tue Oct 8 15:58:49 2024] localrule prepare_GEX_ACC_multiome: input: /home/gilgolan/bioinfo_analysis/scenicplus/tutorial2/outs/cistopic_obj.pkl, /home/gilgolan/bioinfo_analysis/scenicplus/tutorial2/scRNAseq/adata.h5ad output: ACC_GEX.h5mu jobid: 2 reason: Missing output files: ACC_GEX.h5mu resources: tmpdir=/tmp

[Tue Oct 8 15:58:49 2024] localrule download_genome_annotations: output: genome_annotation.tsv, chromsizes.tsv jobid: 8 reason: Missing output files: chromsizes.tsv, genome_annotation.tsv resources: tmpdir=/tmp

2024-10-08 15:58:51,599 SCENIC+ INFO Reading cisTopic object.
2024-10-08 15:58:51,975 SCENIC+ INFO Reading gene expression AnnData.
2024-10-08 15:58:52,056 Ingesting multiome data INFO Found 1963 multiome cells.
2024-10-08 15:58:52,196 cisTopic INFO Imputing region accessibility
2024-10-08 15:58:52,196 cisTopic INFO Impute region accessibility for regions 0-20000
2024-10-08 15:58:52,402 cisTopic INFO Impute region accessibility for regions 20000-40000
2024-10-08 15:58:52,603 cisTopic INFO Impute region accessibility for regions 40000-60000
2024-10-08 15:58:52,804 cisTopic INFO Impute region accessibility for regions 60000-80000
2024-10-08 15:58:52,994 cisTopic INFO Impute region accessibility for regions 80000-100000
2024-10-08 15:58:53,185 cisTopic INFO Impute region accessibility for regions 100000-120000
2024-10-08 15:58:53,384 cisTopic INFO Impute region accessibility for regions 120000-140000
2024-10-08 15:58:53,577 cisTopic INFO Impute region accessibility for regions 140000-160000
2024-10-08 15:58:53,777 cisTopic INFO Impute region accessibility for regions 160000-180000
2024-10-08 15:58:53,966 cisTopic INFO Impute region accessibility for regions 180000-200000
2024-10-08 15:58:54,160 cisTopic INFO Impute region accessibility for regions 200000-220000
2024-10-08 15:58:54,367 cisTopic INFO Impute region accessibility for regions 220000-240000
2024-10-08 15:58:54,587 cisTopic INFO Impute region accessibility for regions 240000-260000
2024-10-08 15:58:54,793 cisTopic INFO Impute region accessibility for regions 260000-280000
2024-10-08 15:58:55,005 cisTopic INFO Impute region accessibility for regions 280000-300000
2024-10-08 15:58:55,196 cisTopic INFO Impute region accessibility for regions 300000-320000
2024-10-08 15:58:55,385 cisTopic INFO Impute region accessibility for regions 320000-340000
2024-10-08 15:58:55,593 cisTopic INFO Impute region accessibility for regions 340000-360000
2024-10-08 15:58:55,808 cisTopic INFO Impute region accessibility for regions 360000-380000
2024-10-08 15:58:56,007 cisTopic INFO Impute region accessibility for regions 380000-400000
2024-10-08 15:58:56,196 cisTopic INFO Impute region accessibility for regions 400000-420000
2024-10-08 15:58:56,398 cisTopic INFO Impute region accessibility for regions 420000-440000
2024-10-08 15:58:56,561 cisTopic INFO Done!
... storing 'sample_id' as categorical
... storing 'VSN_cell_type' as categorical
... storing 'VSN_leiden_res0.3' as categorical
... storing 'VSN_leiden_res0.6' as categorical
... storing 'VSN_leiden_res0.9' as categorical
... storing 'VSN_leiden_res1.2' as categorical
... storing 'VSN_sample_id' as categorical
... storing 'Seurat_leiden_res0.6' as categorical
... storing 'Seurat_leiden_res1.2' as categorical
... storing 'Seurat_cell_type' as categorical
... storing 'Chromosome' as categorical
[Tue Oct 8 15:59:03 2024] Finished job 2.
1 of 14 steps (7%) done
Select jobs to execute...
2024-10-08 15:59:18,979 Download gene annotation INFO Using genome: GRCh38.p14
2024-10-08 15:59:18,982 Download gene annotation INFO Found corresponding genome Id 51 on NCBI
2024-10-08 15:59:19,487 Download gene annotation INFO Found corresponding assembly Id 11968211 on NCBI
2024-10-08 15:59:19,991 Download gene annotation INFO Downloading assembly information from: http://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.40_GRCh38.p14/GCF_000001405.40_GRCh38.p14_assembly_report.txt
2024-10-08 15:59:21,279 Download gene annotation INFO Found following assembled molecules (chromosomes):
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 X Y MT
2024-10-08 15:59:21,289 Download gene annotation INFO Converting chromosomes names to UCSC style as follows:
Original  UCSC
1         chr1
2         chr2
3         chr3
4         chr4
5         chr5
6         chr6
7         chr7
8         chr8
9         chr9
10        chr10
11        chr11
12        chr12
13        chr13
14        chr14
15        chr15
16        chr16
17        chr17
18        chr18
19        chr19
20        chr20
21        chr21
22        chr22
X         chrX
Y         chrY
MT        chrM
2024-10-08 15:59:21,297 SCENIC+ INFO Saving chromosome sizes to: chromsizes.tsv
2024-10-08 15:59:21,298 SCENIC+ INFO Saving genome annotation to: genome_annotation.tsv
[Tue Oct 8 15:59:21 2024] Finished job 8.
2 of 14 steps (14%) done
Execute 1 jobs...

[Tue Oct 8 15:59:21 2024] localrule get_search_space: input: ACC_GEX.h5mu, genome_annotation.tsv, chromsizes.tsv output: search_space.tsv jobid: 11 reason: Missing output files: search_space.tsv; Input files updated by another job: ACC_GEX.h5mu, chromsizes.tsv, genome_annotation.tsv resources: tmpdir=/tmp

2024-10-08 15:59:23,995 SCENIC+ INFO Reading data /home/gilgolan/.local/lib/python3.11/site-packages/anndata/_core/anndata.py:522: FutureWarning: The dtype argument is deprecated and will be removed in late 2024. warnings.warn( /home/gilgolan/.local/lib/python3.11/site-packages/anndata/_core/anndata.py:522: FutureWarning: The dtype argument is deprecated and will be removed in late 2024. warnings.warn( 2024-10-08 15:59:26,016 Get search space INFO Extending promoter annotation to 10 bp upstream and 10 downstream 2024-10-08 15:59:26,116 Get search space INFO Extending search space to: 150000 bp downstream of the end of the gene. 150000 bp upstream of the start of the gene. 2024-10-08 15:59:26,516 Get search space INFO Intersecting with regions. 2024-10-08 15:59:27,792 Get search space INFO Calculating distances from region to gene 2024-10-08 16:00:09,179 Get search space INFO Imploding multiple entries per region and gene 2024-10-08 16:01:45,857 SCENIC+ INFO Writing search space to: search_space.tsv [Tue Oct 8 16:01:47 2024] Finished job 11. 3 of 14 steps (21%) done Select jobs to execute... Execute 1 jobs...

[Tue Oct 8 16:01:47 2024] localrule region_to_gene: input: ACC_GEX.h5mu, search_space.tsv output: region_to_gene_adj.tsv jobid: 10 reason: Missing output files: region_to_gene_adj.tsv; Input files updated by another job: ACC_GEX.h5mu, search_space.tsv threads: 20 resources: tmpdir=/tmp

2024-10-08 16:01:51,690 SCENIC+ INFO Reading multiome MuData.
/home/gilgolan/.local/lib/python3.11/site-packages/anndata/_core/anndata.py:522: FutureWarning: The dtype argument is deprecated and will be removed in late 2024.
  warnings.warn(
/home/gilgolan/.local/lib/python3.11/site-packages/anndata/_core/anndata.py:522: FutureWarning: The dtype argument is deprecated and will be removed in late 2024.
  warnings.warn(
2024-10-08 16:01:53,068 SCENIC+ INFO Reading search space
2024-10-08 16:01:53,646 R2G INFO Calculating region to gene importances, using GBM method
Running using 20 cores: 100%|██████████| 18565/18565 [19:06<00:00, 16.19it/s]
2024-10-08 16:21:06,511 R2G INFO Calculating region to gene correlation, using SR method
Running using 20 cores: 0%| | 0/18565 [00:00<?, ?it/s]
/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/enhancer_to_gene.py:158: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
  correlation_result = np.array([correlator(x, exp) for x in acc.T])
Running using 20 cores: 2%|█ | 280/18565 [00:00<01:05, 278.20it/s]
/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/enhancer_to_gene.py:158: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
  correlation_result = np.array([correlator(x, exp) for x in acc.T])
Running using 20 cores: 2%|█ | 313/18565 [00:00<01:03, 285.64it/s]
/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/enhancer_to_gene.py:158: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
  correlation_result = np.array([correlator(x, exp) for x in acc.T])
Running using 20 cores: 2%|█ | 360/18565 [00:01<01:19, 229.80it/s]
/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/enhancer_to_gene.py:158: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
  correlation_result = np.array([correlator(x, exp) for x in acc.T])
Running using 20 cores: 2%|█▉ | 386/18565 [00:01<01:19, 228.68it/s]
/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/enhancer_to_gene.py:158: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
  correlation_result = np.array([correlator(x, exp) for x in acc.T])
Running using 20 cores: 3%|██ | 508/18565 [00:02<01:59, 150.90it/s]
/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/enhancer_to_gene.py:158: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
  correlation_result = np.array([correlator(x, exp) for x in acc.T])
Running using 20 cores: 4%|███ | 715/18565 [00:04<03:07, 95.32it/s]
/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/enhancer_to_gene.py:158: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
  correlation_result = np.array([correlator(x, exp) for x in acc.T])
Running using 20 cores: 4%|███▉ | 783/18565 [00:05<02:20, 126.41it/s]
/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/enhancer_to_gene.py:158: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
  correlation_result = np.array([correlator(x, exp) for x in acc.T])
Running using 20 cores: 5%|████ | 838/18565 [00:05<01:43, 170.62it/s]
/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/enhancer_to_gene.py:158: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
  correlation_result = np.array([correlator(x, exp) for x in acc.T])
Running using 20 cores: 5%|████ | 878/18565 [00:06<03:39, 80.51it/s]
/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/enhancer_to_gene.py:158: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
  correlation_result = np.array([correlator(x, exp) for x in acc.T])
Running using 20 cores: 5%|████ | 900/18565 [00:06<02:59, 98.28it/s]
/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/enhancer_to_gene.py:158: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
  correlation_result = np.array([correlator(x, exp) for x in acc.T])
Running using 20 cores: 100%|██████████| 18565/18565 [02:44<00:00, 113.04it/s]
2024-10-08 16:24:01,456 R2G INFO Done!
2024-10-08 16:24:01,568 SCENIC+ INFO Saving region to gene adjacencies to region_to_gene_adj.tsv
[Tue Oct 8 16:24:07 2024] Finished job 10.
4 of 14 steps (29%) done
Select jobs to execute...
Execute 1 jobs...

[Tue Oct 8 16:24:07 2024] localrule motif_enrichment_dem: input: /home/gilgolan/bioinfo_analysis/scenicplus/tutorial2/outs/region_sets, /home/gilgolan/bioinfo_analysis/scenicplus/tutorial2/cistarget_db/hg38_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.scores.feather, genome_annotation.tsv, /home/gilgolan/bioinfo_analysis/scenicplus/tutorial2/motifs-v10nr_clust-nr.hgnc-m0.001-o0.0.tbl output: dem_results.hdf5, dem_results.html jobid: 7 reason: Missing output files: dem_results.hdf5; Input files updated by another job: genome_annotation.tsv threads: 20 resources: tmpdir=/tmp

2024-10-08 16:24:11,789 SCENIC+ INFO Reading region sets from: /home/gilgolan/bioinfo_analysis/scenicplus/tutorial2/outs/region_sets
2024-10-08 16:24:11,789 SCENIC+ INFO Reading all .bed files in: Topics_otsu
2024-10-08 16:24:12,109 SCENIC+ INFO Reading all .bed files in: Topics_top_3k
2024-10-08 16:24:12,194 SCENIC+ INFO Reading all .bed files in: DARs_cell_type
joblib.externals.loky.process_executor._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/gilgolan/.local/lib/python3.11/site-packages/joblib/externals/loky/process_executor.py", line 463, in _process_worker
    r = call_item()
        ^^^^^^^^^^^
  File "/home/gilgolan/.local/lib/python3.11/site-packages/joblib/externals/loky/process_executor.py", line 291, in __call__
    return self.fn(*self.args, **self.kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/gilgolan/.local/lib/python3.11/site-packages/joblib/parallel.py", line 589, in __call__
    return [func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/gilgolan/.local/lib/python3.11/site-packages/joblib/parallel.py", line 589, in <listcomp>
    return [func(*args, **kwargs)
            ^^^^^^^^^^^^^^^^^^^^^
  File "/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/cli/commands.py", line 320, in _run_dem_single_region_set
    dem_db = DEMDatabase(
             ^^^^^^^^^^^^
  File "/home/gilgolan/.local/lib/python3.11/site-packages/pycistarget/motif_enrichment_dem.py", line 147, in __init__
    self.db_regions = pr.PyRanges(region_names_to_coordinates(list(self.genes)))
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/gilgolan/.local/lib/python3.11/site-packages/pycistarget/utils.py", line 35, in region_names_to_coordinates
    regiondf.columns=['Chromosome', 'Start', 'End']
    ^^^^^^^^^^^^^^^^
  File "/home/gilgolan/.local/lib/python3.11/site-packages/pandas/core/generic.py", line 5920, in __setattr__
    return object.__setattr__(self, name, value)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pandas/_libs/properties.pyx", line 69, in pandas._libs.properties.AxisProperty.__set__
  File "/home/gilgolan/.local/lib/python3.11/site-packages/pandas/core/generic.py", line 822, in _set_axis
    self._mgr.set_axis(axis, labels)
  File "/home/gilgolan/.local/lib/python3.11/site-packages/pandas/core/internals/managers.py", line 228, in set_axis
    self._validate_set_axis(axis, new_labels)
  File "/home/gilgolan/.local/lib/python3.11/site-packages/pandas/core/internals/base.py", line 70, in _validate_set_axis
    raise ValueError(
ValueError: Length mismatch: Expected axis has 0 elements, new values have 3 elements
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/gilgolan/.local/bin/scenicplus", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/cli/scenicplus.py", line 1137, in main
    args.func(args)
  File "/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/cli/scenicplus.py", line 588, in motif_enrichment_dem
    run_motif_enrichment_dem(
  File "/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/cli/commands.py", line 466, in run_motif_enrichment_dem
    dem_results: List[DEM] = joblib.Parallel(
                             ^^^^^^^^^^^^^^^^
  File "/home/gilgolan/.local/lib/python3.11/site-packages/joblib/parallel.py", line 1952, in __call__
    return output if self.return_generator else list(output)
                                                ^^^^^^^^^^^^
  File "/home/gilgolan/.local/lib/python3.11/site-packages/joblib/parallel.py", line 1595, in _get_outputs
    yield from self._retrieve()
  File "/home/gilgolan/.local/lib/python3.11/site-packages/joblib/parallel.py", line 1699, in _retrieve
    self._raise_error_fast()
  File "/home/gilgolan/.local/lib/python3.11/site-packages/joblib/parallel.py", line 1734, in _raise_error_fast
    error_job.get_result(self.timeout)
  File "/home/gilgolan/.local/lib/python3.11/site-packages/joblib/parallel.py", line 736, in get_result
    return self._return_or_raise()
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/gilgolan/.local/lib/python3.11/site-packages/joblib/parallel.py", line 754, in _return_or_raise
    raise self._result
ValueError: Length mismatch: Expected axis has 0 elements, new values have 3 elements
[Tue Oct 8 16:24:31 2024]
Error in rule motif_enrichment_dem:
    jobid: 7
    input: /home/gilgolan/bioinfo_analysis/scenicplus/tutorial2/outs/region_sets, /home/gilgolan/bioinfo_analysis/scenicplus/tutorial2/cistarget_db/hg38_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.scores.feather, genome_annotation.tsv, /home/gilgolan/bioinfo_analysis/scenicplus/tutorial2/motifs-v10nr_clust-nr.hgnc-m0.001-o0.0.tbl
    output: dem_results.hdf5, dem_results.html
    shell:

            scenicplus grn_inference motif_enrichment_dem                     --region_set_folder /home/gilgolan/bioinfo_analysis/scenicplus/tutorial2/outs/region_sets                     --dem_db_fname /home/gilgolan/bioinfo_analysis/scenicplus/tutorial2/cistarget_db/hg38_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.scores.feather                     --output_fname_dem_result dem_results.hdf5                     --temp_dir /tmp                     --species homo_sapiens                     --fraction_overlap_w_dem_database 0.4                     --max_bg_regions 500                     --genome_annotation genome_annotation.tsv                     --balance_number_of_promoters                     --promoter_space 1000                     --adjpval_thr 0.05                     --log2fc_thr 1.0                     --mean_fg_thr 0.0                     --motif_hit_thr 3.0                     --path_to_motif_annotations /home/gilgolan/bioinfo_analysis/scenicplus/tutorial2/motifs-v10nr_clust-nr.hgnc-m0.001-o0.0.tbl                     --annotation_version v10nr_clust                     --motif_similarity_fdr 0.001                     --orthologous_identity_threshold 0.0                     --annotations_to_use Direct_annot Orthology_annot                     --write_html                     --output_fname_dem_html dem_results.html                     --seed 666                     --n_cpu 20

    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Shutting down, this might take some time. Exiting because a job execution failed. Look above for error message Complete log: .snakemake/log/2024-10-08T155849.153427.snakemake.log WorkflowError: At least one job did not complete successfully."
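The worker traceback fails in region_names_to_coordinates at the line that assigns the three column names Chromosome, Start and End, and the message says the frame had 0 columns at that point. One plausible reading is that none of the names passed in looked like chr:start-end coordinates, so nothing was parsed. The sketch below is not pycistarget's code; parse_region_names is a hypothetical stand-in that shows how coordinate-style names parse while other names yield an empty result.

```python
import re

# Matches names shaped like "chr1:1000-1500" (an assumed naming convention).
REGION_RE = re.compile(r"^([^:]+):(\d+)-(\d+)$")

def parse_region_names(names):
    """Return (chrom, start, end) for coordinate-style names; skip the rest."""
    parsed = []
    for name in names:
        match = REGION_RE.match(name)
        if match:
            parsed.append((match.group(1), int(match.group(2)), int(match.group(3))))
    return parsed

region_like = ["chr1:1000-1500", "chrX:200-700"]  # illustrative region names
gene_like = ["GAPDH", "SOX10"]                    # illustrative gene symbols

parsed_regions = parse_region_names(region_like)
parsed_genes = parse_region_names(gene_like)
print(len(parsed_regions), len(parsed_genes))
```

An empty result like the second one would leave nothing to label with three column names. It may be worth confirming that the names stored in the file passed to --dem_db_fname are region coordinates; the command above passes a file whose name contains genes_vs_motifs. That is an unconfirmed guess, not a diagnosis established in this thread.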

kennethho04 commented 1 month ago

Hi @gilgolan73

Seems like you ran into the same problem as issue #432; rolling your Python version back from 3.11.9 to 3.11.8 should resolve the error.

gilgolan73 commented 1 month ago

Hello @kennethho04, thank you for the quick reply. How do you suggest reverting the Python version? Do I need to re-install the packages?

Gil

kennethho04 commented 1 month ago

@gilgolan73 doing conda install python==3.11.8 in your conda env should suffice. You don't need to re-install the packages.

gilgolan73 commented 1 month ago

Hello @kennethho04, I tried to do it (revert to Python 3.11.8 as suggested), but I am still getting the same error (attached). I'm also attaching the list of installed packages.

Thank you, Gil error 141024_python_3.11.8.txt list packages 141024.txt

yojetsharma commented 1 month ago

It looks like an issue with parallel processing. Have you tried installing/downgrading dask?

gilgolan73 commented 1 month ago

Hi @yojetsharma @kennethho04 , I am using Dask version 2024.5.0 with python 3.11.8. Which version do you recommend I use? (I have a CentOS 9 Linux machine; it is a virtual machine on OracleVM.)

Thank you, Gil

yojetsharma commented 1 month ago

I also faced an issue with parallel processing, and as someone suggested in one of the other issues, downgrading Dask to 2024.5.0 helped. But it looks like you are already using that version. The Python version I am using is 3.11.10.

gilgolan73 commented 1 month ago

@yojetsharma @kennethho04 I tried Python 3.11.8 with both Dask 2024.2.1 and 2024.5.0 and am still receiving the same error. Do you think I need to install Python 3.11.10?

Thanks

yojetsharma commented 1 month ago

Does reducing the number of cores help? Also, do the region_sets and genome_annotation.tsv look fine?
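A quick way to answer the second question is to summarize every .bed file under the region_sets folder. The sketch below is a hypothetical helper using only the standard library; it assumes plain tab-separated BED files with chromosome, start and end in the first three columns, and the folder and file names in the demo are illustrative.

```python
import tempfile
from pathlib import Path

def summarize_region_sets(folder):
    """Map each .bed file under `folder` to (region count, sorted chromosome names)."""
    summary = {}
    for bed in sorted(Path(folder).rglob("*.bed")):
        n_regions, chroms = 0, set()
        for line in bed.read_text().splitlines():
            fields = line.split("\t")
            # Skip header-style lines and anything without three columns.
            if len(fields) < 3 or line.startswith(("#", "track", "browser")):
                continue
            int(fields[1]), int(fields[2])  # raises ValueError on bad coordinates
            n_regions += 1
            chroms.add(fields[0])
        summary[bed.relative_to(folder).as_posix()] = (n_regions, sorted(chroms))
    return summary

# Demo on a throwaway folder that mimics the region_sets layout.
with tempfile.TemporaryDirectory() as tmp:
    dar_dir = Path(tmp) / "DARs_cell_type"
    dar_dir.mkdir()
    (dar_dir / "AST.bed").write_text("chr1\t100\t600\nchr2\t50\t550\n")
    (dar_dir / "EMPTY.bed").write_text("")
    demo = summarize_region_sets(tmp)

print(demo)
```

Files that report zero regions, or chromosome names in a different style from those in genome_annotation.tsv, would be the first things to look at.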

gilgolan73 commented 1 month ago

@yojetsharma Hi, I tried reducing the number of cores to 10, and to 1. It still doesn't help. The files look OK; I'm attaching them. genome_annotation.txt PURK.txt OPC.txt NFOL.txt MOL.txt MGL.txt MG.txt INH_VIP.txt INH_SST.txt INH_SNCG.txt INH_PVALB.txt GP.txt GC.txt ENDO.txt COP.txt BG.txt AST.txt

Gil

yojetsharma commented 1 month ago

My last resort would be to try reinstalling the conda env and see if that fixes the issue.

gilgolan73 commented 3 weeks ago

Hi @yojetsharma @kennethho04 I tried to reinstall the conda env (both with python 3.11.10 and python 3.11.8), it still did not resolve the issue. Do you have any other suggestions?

Thank you, Gil

brianysoong commented 6 days ago

@gilgolan73 Did you end up finding a way to resolve your issue? I think I am experiencing a similar issue, where the cisTarget motif enrichment works fine but DEM does not identify any motifs.

gilgolan73 commented 3 days ago

Hi @brianysoong, unfortunately I am still stuck with this issue. Tried many different Dask and python versions but the same issue persists. Do you have any suggestions? @yojetsharma @kennethho04

Gil

brianysoong commented 14 hours ago

@gilgolan73 In my case, it ended up being a dumb mistake where I used human genome / annotations for mouse data!

aertslab / scenicplus

Error when running SnakeMake #468