Open gilgolan73 opened 2 months ago
I'm running into this exact error as well. Curious to see if there is any potential fix.
I got the same error in rule prepare_GEX_ACC_multiome and was able to resolve it by setting bc_transform_func under params_data_preparation in config.yaml to "\"lambda x: f'{x}-10x_multiome_brain'\"". You can see that on the tutorial page where they show the config.yaml for the pipeline.
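To illustrate what this setting does, here is a minimal sketch in plain Python. It assumes the function is applied to the scRNA-seq barcodes so that they match the scATAC-seq cell names, and that the pipeline evaluates the config string into a callable; neither detail is confirmed in this thread. The barcode is copied from the adata.obs listing in the original report.

```python
# The lambda from the config value, written out as plain Python.
# It appends the sample suffix seen in the scATAC-seq cell names.
bc_transform_func = lambda x: f"{x}-10x_multiome_brain"

# Barcode copied from the adata.obs listing in this issue.
rna_barcode = "CCCTCATAGACACTTA-1"
transformed = bc_transform_func(rna_barcode)
print(transformed)

# The config stores the function as a string. Evaluating that string
# (an assumption about how the pipeline consumes it) yields the same callable.
config_value = "lambda x: f'{x}-10x_multiome_brain'"
from_config = eval(config_value)
same_result = from_config(rna_barcode) == transformed
```

The suffix has to match your own sample name in the cisTopic object, so adjust the string after the barcode accordingly.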
Not sure about the error with download_genome_annotations though. I ran into a similar problem with download_genome_annotations myself while going through the tutorial and have not been able to resolve it yet.
Hi @ruicatxiao and @gilgolan73
Regarding the barcode error, do indeed make use of bc_transform_func, as mentioned by @kennethho04 (let me know if you need help with this).
Regarding the chromsizes issue, it looks like the pipeline was not able to download these files automatically. You can download them from: https://hgdownload.cse.ucsc.edu/goldenpath/hg38/bigZips/hg38.chrom.sizes
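For anyone who wants to fetch that file from Python, a rough sketch follows. It only reads the UCSC layout (chromosome name, tab, size); the column layout the pipeline expects in chromsizes.tsv is not stated in this thread, so converting to that format is left out. The helper names are made up for illustration, and the two sample lines stand in for the real download.

```python
import urllib.request

CHROMSIZES_URL = "https://hgdownload.cse.ucsc.edu/goldenpath/hg38/bigZips/hg38.chrom.sizes"


def parse_chrom_sizes(text):
    """Parse UCSC 'name<TAB>size' lines into a dict of chromosome -> length."""
    sizes = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, size = line.split("\t")[:2]
        sizes[name] = int(size)
    return sizes


def fetch_chrom_sizes(url=CHROMSIZES_URL, timeout=60):
    """Download the chrom.sizes file and parse it (needs network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_chrom_sizes(response.read().decode("utf-8"))


# Offline demonstration with two sample lines in the same layout.
sample = "chr1\t248956422\nchrM\t16569\n"
sizes = parse_chrom_sizes(sample)
print(sizes)
```

Calling fetch_chrom_sizes() does the actual download; it is not invoked here so the snippet also runs without a connection.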
All the best,
Seppe
Hello @SeppeDeWinter, indeed this solution (changing bc_transform_func) solves the issue. For the chromsizes issue, changing data_wrangling/gene_search_space.py as mentioned in https://github.com/aertslab/scenicplus/issues/357 solves it.
However, I now encounter another issue when running Snakemake. The issue is related to the DARs; I checked, and none of the DAR region set bed files are empty (as suggested in https://github.com/aertslab/scenicplus/issues/183). Thank you for the help. Gil
attached is the error: "(scenicplus) [gilgolan@localhost Snakemake]$ snakemake --cores 20 Assuming unrestricted shared filesystem usage for local execution. Building DAG of jobs... Using shell: /usr/bin/bash Provided cores: 20 Rules claiming more threads will be scaled down. Job stats: job count
AUCell_direct 1 AUCell_extended 1 all 1 download_genome_annotations 1 eGRN_direct 1 eGRN_extended 1 get_search_space 1 motif_enrichment_cistarget 1 motif_enrichment_dem 1 prepare_GEX_ACC_multiome 1 prepare_menr 1 region_to_gene 1 scplus_mudata 1 tf_to_gene 1 total 14
Select jobs to execute... Execute 2 jobs...
[Tue Oct 8 15:58:49 2024] localrule prepare_GEX_ACC_multiome: input: /home/gilgolan/bioinfo_analysis/scenicplus/tutorial2/outs/cistopic_obj.pkl, /home/gilgolan/bioinfo_analysis/scenicplus/tutorial2/scRNAseq/adata.h5ad output: ACC_GEX.h5mu jobid: 2 reason: Missing output files: ACC_GEX.h5mu resources: tmpdir=/tmp
[Tue Oct 8 15:58:49 2024] localrule download_genome_annotations: output: genome_annotation.tsv, chromsizes.tsv jobid: 8 reason: Missing output files: chromsizes.tsv, genome_annotation.tsv resources: tmpdir=/tmp
2024-10-08 15:58:51,599 SCENIC+ INFO Reading cisTopic object. 2024-10-08 15:58:51,975 SCENIC+ INFO Reading gene expression AnnData. 2024-10-08 15:58:52,056 Ingesting multiome data INFO Found 1963 multiome cells. 2024-10-08 15:58:52,196 cisTopic INFO Imputing region accessibility 2024-10-08 15:58:52,196 cisTopic INFO Impute region accessibility for regions 0-20000 2024-10-08 15:58:52,402 cisTopic INFO Impute region accessibility for regions 20000-40000 2024-10-08 15:58:52,603 cisTopic INFO Impute region accessibility for regions 40000-60000 2024-10-08 15:58:52,804 cisTopic INFO Impute region accessibility for regions 60000-80000 2024-10-08 15:58:52,994 cisTopic INFO Impute region accessibility for regions 80000-100000 2024-10-08 15:58:53,185 cisTopic INFO Impute region accessibility for regions 100000-120000 2024-10-08 15:58:53,384 cisTopic INFO Impute region accessibility for regions 120000-140000 2024-10-08 15:58:53,577 cisTopic INFO Impute region accessibility for regions 140000-160000 2024-10-08 15:58:53,777 cisTopic INFO Impute region accessibility for regions 160000-180000 2024-10-08 15:58:53,966 cisTopic INFO Impute region accessibility for regions 180000-200000 2024-10-08 15:58:54,160 cisTopic INFO Impute region accessibility for regions 200000-220000 2024-10-08 15:58:54,367 cisTopic INFO Impute region accessibility for regions 220000-240000 2024-10-08 15:58:54,587 cisTopic INFO Impute region accessibility for regions 240000-260000 2024-10-08 15:58:54,793 cisTopic INFO Impute region accessibility for regions 260000-280000 2024-10-08 15:58:55,005 cisTopic INFO Impute region accessibility for regions 280000-300000 2024-10-08 15:58:55,196 cisTopic INFO Impute region accessibility for regions 300000-320000 2024-10-08 15:58:55,385 cisTopic INFO Impute region accessibility for regions 320000-340000 2024-10-08 15:58:55,593 cisTopic INFO Impute region accessibility for regions 340000-360000 2024-10-08 15:58:55,808 cisTopic INFO Impute region accessibility for 
regions 360000-380000 2024-10-08 15:58:56,007 cisTopic INFO Impute region accessibility for regions 380000-400000 2024-10-08 15:58:56,196 cisTopic INFO Impute region accessibility for regions 400000-420000 2024-10-08 15:58:56,398 cisTopic INFO Impute region accessibility for regions 420000-440000 2024-10-08 15:58:56,561 cisTopic INFO Done! ... storing 'sample_id' as categorical ... storing 'VSN_cell_type' as categorical ... storing 'VSN_leiden_res0.3' as categorical ... storing 'VSN_leiden_res0.6' as categorical ... storing 'VSN_leiden_res0.9' as categorical ... storing 'VSN_leiden_res1.2' as categorical ... storing 'VSN_sample_id' as categorical ... storing 'Seurat_leiden_res0.6' as categorical ... storing 'Seurat_leiden_res1.2' as categorical ... storing 'Seurat_cell_type' as categorical ... storing 'Chromosome' as categorical [Tue Oct 8 15:59:03 2024] Finished job 2. 1 of 14 steps (7%) done Select jobs to execute... 2024-10-08 15:59:18,979 Download gene annotation INFO Using genome: GRCh38.p14 2024-10-08 15:59:18,982 Download gene annotation INFO Found corresponding genome Id 51 on NCBI 2024-10-08 15:59:19,487 Download gene annotation INFO Found corresponding assembly Id 11968211 on NCBI 2024-10-08 15:59:19,991 Download gene annotation INFO Downloading assembly information from: http://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.40_GRCh38.p14/GCF_000001405.40_GRCh38.p14_assembly_report.txt 2024-10-08 15:59:21,279 Download gene annotation INFO Found following assembled molecules (chromosomes): 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 X Y MT 2024-10-08 15:59:21,289 Download gene annotation INFO Converting chromosomes names to UCSC style as follows: Original UCSC 1 chr1 2 chr2 3 chr3 4 chr4 5 chr5 6 chr6 7 chr7 8 chr8 9 chr9 10 chr10 11 chr11 12 chr12 13 chr13 14 chr14 15 chr15 16 chr16 17 chr17 18 chr18 19 chr19 20 chr20 21 chr21 22 chr22 X chrX Y chrY MT chrM 2024-10-08 15:59:21,297 SCENIC+ INFO Saving chromosome sizes to: 
chromsizes.tsv 2024-10-08 15:59:21,298 SCENIC+ INFO Saving genome annotation to: genome_annotation.tsv [Tue Oct 8 15:59:21 2024] Finished job 8. 2 of 14 steps (14%) done Execute 1 jobs...
[Tue Oct 8 15:59:21 2024] localrule get_search_space: input: ACC_GEX.h5mu, genome_annotation.tsv, chromsizes.tsv output: search_space.tsv jobid: 11 reason: Missing output files: search_space.tsv; Input files updated by another job: ACC_GEX.h5mu, chromsizes.tsv, genome_annotation.tsv resources: tmpdir=/tmp
2024-10-08 15:59:23,995 SCENIC+ INFO Reading data /home/gilgolan/.local/lib/python3.11/site-packages/anndata/_core/anndata.py:522: FutureWarning: The dtype argument is deprecated and will be removed in late 2024. warnings.warn( /home/gilgolan/.local/lib/python3.11/site-packages/anndata/_core/anndata.py:522: FutureWarning: The dtype argument is deprecated and will be removed in late 2024. warnings.warn( 2024-10-08 15:59:26,016 Get search space INFO Extending promoter annotation to 10 bp upstream and 10 downstream 2024-10-08 15:59:26,116 Get search space INFO Extending search space to: 150000 bp downstream of the end of the gene. 150000 bp upstream of the start of the gene. 2024-10-08 15:59:26,516 Get search space INFO Intersecting with regions. 2024-10-08 15:59:27,792 Get search space INFO Calculating distances from region to gene 2024-10-08 16:00:09,179 Get search space INFO Imploding multiple entries per region and gene 2024-10-08 16:01:45,857 SCENIC+ INFO Writing search space to: search_space.tsv [Tue Oct 8 16:01:47 2024] Finished job 11. 3 of 14 steps (21%) done Select jobs to execute... Execute 1 jobs...
[Tue Oct 8 16:01:47 2024] localrule region_to_gene: input: ACC_GEX.h5mu, search_space.tsv output: region_to_gene_adj.tsv jobid: 10 reason: Missing output files: region_to_gene_adj.tsv; Input files updated by another job: ACC_GEX.h5mu, search_space.tsv threads: 20 resources: tmpdir=/tmp
2024-10-08 16:01:51,690 SCENIC+ INFO Reading multiome MuData.
/home/gilgolan/.local/lib/python3.11/site-packages/anndata/_core/anndata.py:522: FutureWarning: The dtype argument is deprecated and will be removed in late 2024. warnings.warn(
/home/gilgolan/.local/lib/python3.11/site-packages/anndata/_core/anndata.py:522: FutureWarning: The dtype argument is deprecated and will be removed in late 2024. warnings.warn(
2024-10-08 16:01:53,068 SCENIC+ INFO Reading search space
2024-10-08 16:01:53,646 R2G INFO Calculating region to gene importances, using GBM method
Running using 20 cores: 100% | 18565/18565 [19:06<00:00, 16.19it/s]
2024-10-08 16:21:06,511 R2G INFO Calculating region to gene correlation, using SR method
Running using 20 cores: 0% | 0/18565 [00:00<?, ?it/s]
/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/enhancer_to_gene.py:158: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
correlation_result = np.array([correlator(x, exp) for x in acc.T])
/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/enhancer_to_gene.py:158: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
correlation_result = np.array([correlator(x, exp) for x in acc.T])
/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/enhancer_to_gene.py:158: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
correlation_result = np.array([correlator(x, exp) for x in acc.T])
Running using 20 cores: 2% | 280/18565 [00:00<01:05, 278.20it/s]
/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/enhancer_to_gene.py:158: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
correlation_result = np.array([correlator(x, exp) for x in acc.T])
/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/enhancer_to_gene.py:158: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
correlation_result = np.array([correlator(x, exp) for x in acc.T])
Running using 20 cores: 2% | 313/18565 [00:00<01:03, 285.64it/s]
/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/enhancer_to_gene.py:158: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
correlation_result = np.array([correlator(x, exp) for x in acc.T])
/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/enhancer_to_gene.py:158: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
correlation_result = np.array([correlator(x, exp) for x in acc.T])
/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/enhancer_to_gene.py:158: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
correlation_result = np.array([correlator(x, exp) for x in acc.T])
/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/enhancer_to_gene.py:158: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
correlation_result = np.array([correlator(x, exp) for x in acc.T])
Running using 20 cores: 2% | 360/18565 [00:01<01:19, 229.80it/s]
/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/enhancer_to_gene.py:158: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
correlation_result = np.array([correlator(x, exp) for x in acc.T])
Running using 20 cores: 2% | 386/18565 [00:01<01:19, 228.68it/s]
/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/enhancer_to_gene.py:158: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
correlation_result = np.array([correlator(x, exp) for x in acc.T])
/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/enhancer_to_gene.py:158: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
correlation_result = np.array([correlator(x, exp) for x in acc.T])
Running using 20 cores: 3% | 508/18565 [00:02<01:59, 150.90it/s]
/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/enhancer_to_gene.py:158: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
correlation_result = np.array([correlator(x, exp) for x in acc.T])
Running using 20 cores: 4% | 715/18565 [00:04<03:07, 95.32it/s]
/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/enhancer_to_gene.py:158: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
correlation_result = np.array([correlator(x, exp) for x in acc.T])
/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/enhancer_to_gene.py:158: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
correlation_result = np.array([correlator(x, exp) for x in acc.T])
Running using 20 cores: 4% | 783/18565 [00:05<02:20, 126.41it/s]
/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/enhancer_to_gene.py:158: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
correlation_result = np.array([correlator(x, exp) for x in acc.T])
Running using 20 cores: 5% | 838/18565 [00:05<01:43, 170.62it/s]
/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/enhancer_to_gene.py:158: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
correlation_result = np.array([correlator(x, exp) for x in acc.T])
/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/enhancer_to_gene.py:158: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
correlation_result = np.array([correlator(x, exp) for x in acc.T])
Running using 20 cores: 5% | 878/18565 [00:06<03:39, 80.51it/s]
/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/enhancer_to_gene.py:158: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
correlation_result = np.array([correlator(x, exp) for x in acc.T])
Running using 20 cores: 5% | 900/18565 [00:06<02:59, 98.28it/s]
/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/enhancer_to_gene.py:158: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
correlation_result = np.array([correlator(x, exp) for x in acc.T])
Running using 20 cores: 100% | 18565/18565 [02:44<00:00, 113.04it/s]
2024-10-08 16:24:01,456 R2G INFO Done!
2024-10-08 16:24:01,568 SCENIC+ INFO Saving region to gene adjacencies to region_to_gene_adj.tsv
[Tue Oct 8 16:24:07 2024] Finished job 10. 4 of 14 steps (29%) done Select jobs to execute... Execute 1 jobs...
[Tue Oct 8 16:24:07 2024] localrule motif_enrichment_dem: input: /home/gilgolan/bioinfo_analysis/scenicplus/tutorial2/outs/region_sets, /home/gilgolan/bioinfo_analysis/scenicplus/tutorial2/cistarget_db/hg38_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.scores.feather, genome_annotation.tsv, /home/gilgolan/bioinfo_analysis/scenicplus/tutorial2/motifs-v10nr_clust-nr.hgnc-m0.001-o0.0.tbl output: dem_results.hdf5, dem_results.html jobid: 7 reason: Missing output files: dem_results.hdf5; Input files updated by another job: genome_annotation.tsv threads: 20 resources: tmpdir=/tmp
2024-10-08 16:24:11,789 SCENIC+ INFO Reading region sets from: /home/gilgolan/bioinfo_analysis/scenicplus/tutorial2/outs/region_sets
2024-10-08 16:24:11,789 SCENIC+ INFO Reading all .bed files in: Topics_otsu
2024-10-08 16:24:12,109 SCENIC+ INFO Reading all .bed files in: Topics_top_3k
2024-10-08 16:24:12,194 SCENIC+ INFO Reading all .bed files in: DARs_cell_type
joblib.externals.loky.process_executor._RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/gilgolan/.local/lib/python3.11/site-packages/joblib/externals/loky/process_executor.py", line 463, in _process_worker
r = call_item()
^^^^^^^^^^^
File "/home/gilgolan/.local/lib/python3.11/site-packages/joblib/externals/loky/process_executor.py", line 291, in __call__
return self.fn(*self.args, **self.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/gilgolan/.local/lib/python3.11/site-packages/joblib/parallel.py", line 589, in __call__
return [func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^
File "/home/gilgolan/.local/lib/python3.11/site-packages/joblib/parallel.py", line 589, in
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/gilgolan/.local/bin/scenicplus", line 8, in
scenicplus grn_inference motif_enrichment_dem --region_set_folder /home/gilgolan/bioinfo_analysis/scenicplus/tutorial2/outs/region_sets --dem_db_fname /home/gilgolan/bioinfo_analysis/scenicplus/tutorial2/cistarget_db/hg38_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.scores.feather --output_fname_dem_result dem_results.hdf5 --temp_dir /tmp --species homo_sapiens --fraction_overlap_w_dem_database 0.4 --max_bg_regions 500 --genome_annotation genome_annotation.tsv --balance_number_of_promoters --promoter_space 1000 --adjpval_thr 0.05 --log2fc_thr 1.0 --mean_fg_thr 0.0 --motif_hit_thr 3.0 --path_to_motif_annotations /home/gilgolan/bioinfo_analysis/scenicplus/tutorial2/motifs-v10nr_clust-nr.hgnc-m0.001-o0.0.tbl --annotation_version v10nr_clust --motif_similarity_fdr 0.001 --orthologous_identity_threshold 0.0 --annotations_to_use Direct_annot Orthology_annot --write_html --output_fname_dem_html dem_results.html --seed 666 --n_cpu 20
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Shutting down, this might take some time. Exiting because a job execution failed. Look above for error message Complete log: .snakemake/log/2024-10-08T155849.153427.snakemake.log WorkflowError: At least one job did not complete successfully."
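The check mentioned at the top of this comment (that no DAR region set bed file is empty) can be scripted. This is a sketch only: the function name is hypothetical, and it treats a file as empty when it has no lines other than blanks, comments, or track/browser headers. The temporary folder below just imitates the region_sets layout for demonstration.

```python
import tempfile
from pathlib import Path


def find_empty_bed_files(region_set_folder):
    """Return .bed files under the folder that contain no region lines."""
    empty = []
    for bed in sorted(Path(region_set_folder).rglob("*.bed")):
        with open(bed) as handle:
            has_region = any(
                line.strip() and not line.startswith(("#", "track", "browser"))
                for line in handle
            )
        if not has_region:
            empty.append(bed)
    return empty


# Demonstration on a throwaway folder with one filled and one empty file.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "DARs_cell_type"
    folder.mkdir()
    (folder / "AST.bed").write_text("chr1\t100\t600\n")
    (folder / "empty.bed").write_text("")
    empty_names = [path.name for path in find_empty_bed_files(tmp)]

print(empty_names)
```

Pointing find_empty_bed_files at the real region_sets folder should return an empty list if every set has at least one region.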
Hi @gilgolan73
It seems you ran into the same problem as issue #432; rolling your Python version back from 3.11.9 to 3.11.8 should resolve the error.
Hello @kennethho04, thank you for the quick reply. How do you suggest reverting the Python version? Do I need to re-install the packages?
Gil
@gilgolan73 Running conda install python==3.11.8 in your conda env should suffice. You don't need to re-install the packages.
Hello @kennethho04, I tried it (reverting to Python 3.11.8 as suggested), but I am still getting the same error (attached). I'm also attaching the list of installed packages.
Thank you, Gil error 141024_python_3.11.8.txt list packages 141024.txt
It looks like an issue with parallel processing. Have you tried installing/downgrading dask?
Hi @yojetsharma @kennethho04 , I am using Dask version 2024.5.0 with python 3.11.8. Which version do you recommend I use? (I have a CentOS 9 Linux machine; it is a virtual machine on OracleVM.)
Thank you, Gil
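When comparing environments across machines, it can help to print the relevant versions in one go. A small sketch using only the standard library; the package names listed are assumptions about what is worth checking here, and a package that is not installed is reported as such instead of raising.

```python
import sys
from importlib import metadata


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Interpreter version as major.minor.micro, e.g. the 3.11.x discussed here.
python_version = ".".join(str(part) for part in sys.version_info[:3])

report = {"python": python_version}
for name in ("dask", "joblib", "scenicplus"):
    report[name] = installed_version(name)

for name, version in report.items():
    print(f"{name}: {version if version is not None else 'not installed'}")
```

Run it with the interpreter of the conda env that Snakemake uses, since a user-level install under ~/.local may otherwise shadow the env's packages.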
I also faced an issue with parallel processing, but as someone suggested in one of the other issues, downgrading Dask to 2024.5.0 helped. It looks like you are already using that version, though. The Python version I am using is 3.11.10.
@yojetsharma @kennethho04 I tried Python 3.11.8 with both Dask 2024.2.1 and 2024.5.0 and am still receiving the same error. Do you think I need to install Python 3.11.10?
Thanks
Does reducing the number of cores help? Also, do the region_sets and genome_annotation.tsv look fine?
@yojetsharma Hi, I tried reducing the number of cores to 10 and to 1. It still doesn't help. The files look OK; I'm attaching them. genome_annotation.txt PURK.txt OPC.txt NFOL.txt MOL.txt MGL.txt MG.txt INH_VIP.txt INH_SST.txt INH_SNCG.txt INH_PVALB.txt GP.txt GC.txt ENDO.txt COP.txt BG.txt AST.txt
Gil
My last resort would be to try reinstalling the conda env and see if that fixes the issue.
Hi @yojetsharma @kennethho04 I tried to reinstall the conda env (both with python 3.11.10 and python 3.11.8), it still did not resolve the issue. Do you have any other suggestions?
Thank you, Gil
@gilgolan73 Did you end up finding a way to resolve your issue? I think I am experiencing a similar issue, where the cistarget motif enrichment works fine, but DEM does not identify any motifs.
Hi @brianysoong, unfortunately I am still stuck with this issue. Tried many different Dask and python versions but the same issue persists. Do you have any suggestions? @yojetsharma @kennethho04
Gil
@gilgolan73 In my case, it ended up being a dumb mistake where I used human genome / annotations for mouse data!
Describe the bug Hello, I'm trying to run SCENIC+ using Snakemake on a Linux machine (CentOS 9), on the tutorial dataset. I ran the scATAC-seq preprocessing in Python (using pycisTopic, following the tutorial: https://pycistopic.readthedocs.io/en/latest/notebooks/human_cerebellum.html). Then I ran the scRNA-seq preprocessing in Python (following the tutorial: https://scenicplus.readthedocs.io/en/latest/human_cerebellum_scRNA_pp.html#Preprocessing-the-scRNA-seq-data). I'm using the default config.yml file for Snakemake and only changed the location of the input data. 1-2 minutes after starting Snakemake (as in the tutorial: https://scenicplus.readthedocs.io/en/latest/human_cerebellum.html#Running-SCENIC+), I receive an error (see below), which I believe is related to the different cell names in the scATAC-seq and scRNA-seq datasets. Please let me know how to solve this issue. Thank you!
To Reproduce "snakemake --cores 20"
Error output "Assuming unrestricted shared filesystem usage for local execution. Building DAG of jobs... Using shell: /usr/bin/bash Provided cores: 20 Rules claiming more threads will be scaled down. Job stats: job count
AUCell_direct 1 AUCell_extended 1 all 1 download_genome_annotations 1 eGRN_direct 1 eGRN_extended 1 get_search_space 1 motif_enrichment_cistarget 1 motif_enrichment_dem 1 prepare_GEX_ACC_multiome 1 prepare_menr 1 region_to_gene 1 scplus_mudata 1 tf_to_gene 1 total 14
Select jobs to execute... Execute 2 jobs...
[Tue Sep 17 16:12:50 2024] localrule prepare_GEX_ACC_multiome: input: /home/gilgolan/bioinfo_analysis/scenicplus/tutorial2/outs/cistopic_obj.pkl, /home/gilgolan/bioinfo_analysis/scenicplus/tutorial2/scRNAseq/adata.h5ad output: ACC_GEX.h5mu jobid: 2 reason: Missing output files: ACC_GEX.h5mu resources: tmpdir=/tmp
[Tue Sep 17 16:12:50 2024] localrule download_genome_annotations: output: genome_annotation.tsv, chromsizes.tsv jobid: 8 reason: Missing output files: chromsizes.tsv, genome_annotation.tsv resources: tmpdir=/tmp
2024-09-17 16:12:52,879 SCENIC+ INFO Reading cisTopic object. 2024-09-17 16:12:53,289 SCENIC+ INFO Reading gene expression AnnData. Traceback (most recent call last): File "/home/gilgolan/.local/bin/scenicplus", line 8, in
sys.exit(main())
^^^^^^
File "/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/cli/scenicplus.py", line 1137, in main
args.func(args)
File "/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/cli/scenicplus.py", line 46, in command_prepare_GEX_ACC
prepare_GEX_ACC(
File "/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/cli/commands.py", line 96, in prepare_GEX_ACC
mdata = process_multiome_data(
^^^^^^^^^^^^^^^^^^^^^^
File "/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/data_wrangling/adata_cistopic_wrangling.py", line 73, in process_multiome_data
raise Exception(
Exception: No cells found which are present in both assays, check input and consider using bc_transform_func!
[Tue Sep 17 16:12:53 2024] Error in rule prepare_GEX_ACC_multiome: jobid: 2 input: /home/gilgolan/bioinfo_analysis/scenicplus/tutorial2/outs/cistopic_obj.pkl, /home/gilgolan/bioinfo_analysis/scenicplus/tutorial2/scRNAseq/adata.h5ad output: ACC_GEX.h5mu shell:
2024-09-17 16:13:38,876 Download gene annotation INFO Using genome: GRCh38.p14
2024-09-17 16:13:38,878 Download gene annotation INFO Found corresponding genome Id 51 on NCBI
2024-09-17 16:13:39,381 Download gene annotation INFO Found corresponding assembly Id 11968211 on NCBI
2024-09-17 16:13:39,884 Download gene annotation INFO Downloading assembly information from: ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.40_GRCh38.p14/GCF_000001405.40_GRCh38.p14_assembly_report.txt
Unhandeled exception occured <urlopen error [Errno 110] Connection timed out> Returning gene annotation without subestting for assembled chromosomesand converting to UCSC style. Please make sure that the chromosome namesin the returned object match with the chromosome names in the scplus_obj.Chromosome sizes will not be returned 2024-09-17 16:15:50,496 SCENIC+ INFO Chrosomome sizes was not found, please provide this information manually. 2024-09-17 16:15:50,497 SCENIC+ INFO Saving genome annotation to: genome_annotation.tsv Waiting at most 5 seconds for missing files. MissingOutputException in rule download_genome_annotations in file /home/gilgolan/bioinfo_analysis/scenicplus/tutorial2/scplus_pipeline/Snakemake/workflow/Snakefile, line 221: Job 8 completed successfully, but some output files are missing. Missing files after 5 seconds. This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait: chromsizes.tsv Removing output files of failed job download_genome_annotations since they might be corrupted: genome_annotation.tsv Shutting down, this might take some time. Exiting because a job execution failed. Look above for error message Complete log: .snakemake/log/2024-09-17T161250.453181.snakemake.log WorkflowError: At least one job did not complete successfully."
Expected behavior I expect SnakeMake to run successfully on the tutorial pre-processed dataset.
Version (please complete the following information): Python: 3.11.9 SCENIC+: 1.0a1 pyscenic 0.12.1+8.gd2309fe
Additional context When I look at the cell names in the adata object (scRNAseq) and the cistopic object (scATACseq), they are different, and the number of cells also differs:
"adata.obs
Out[216]:
                    VSN_cell_type  ...  pct_counts_mt
CCCTCATAGACACTTA-1             GC  ...       0.083148
GCCATTACACCTGCCT-1           ASTP  ...       0.132626
ATTGCAGGTTGTGACA-1            MGL  ...       1.447368
CTGTTGGAGGCATTAC-1          MOL_B  ...       0.095579
CGAATCTAGCTTAGCG-1          MOL_B  ...       0.061851
...                           ...  ...            ...
TACCTTAGTTACTAGG-1          MOL_B  ...       0.058480
GTAGCTGTCATTACAG-1        AST_CER  ...       0.188088
AGGCAGGTCGCGACAC-1          MOL_A  ...       0.044703
GCATTGCCAAGACTCC-1          MOL_B  ...       0.145530
ACATTAGTCCGCAAGC-1        AST_CER  ...       0.068552

[2313 rows x 13 columns]

cistopic_obj.cell_data
Out[218]:
                                       cisTopic_nr_frag  ...  pycisTopic_cca_Seurat_cell_type
CACCTCAGTTGTAAAC-1-10x_multiome_brain             18300  ...                              AST
TGACTCCTCATCCACC-1-10x_multiome_brain            100055  ...                               BG
TTTCTCACATAAACCT-1-10x_multiome_brain             32192  ...                               GP
GTCCTCCCACACAATT-1-10x_multiome_brain             88443  ...                               BG
CTCCGTCCAGTTTGTG-1-10x_multiome_brain            131110  ...                             ENDO
...                                                 ...  ...                              ...
GCAGGTTGTCCAAATG-1-10x_multiome_brain              2770  ...                              MOL
AAGCTCCCAGCACCAT-1-10x_multiome_brain              2180  ...                              MOL
CAGAATCTCCTCATGC-1-10x_multiome_brain              1744  ...                              MOL
TAGCCGGGTAACAGGG-1-10x_multiome_brain              2674  ...                         INH_SNCG
GTGCGCAGTGCTTAGA-1-10x_multiome_brain              5859  ...                               GP

[2845 rows x 42 columns]"
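A quick way to confirm that a naming mismatch like the one in these listings is the cause is to count shared cell names before and after adding the suffix. The sketch below uses hypothetical barcodes in the two layouts rather than the real objects; with real data the lists would come from adata.obs_names and the index of cistopic_obj.cell_data. The helper name count_shared is made up for illustration.

```python
def count_shared(rna_barcodes, atac_barcodes, transform=None):
    """Count cell names present in both assays, optionally renaming RNA barcodes first."""
    if transform is not None:
        rna_barcodes = [transform(barcode) for barcode in rna_barcodes]
    return len(set(rna_barcodes) & set(atac_barcodes))


# Hypothetical barcodes following the two naming layouts in the listings.
rna = ["AAACCCAAGT-1", "TTTGGGCCAT-1", "GGGAAATTTC-1"]
atac = ["AAACCCAAGT-1-10x_multiome_brain", "TTTGGGCCAT-1-10x_multiome_brain"]

before = count_shared(rna, atac)
after = count_shared(rna, atac, lambda barcode: f"{barcode}-10x_multiome_brain")
print(before, after)
```

A count of zero before the transform and a non-zero count after it indicates that the suffix is the only difference, while differing totals (2313 versus 2845 cells here) simply mean some cells passed filtering in only one assay.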