Error in rule get_search_space: jobid: 11 second

vcleon88 commented 4 weeks ago

I created my own chromsizes.tsv. It looks like this:

  Chromosome  Start        End
0       chr1      0  248956422
1       chr2      0  242193529
2       chr3      0  198295559
3       chr4      0  190214555
4       chr5      0  181538259
Index(['Chromosome', 'Start', 'End'], dtype='object')

The genome_annotation.tsv, filtered by chromosome, looks like this:

  Chromosome  Start   End Strand     Gene  Transcription_Start_Site Transcript_type
0       chrM   3307  4262      +   MT-ND1                      3307  protein_coding
1       chrM   4470  5511      +   MT-ND2                      4470  protein_coding
2       chrM   5904  7445      +   MT-CO1                      5904  protein_coding
3       chrM   7586  8269      +   MT-CO2                      7586  protein_coding
4       chrM   8366  8572      +  MT-ATP8                      8366  protein_coding
Index(['Chromosome', 'Start', 'End', 'Strand', 'Gene', 'Transcription_Start_Site', 'Transcript_type'], dtype='object')

However, when I run Snakemake, the error occurs again:

:~/scplus_pipeline/Snakemake$
Assuming unrestricted shared filesystem usage for local execution.
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 40
Rules claiming more threads will be scaled down.
Job stats:
job                     count
AUCell_direct               1
AUCell_extended             1
all                         1
eGRN_direct                 1
eGRN_extended               1
get_search_space            1
motif_enrichment_dem        1
prepare_menr                1
region_to_gene              1
scplus_mudata               1
tf_to_gene                  1
total                      11

Select jobs to execute...
Execute 1 jobs...

[Thu Oct 24 15:57:03 2024]
localrule get_search_space:
    input: /home/gu/scecis/plusout/ACC_GEX.h5mu, /home/gu/scecis/plusout/genome_annotation.tsv, /home/gu/scecis/plusout/chromsizes.tsv
    output: /home/gu/scecis/plusout/search_space.tsv
    jobid: 11
    reason: Missing output files: /home/gu/scecis/plusout/search_space.tsv
    resources: tmpdir=/tmp

2024-10-24 15:57:08,201 SCENIC+ INFO Reading data
(scenicplus) gu@s166:~/scplus_pipeline/Snakemake$ /home/gu/miniconda3/envs/scenicplus/lib/python3.11/site-packages/anndata/_core/anndata.py:522: FutureWarning: The dtype argument is deprecated and will be removed in late 2024.
  warnings.warn(
/home/gu/miniconda3/envs/scenicplus/lib/python3.11/site-packages/anndata/_core/anndata.py:522: FutureWarning: The dtype argument is deprecated and will be removed in late 2024.
  warnings.warn(
Traceback (most recent call last):
  File "/home/gu/miniconda3/envs/scenicplus/bin/scenicplus", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/gu/miniconda3/envs/scenicplus/lib/python3.11/site-packages/scenicplus/cli/scenicplus.py", line 1137, in main
    args.func(args)
  File "/home/gu/miniconda3/envs/scenicplus/lib/python3.11/site-packages/scenicplus/cli/scenicplus.py", line 208, in search_space
    get_search_space_command(
  File "/home/gu/miniconda3/envs/scenicplus/lib/python3.11/site-packages/scenicplus/cli/commands.py", line 661, in get_search_space_command
    search_space = get_search_space(
                   ^^^^^^^^^^^^^^^^^
  File "/home/gu/miniconda3/envs/scenicplus/lib/python3.11/site-packages/scenicplus/data_wrangling/gene_search_space.py", line 294, in get_search_space
    pr_regions = pr.PyRanges(region_names_to_coordinates(scplus_region))
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/gu/miniconda3/envs/scenicplus/lib/python3.11/site-packages/scenicplus/utils.py", line 223, in region_names_to_coordinates
    regiondf.columns = ['Chromosome', 'Start', 'End']
    ^^^^^^^^^^^^^^^^
  File "/home/gu/miniconda3/envs/scenicplus/lib/python3.11/site-packages/pandas/core/generic.py", line 5920, in __setattr__
    return object.__setattr__(self, name, value)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pandas/_libs/properties.pyx", line 69, in pandas._libs.properties.AxisProperty.__set__
  File "/home/gu/miniconda3/envs/scenicplus/lib/python3.11/site-packages/pandas/core/generic.py", line 822, in _set_axis
    self._mgr.set_axis(axis, labels)
  File "/home/gu/miniconda3/envs/scenicplus/lib/python3.11/site-packages/pandas/core/internals/managers.py", line 228, in set_axis
    self._validate_set_axis(axis, new_labels)
  File "/home/gu/miniconda3/envs/scenicplus/lib/python3.11/site-packages/pandas/core/internals/base.py", line 70, in _validate_set_axis
    raise ValueError(
ValueError: Length mismatch: Expected axis has 0 elements, new values have 3 elements
[Thu Oct 24 15:57:18 2024]
Error in rule get_search_space:
    jobid: 11
    input: /home/gu/scecis/plusout/ACC_GEX.h5mu, /home/gu/scecis/plusout/genome_annotation.tsv, /home/gu/scecis/plusout/chromsizes.tsv
    output: /home/gu/scecis/plusout/search_space.tsv
    shell:

    scenicplus prepare_data search_spance             --multiome_mudata_fname /home/gu/scecis/plusout/ACC_GEX.h5mu             --gene_annotation_fname /home/gu/scecis/plusout/genome_annotation.tsv             --chromsizes_fname /home/gu/scecis/plusout/chromsizes.tsv             --out_fname /home/gu/scecis/plusout/search_space.tsv             --upstream 1000 150000             --downstream 1000 150000             --extend_tss 10 10

    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2024-10-24T155703.261548.snakemake.log
WorkflowError: At least one job did not complete successfully.

Does anyone have an idea what causes this issue?

Thanks in advance.

kennethho04 commented 3 weeks ago

Hi @vcleon88

I got the same issue and was able to resolve it. Your problem is similar to what was mentioned in issue #426.

The problem is likely the format of the var_names in your MuData. You can check the format by running the following:

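import mudata

# Replace the placeholder with the path to your own .h5mu file.
mdata = mudata.read("<PATH_TO_ACC_GEX.h5mu>")
# Print the region names of the scATAC modality.
print(mdata["scATAC"].var_names)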

The format should be "chr:start-end". In my case it was formatted as "chr-start-end" so I reformatted mdata["scATAC"].var_names to "chr:start-end" and saved it as a new mudata to replace the old one in my out folder. Hope that helps.
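In case it is useful, here is a minimal sketch of that renaming step. The helper names `to_colon_format` and `fix_region_names` are made up for illustration and are not part of scenicplus or mudata. It assumes every region name is either already "chr:start-end" or uses "chr-start-end" with numeric start and end. The commented MuData lines at the bottom are an untested outline with placeholder paths, so adjust them to your own data.

```python
def to_colon_format(name: str) -> str:
    """Convert 'chr-start-end' to 'chr:start-end'; keep colon-style names unchanged."""
    if ":" in name:
        return name
    # Split from the right so contig names that themselves contain '-' stay intact.
    parts = name.rsplit("-", 2)
    if len(parts) != 3 or not (parts[1].isdigit() and parts[2].isdigit()):
        raise ValueError(f"Unrecognised region name: {name!r}")
    chrom, start, end = parts
    return f"{chrom}:{start}-{end}"


def fix_region_names(names):
    """Apply to_colon_format to every region name and return a list."""
    return [to_colon_format(n) for n in names]


if __name__ == "__main__":
    print(fix_region_names(["chr1-100-200", "chr2:5-10"]))

# Hypothetical usage with MuData (placeholder paths, not tested here):
# import mudata
# mdata = mudata.read("<PATH_TO_ACC_GEX.h5mu>")
# mdata["scATAC"].var_names = fix_region_names(mdata["scATAC"].var_names)
# mdata.update()  # sync the renamed features into the MuData container
# mdata.write("<PATH_TO_FIXED_ACC_GEX.h5mu>")
```

Raising on names that match neither pattern is deliberate: it surfaces unexpected region names immediately instead of passing them on to the pipeline.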

vcleon88 commented 2 weeks ago

Hi @kennethho04

Thank you so much!!!! I solved this problem!!!

aertslab / scenicplus

Error in rule get_search_space: jobid: 11 second #492