aertslab / scenicplus

SCENIC+ is a python package to build gene regulatory networks (GRNs) using combined or separate single-cell gene expression (scRNA-seq) and single-cell chromatin accessibility (scATAC-seq) data.

ValueError: Length mismatch: Expected axis has 0 elements, new values have 3 elements #490

Closed JinKyu-Cheong closed 4 weeks ago

JinKyu-Cheong commented 4 weeks ago

When running the SCENIC+ pipeline, I get a ValueError (a column-length mismatch) in the motif_enrichment_cistarget rule, the second step of the Snakemake pipeline. The traceback ends in pycistarget's region_names_to_coordinates function, which tries to assign three column names (Chromosome, Start, End) to a table that has zero columns. My guess is that the input regions, possibly the BED files, are read in an unexpected format that breaks downstream processing.
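
The failure mode can be reproduced outside the pipeline. The sketch below is a hypothetical stand-in, not pycistarget's actual code: it assumes that region names have the form chr:start-end and that names without a colon are skipped. Under those assumptions, a list with no region-style names (for example, gene symbols) yields a DataFrame with zero columns, and assigning three column names then raises a length-mismatch ValueError.

```python
import pandas as pd


def names_to_coordinates(names):
    """Hypothetical stand-in for a region-name parser (not pycistarget's code).

    Assumes names look like 'chr1:100-200'; anything without ':' is skipped.
    """
    rows = []
    for name in names:
        if ":" not in name:
            continue  # assumption: non-region names are silently dropped
        chrom, span = name.split(":", 1)
        start, end = span.split("-", 1)
        rows.append((chrom, int(start), int(end)))
    df = pd.DataFrame(rows)  # zero rows and zero columns when rows is empty
    df.columns = ["Chromosome", "Start", "End"]  # raises if df has no columns
    return df


# Region-style names parse into a three-column table.
print(names_to_coordinates(["chr1:1032928-1033428"]))

# Gene-style names leave nothing to parse, so the column assignment fails.
try:
    names_to_coordinates(["GAPDH", "ACTB"])
except ValueError as err:
    print(err)
```

If this is what happens here, the names reaching the parser are probably not region coordinates. It may be relevant that the database path in the log below contains genes_vs_motifs.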

This package is really impressive, and I’m excited to use it for my paper. However, I’ve been struggling for months to get it running, and even our core bioinformatician was unable to resolve all the issues. I would greatly appreciate any help you can provide. Thank you!

versions

PyRanges version: 0.0.111
SCENIC+ version: 1.0a1
Python version: 3.11.8

error message

Assuming unrestricted shared filesystem usage for local execution.
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 2
Rules claiming more threads will be scaled down.
Job stats:
job                            count
---------------------------  -------
AUCell_direct                      1
AUCell_extended                    1
all                                1
download_genome_annotations        1
eGRN_direct                        1
eGRN_extended                      1
get_search_space                   1
motif_enrichment_cistarget         1
motif_enrichment_dem               1
prepare_menr                       1
region_to_gene                     1
scplus_mudata                      1
tf_to_gene                         1
total                             13

Select jobs to execute...
Execute 1 jobs...

[Wed Oct 23 17:18:49 2024]
localrule motif_enrichment_cistarget:
    input: /lila/data/niecr/cheongj/ibd/scenicplus/output/region_sets, /lila/data/niecr/cheongj/ibd/scenicplus/mc_v10_clust/gene_based/hg38_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather, /data/niecr/cheongj/ibd/scenicplus/motifs-v10nr_clust-nr.hgnc-m0.001-o0.0.tbl
    output: /data/niecr/cheongj/ibd/scenicplus/outs/ctx_results.hdf5, /data/niecr/cheongj/ibd/scenicplus/outs/ctx_results.html
    jobid: 8
    reason: Missing output files: /data/niecr/cheongj/ibd/scenicplus/outs/ctx_results.hdf5
    threads: 2
    resources: tmpdir=/scratch/lsftmp/10095877.tmpdir

2024-10-23 17:19:29,521 SCENIC+      INFO     Reading region sets from: /lila/data/niecr/cheongj/ibd/scenicplus/output/region_sets
2024-10-23 17:19:29,526 SCENIC+      INFO     Reading all .bed files in: test
2024-10-23 17:19:57,880 cisTarget    INFO     Reading cisTarget database
joblib.externals.loky.process_executor._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/data/niecr/cheongj/miniconda3/envs/scenicplus/lib/python3.11/site-packages/joblib/externals/loky/process_executor.py", line 463, in _process_worker
    r = call_item()
        ^^^^^^^^^^^
  File "/data/niecr/cheongj/miniconda3/envs/scenicplus/lib/python3.11/site-packages/joblib/externals/loky/process_executor.py", line 291, in __call__
    return self.fn(*self.args, **self.kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/niecr/cheongj/miniconda3/envs/scenicplus/lib/python3.11/site-packages/joblib/parallel.py", line 589, in __call__
    return [func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/data/niecr/cheongj/miniconda3/envs/scenicplus/lib/python3.11/site-packages/joblib/parallel.py", line 589, in <listcomp>
    return [func(*args, **kwargs)
            ^^^^^^^^^^^^^^^^^^^^^
  File "/data/niecr/cheongj/miniconda3/envs/scenicplus/lib/python3.11/site-packages/scenicplus/cli/commands.py", line 131, in _run_cistarget_single_region_set
    ctx_db = cisTargetDatabase(
             ^^^^^^^^^^^^^^^^^^
  File "/data/niecr/cheongj/miniconda3/envs/scenicplus/lib/python3.11/site-packages/pycistarget/motif_enrichment_cistarget.py", line 55, in __init__
    self.regions_to_db, self.db_rankings, self.total_regions = self.load_db(fname,
                                                               ^^^^^^^^^^^^^^^^^^^
  File "/data/niecr/cheongj/miniconda3/envs/scenicplus/lib/python3.11/site-packages/pycistarget/motif_enrichment_cistarget.py", line 111, in load_db
    target_to_db = target_to_query(region_sets, list(db_regions), fraction_overlap = fraction_overlap)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/niecr/cheongj/miniconda3/envs/scenicplus/lib/python3.11/site-packages/pycistarget/utils.py", line 280, in target_to_query
    query_pr=pr.PyRanges(region_names_to_coordinates(query))
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/niecr/cheongj/miniconda3/envs/scenicplus/lib/python3.11/site-packages/pycistarget/utils.py", line 35, in region_names_to_coordinates
    regiondf.columns=['Chromosome', 'Start', 'End']
    ^^^^^^^^^^^^^^^^
  File "/data/niecr/cheongj/miniconda3/envs/scenicplus/lib/python3.11/site-packages/pandas/core/generic.py", line 5920, in __setattr__
    return object.__setattr__(self, name, value)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pandas/_libs/properties.pyx", line 69, in pandas._libs.properties.AxisProperty.__set__
  File "/data/niecr/cheongj/miniconda3/envs/scenicplus/lib/python3.11/site-packages/pandas/core/generic.py", line 822, in _set_axis
    self._mgr.set_axis(axis, labels)
  File "/data/niecr/cheongj/miniconda3/envs/scenicplus/lib/python3.11/site-packages/pandas/core/internals/managers.py", line 228, in set_axis
    self._validate_set_axis(axis, new_labels)
  File "/data/niecr/cheongj/miniconda3/envs/scenicplus/lib/python3.11/site-packages/pandas/core/internals/base.py", line 70, in _validate_set_axis
    raise ValueError(
ValueError: Length mismatch: Expected axis has 0 elements, new values have 3 elements
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/data/niecr/cheongj/miniconda3/envs/scenicplus/bin/scenicplus", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/data/niecr/cheongj/miniconda3/envs/scenicplus/lib/python3.11/site-packages/scenicplus/cli/scenicplus.py", line 1137, in main
    args.func(args)
  File "/data/niecr/cheongj/miniconda3/envs/scenicplus/lib/python3.11/site-packages/scenicplus/cli/scenicplus.py", line 388, in motif_enrichment_cistarget
    run_motif_enrichment_cistarget(
  File "/data/niecr/cheongj/miniconda3/envs/scenicplus/lib/python3.11/site-packages/scenicplus/cli/commands.py", line 242, in run_motif_enrichment_cistarget
    cistarget_results: List[cisTarget] = joblib.Parallel(
                                         ^^^^^^^^^^^^^^^^
  File "/data/niecr/cheongj/miniconda3/envs/scenicplus/lib/python3.11/site-packages/joblib/parallel.py", line 1952, in __call__
    return output if self.return_generator else list(output)
                                                ^^^^^^^^^^^^
  File "/data/niecr/cheongj/miniconda3/envs/scenicplus/lib/python3.11/site-packages/joblib/parallel.py", line 1595, in _get_outputs
    yield from self._retrieve()
  File "/data/niecr/cheongj/miniconda3/envs/scenicplus/lib/python3.11/site-packages/joblib/parallel.py", line 1699, in _retrieve
    self._raise_error_fast()
  File "/data/niecr/cheongj/miniconda3/envs/scenicplus/lib/python3.11/site-packages/joblib/parallel.py", line 1734, in _raise_error_fast
    error_job.get_result(self.timeout)
  File "/data/niecr/cheongj/miniconda3/envs/scenicplus/lib/python3.11/site-packages/joblib/parallel.py", line 736, in get_result
    return self._return_or_raise()
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/niecr/cheongj/miniconda3/envs/scenicplus/lib/python3.11/site-packages/joblib/parallel.py", line 754, in _return_or_raise
    raise self._result
ValueError: Length mismatch: Expected axis has 0 elements, new values have 3 elements
[Wed Oct 23 17:20:00 2024]
Error in rule motif_enrichment_cistarget:
    jobid: 8
    input: /lila/data/niecr/cheongj/ibd/scenicplus/output/region_sets, /lila/data/niecr/cheongj/ibd/scenicplus/mc_v10_clust/gene_based/hg38_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather, /data/niecr/cheongj/ibd/scenicplus/motifs-v10nr_clust-nr.hgnc-m0.001-o0.0.tbl
    output: /data/niecr/cheongj/ibd/scenicplus/outs/ctx_results.hdf5, /data/niecr/cheongj/ibd/scenicplus/outs/ctx_results.html
    shell:

            scenicplus grn_inference motif_enrichment_cistarget                 --region_set_folder /lila/data/niecr/cheongj/ibd/scenicplus/output/region_sets                 --cistarget_db_fname /lila/data/niecr/cheongj/ibd/scenicplus/mc_v10_clust/gene_based/hg38_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather                 --output_fname_cistarget_result /data/niecr/cheongj/ibd/scenicplus/outs/ctx_results.hdf5                 --temp_dir /scratch/cheongj/tmp                 --species homo_sapiens                 --fr_overlap_w_ctx_db 0.4                 --auc_threshold 0.005                 --nes_threshold 3.0                 --rank_threshold 0.05                 --path_to_motif_annotations /data/niecr/cheongj/ibd/scenicplus/motifs-v10nr_clust-nr.hgnc-m0.001-o0.0.tbl                 --annotation_version v10nr_clust                 --motif_similarity_fdr 0.001                 --orthologous_identity_threshold 0.0                 --annotations_to_use Direct_annot Orthology_annot                 --write_html                 --output_fname_cistarget_html /data/niecr/cheongj/ibd/scenicplus/outs/ctx_results.html                 --n_cpu 2

        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2024-10-23T171849.394334.snakemake.log
WorkflowError:
At least one job did not complete successfully.

The BED files in the region set directory don't seem to be the problem.

!head -n 5 /lila/data/niecr/cheongj/ibd/scenicplus/output/region_sets/*/*.bed

==> /lila/data/niecr/cheongj/ibd/scenicplus/output/region_sets/DARs_cell_type/B.bed <==
chr1    1032928 1033428
chr1    1038661 1039161
chr1    1068541 1069041
chr1    1136466 1136966
chr1    1137208 1137708

==> /lila/data/niecr/cheongj/ibd/scenicplus/output/region_sets/DARs_cell_type/CD14_M.bed <==
chr1    817092  817592
chr1    906699  907199
chr1    1038661 1039161
chr1    1686180 1686680
chr1    1764174 1764674

==> /lila/data/niecr/cheongj/ibd/scenicplus/output/region_sets/DARs_cell_type/CD16_M.bed <==
chr1    817092  817592
chr1    897220  897720
chr1    906699  907199
chr1    1686180 1686680
chr1    1764174 1764674
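
Looking at the first five lines only rules out obvious problems. A fuller check, sketched here with a hypothetical helper name, walks every line and reports those that do not have a chromosome followed by integer start and end columns with start < end. It assumes whitespace-separated columns and would also flag a header line.

```python
def find_bad_bed_lines(lines):
    """Return (line_number, text) for lines that are not valid 3+ column BED.

    Hypothetical helper for illustration; assumes whitespace-separated columns.
    """
    bad = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if not text.strip():
            continue  # ignore blank lines
        fields = text.split()
        ok = (
            len(fields) >= 3
            and fields[1].isdigit()
            and fields[2].isdigit()
            and int(fields[1]) < int(fields[2])
        )
        if not ok:
            bad.append((number, text))
    return bad


sample = [
    "chr1\t1032928\t1033428\n",
    "Chromosome\tStart\tEnd\n",  # a header line is reported as invalid
    "chr1\t1038661\t1039161\n",
]
print(find_bad_bed_lines(sample))
```

An open file handle can be passed in place of the sample list, since the function only iterates over lines.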

What I have tried so far:

  1. Downgrading and upgrading Python (3.8 and 3.11).
  2. Rebuilding the scenicplus environment with Python 3.11.
  3. Adding headers to the BED files.
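
One check not covered by the steps above is whether the names in the ranking database look like genomic regions at all. This is a minimal sketch, assuming region names follow the chr:start-end pattern. The helper name is hypothetical, and reading the names out of the .feather file is left out.

```python
import re

# Assumed region-name format: <chromosome>:<start>-<end>
REGION_PATTERN = re.compile(r"^[^:\s]+:\d+-\d+$")


def fraction_region_like(names):
    """Return the share of names that match the assumed chr:start-end format."""
    names = list(names)
    if not names:
        return 0.0
    hits = sum(1 for name in names if REGION_PATTERN.match(name))
    return hits / len(names)


print(fraction_region_like(["chr1:817092-817592", "chr1:906699-907199"]))
print(fraction_region_like(["GAPDH", "ACTB"]))
```

A fraction near zero would suggest that a gene-based database is being used where a region-based one is expected, though that is only one possible explanation.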