aertslab / scenicplus

SCENIC+ is a python package to build gene regulatory networks (GRNs) using combined or separate single-cell gene expression (scRNA-seq) and single-cell chromatin accessibility (scATAC-seq) data.

Length mismatch when running get_search_space #426

Closed ytrink closed 3 months ago

ytrink commented 4 months ago

Hi, I am running the development version. I edited the config and workflow files to skip downloading the genome annotations. My chromosome sizes file is in the correct format (Chromosome, Start, End):

Chromosome | Start | End
-- | -- | --
chr1 | 0 | 2.49E+08
chr2 | 0 | 2.42E+08
chr3 | 0 | 1.98E+08
chr4 | 0 | 1.9E+08
chr5 | 0 | 1.82E+08
chr6 | 0 | 1.71E+08
chr7 | 0 | 1.59E+08
chr8 | 0 | 1.45E+08
chr9 | 0 | 1.38E+08
chr10 | 0 | 1.34E+08
chr11 | 0 | 1.35E+08
chr12 | 0 | 1.33E+08

and so on. However, I am getting the following error when running "get_search_space":

---------
File "xxxxxxxxxxxx/envs/scenicplus_dev/lib/python3.11/site-packages/scenicplus/cli/commands.py", line 447, in get_search_space_command
    search_space = get_search_space(
                   ^^^^^^^^^^^^^^^^^
  File "xxxxxxxxxxxx/mambaforge/envs/scenicplus_dev/lib/python3.11/site-packages/scenicplus/data_wrangling/gene_search_space.py", line 294, in get_search_space
    pr_regions = pr.PyRanges(region_names_to_coordinates(scplus_region))
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "xxxxxxxxxxxx/envs/scenicplus_dev/lib/python3.11/site-packages/scenicplus/utils.py", line 223, in region_names_to_coordinates
    regiondf.columns = ['Chromosome', 'Start', 'End']
    ^^^^^^^^^^^^^^^^
  File "xxxxxxxxxxxx/envs/scenicplus_dev/lib/python3.11/site-packages/pandas/core/generic.py", line 5920, in __setattr__
    return object.__setattr__(self, name, value)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pandas/_libs/properties.pyx", line 69, in pandas._libs.properties.AxisProperty.__set__
  File "xxxxxxxxxxxx/envs/scenicplus_dev/lib/python3.11/site-packages/pandas/core/generic.py", line 822, in _set_axis
    self._mgr.set_axis(axis, labels)
  File "xxxxxxxxxxxx/envs/scenicplus_dev/lib/python3.11/site-packages/pandas/core/internals/managers.py", line 228, in set_axis
    self._validate_set_axis(axis, new_labels)
  File "xxxxxxxxxxxx/envs/scenicplus_dev/lib/python3.11/site-packages/pandas/core/internals/base.py", line 70, in _validate_set_axis
    raise ValueError(
**ValueError: Length mismatch: Expected axis has 0 elements, new values have 3 elements**

Do you have any advice? Thanks Yaron

SeppeDeWinter commented 4 months ago

Hi @ytrink

Not sure how, but for some reason your region names are either empty or not well formatted. Can you check this by showing the output of:


import mudata

mdata = mudata.read("<PATH_TO_MUDATA.h5mu>")  # replace the placeholder with the path to your .h5mu file
mdata["scATAC"].var_names

Best,

Seppe
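A quick way to spot empty or badly formatted region names is to test each one against a pattern. The sketch below is only an illustration: it assumes the expected convention is `chr:start-end`, and the helper name `find_malformed_region_names` is hypothetical, not part of SCENIC+ or pycisTopic.

```python
import re

# Assumed pattern for "chrom:start-end" region names. The chromosome part
# accepts any characters except ':' and whitespace, so contigs such as
# KI270726.1 pass as well.
REGION_PATTERN = re.compile(r"^[^:\s]+:\d+-\d+$")


def find_malformed_region_names(region_names):
    """Return the names that do not look like 'chrom:start-end' (hypothetical helper)."""
    return [name for name in region_names if not REGION_PATTERN.match(name)]


# Example with one well-formed name, one dash-separated name and one empty name.
names = ["chr1:9734-10738", "chr1-28896-29762", ""]
print(find_malformed_region_names(names))
```

Passing the region names of the scATAC modality (for example `list(mdata["scATAC"].var_names)`) to this helper should return an empty list when every name is well formed.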

ytrink commented 4 months ago

Hi, thanks for the response. I don't have a mudata file, I guess I am using an older version (when I last worked on this several months ago I downloaded the development version).

Here are what my region sets look like for example (top3k):

      chrom       chromStart  chromEnd
0     KI270726.1  27118       28023
1     chr1        1063814     1064710
2     chr1        1137011     1137937
3     chr1        1143808     1144701
4     chr1        1171613     1172500
...   ...         ...         ...
2995  chrX        136619684   136620610
2996  chrX        136674274   136675211
2997  chrX        136694401   136695238
2998  chrX        149520949   149521857
2999  chrX        153926491   153927267

I ran pycistopic directly on the counts matrix, the region names look like this:

chr1-9734-10738
chr1-28896-29762 ..........

I am running on non-multiome data but from the same sample. Thank you, Yaron

SeppeDeWinter commented 4 months ago

Hi @ytrink

I guess the region names for pycisTopic are the issue. They should have the following format: "chr:start-end". Also, if you are using the development version and the Snakemake pipeline, a mudata object should be generated during the workflow.

Best,

Seppe
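For anyone hitting the same error, renaming regions from `chr-start-end` to `chr:start-end` can be sketched as below. This is a minimal illustration under assumptions: the helper `to_colon_format` is hypothetical, and where the renaming has to be applied (for example on the row names of the counts matrix before running pycisTopic) depends on your own setup.

```python
import re

# Matches dash-separated names such as "chr1-9734-10738". The chromosome
# part is greedy, so the split happens on the last two hyphens and contig
# names that themselves contain hyphens stay intact.
DASHED = re.compile(r"^(?P<chrom>.+)-(?P<start>\d+)-(?P<end>\d+)$")


def to_colon_format(name):
    """Convert 'chrom-start-end' to 'chrom:start-end' (hypothetical helper).

    Names that do not match the dash-separated pattern, including names
    that already use 'chrom:start-end', are returned unchanged.
    """
    match = DASHED.match(name)
    if match is None:
        return name
    return f"{match['chrom']}:{match['start']}-{match['end']}"


region_names = ["chr1-9734-10738", "chr1-28896-29762"]
print([to_colon_format(name) for name in region_names])
```

With a pandas object, the same function could be applied to the index, e.g. `df.index = [to_colon_format(n) for n in df.index]`, where `df` stands for whatever table holds the region names.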

ytrink commented 3 months ago

Thanks @SeppeDeWinter that seems to have fixed the problem.