aertslab / scenicplus

SCENIC+ is a Python package to build gene regulatory networks (GRNs) using combined or separate single-cell gene expression (scRNA-seq) and single-cell chromatin accessibility (scATAC-seq) data.

pycisTarget - TypeError: object of type 'int' has no len() #293

Closed simozhou closed 5 months ago

simozhou commented 5 months ago

Describe the bug

I am running a standard SCENIC+ pipeline on Drosophila data, and the pycisTarget step raises an error as soon as it reaches the DARs stage.

To Reproduce

import os

from scenicplus.wrappers.run_pycistarget import run_pycistarget

# region_sets, rankings_db, scores_db, motif_annotation and work_dir
# are defined earlier in the session.
run_pycistarget(region_sets = region_sets,
                ctx_db_path = rankings_db,
                species = 'drosophila_melanogaster',
                save_path = os.path.join(work_dir, 'motifs'),
                dem_db_path = scores_db,
                run_without_promoters = True,
                path_to_motif_annotations = motif_annotation,
                annotation_version = 'v10nr_clust',
                n_cpu = 1,
                _temp_dir = '/scratch/procaccia/tmp/ray_spill')

Error output

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/scratch/procaccia/scenicplus_docker/scenicplus/src/scenicplus/wrappers/run_pycistarget.py", line 191, in run_pycistarget
    menr['CTX_'+key+'_All'] = run_cistarget(ctx_db = ctx_db,
  File "/g/furlong/procaccia/miniconda3/envs/scenicplus/lib/python3.8/site-packages/pycistarget/motif_enrichment_cistarget.py", line 543, in run_cistarget
    ctx_dict = [ctx_internal(ctx_db = ctx_db, 
  File "/g/furlong/procaccia/miniconda3/envs/scenicplus/lib/python3.8/site-packages/pycistarget/motif_enrichment_cistarget.py", line 543, in <listcomp>
    ctx_dict = [ctx_internal(ctx_db = ctx_db, 
  File "/g/furlong/procaccia/miniconda3/envs/scenicplus/lib/python3.8/site-packages/pycistarget/motif_enrichment_cistarget.py", line 707, in ctx_internal
    ctx_result.run_ctx(ctx_db)
  File "/g/furlong/procaccia/miniconda3/envs/scenicplus/lib/python3.8/site-packages/pycistarget/motif_enrichment_cistarget.py", line 304, in run_ctx
    region_set_signature = region_sets_to_signature(self.regions_to_db['Query'].tolist(), region_set_name = self.name)
  File "/g/furlong/procaccia/miniconda3/envs/scenicplus/lib/python3.8/site-packages/pycistarget/utils.py", line 55, in region_sets_to_signature
    signature = Regulon(
  File "<attrs generated init ctxcore.genesig.Regulon>", line 15, in __init__
  File "/g/furlong/procaccia/miniconda3/envs/scenicplus/lib/python3.8/site-packages/ctxcore/genesig.py", line 166, in name_validator
    if len(value) == 0:
TypeError: object of type 'int' has no len()
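
The last frame of the traceback can be reproduced in isolation. The sketch below is an illustration only: `name_validator` here is a hypothetical stand-in that copies the single `len(value) == 0` check visible in the traceback, not the actual ctxcore implementation.

```python
def name_validator(value):
    """Hypothetical stand-in for the emptiness check seen in the traceback."""
    # len() is only defined for sized objects such as str; an int has no length.
    if len(value) == 0:
        raise ValueError("A name must not be empty.")


for candidate in (0, "0"):
    try:
        name_validator(candidate)
        print(repr(candidate), "accepted")
    except TypeError as err:
        print(repr(candidate), "rejected:", err)
```

An integer name fails before the emptiness check can even be evaluated, whereas the string form of the same name passes.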

Version (please complete the following information):

Additional information

A piece of the region_sets dictionary, containing only DARs for testing (all clusters have a non-empty set of DARs):

>>> region_sets
{'DARs': {0: +--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int64)   | (int64)   |
|--------------+-----------+-----------|
| chr2L        | 15431272  | 15431451  |
| chr2L        | 11458511  | 11458749  |
| chr2L        | 11458227  | 11458454  |
| chr2L        | 5105114   | 5105457   |
| ...          | ...       | ...       |
| chr4         | 485989    | 486164    |
| chr4         | 760435    | 760658    |
| chr4         | 780687    | 780966    |
| chr4         | 333012    | 333216    |
+--------------+-----------+-----------+
Unstranded PyRanges object has 4,105 rows and 3 columns from 5 chromosomes.
For printing, the PyRanges was sorted on Chromosome., 1: +--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int64)   | (int64)   |
|--------------+-----------+-----------|
| chr2L        | 16785240  | 16785420  |
| chr2L        | 16785653  | 16785857  |
| chr2L        | 9550955   | 9551105   |
| chr2L        | 6772735   | 6772989   |
| ...          | ...       | ...       |
| chr4         | 90208     | 90447     |
| chr4         | 937179    | 937406    |
| chr4         | 692054    | 692310    |
| chr4         | 253263    | 253453    |
+--------------+-----------+-----------+
Unstranded PyRanges object has 5,402 rows and 3 columns from 5 chromosomes.
[...]

SeppeDeWinter commented 5 months ago

Hi @simozhou

This error is raised because the keys of your DARs dictionary are integers (e.g. 0, 1, ...) rather than strings. Converting them to strings should solve your issue.
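
A minimal sketch of that conversion, assuming `region_sets` is a two-level dictionary as shown in the report (an outer key such as 'DARs' mapping to an inner dictionary of region sets). The helper name `stringify_region_set_keys` is hypothetical, and placeholder strings stand in for the PyRanges objects so the example runs on its own.

```python
def stringify_region_set_keys(region_sets):
    """Return a copy of a nested region-set dict with every inner key cast to str."""
    return {
        group: {str(name): regions for name, regions in sets.items()}
        for group, sets in region_sets.items()
    }


# Placeholder values stand in for the PyRanges objects from the report.
example = {"DARs": {0: "pyranges_for_cluster_0", 1: "pyranges_for_cluster_1"}}
fixed = stringify_region_set_keys(example)
print(list(fixed["DARs"].keys()))
```

The converted dictionary can then be passed as `region_sets` in place of the original. The values themselves are left untouched; only the inner keys change type.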

I pushed a change to pycistarget so that it now raises a clearer error message in this case: https://github.com/aertslab/pycistarget/commit/81eb8757ec032a25db53b1952b8529064470e19f.

All the best,

Seppe