aertslab / scenicplus

SCENIC+ is a python package to build gene regulatory networks (GRNs) using combined or separate single-cell gene expression (scRNA-seq) and single-cell chromatin accessibility (scATAC-seq) data.

ValueError: A gene signature must have at least one gene. #182

Open Citugulia40 opened 11 months ago

Citugulia40 commented 11 months ago

Hi,

Thanks for Scenic+.

I am running SCENIC+ and get the error below, both on my own data and on the PBMC tutorial data, after running:

`from scenicplus.wrappers.run_pycistarget import run_pycistarget
run_pycistarget(
    region_sets = region_sets,
    species = 'homo_sapiens',
    save_path = os.path.join(work_dir, 'motifs'),
    ctx_db_path = rankings_db,
    dem_db_path = scores_db,
    path_to_motif_annotations = motif_annotation,
    run_without_promoters = True,
    n_cpu = 1,
    #_temp_dir = os.path.join(tmp_dir, 'ray_spill'),
    annotation_version = 'v10nr_clust',
    )`

2023-07-24 20:07:12,591 pycisTarget_wrapper INFO pbmc_tutorial/motifs/DEM_topics_top_3_No_promoters folder already exists.
2023-07-24 20:07:12,772 pycisTarget_wrapper INFO Loading cisTarget database for DARs
2023-07-24 20:07:12,773 cisTarget INFO Reading cisTarget database

`ValueError                                Traceback (most recent call last)
Cell In[36], line 2
      1 from scenicplus.wrappers.run_pycistarget import run_pycistarget
----> 2 run_pycistarget(
      3     region_sets = region_sets,
      4     species = 'homo_sapiens',
      5     save_path = os.path.join(work_dir, 'motifs'),
      6     ctx_db_path = rankings_db,
      7     dem_db_path = scores_db,
      8     path_to_motif_annotations = motif_annotation,
      9     run_without_promoters = True,
     10     n_cpu = 1,
     11     #_temp_dir = os.path.join(tmp_dir, 'ray_spill'),
     12     annotation_version = 'v10nr_clust',
     13     )

File /data2/ccitu/software/scenicplus/src/scenicplus/wrappers/run_pycistarget.py:182, in run_pycistarget(region_sets, species, save_path, custom_annot, save_partial, ctx_db_path, dem_db_path, run_without_promoters, biomart_host, promoter_space, ctx_auc_threshold, ctx_nes_threshold, ctx_rank_threshold, dem_log2fc_thr, dem_motif_hit_thr, dem_max_bg_regions, annotation, motif_similarity_fdr, path_to_motif_annotations, annotation_version, n_cpu, _temp_dir, exclude_motifs, exclude_collection, **kwargs)
    180 ## CISTARGET
    181 regions = region_sets[key]
--> 182 ctx_db = cisTargetDatabase(ctx_db_path, regions)  
    183 if exclude_motifs is not None:
    184     out = pd.read_csv(exclude_motifs, header=None).iloc[:,0].tolist()

File ~/miniconda3/envs/scenicplus/lib/python3.8/site-packages/pycistarget/motif_enrichment_cistarget.py:67, in cisTargetDatabase.__init__(self, fname, region_sets, name, fraction_overlap)
     48 def __init__(self, 
     49             fname: str,
     50             region_sets: Union[Dict[str, pr.PyRanges], pr.PyRanges] = None,
     51             name: str = None,
     52             fraction_overlap: float = 0.4):
     53     """
     54     Initialize cisTargetDatabase
     55     
   (...)
     65         Minimal overlap between query and regions in the database for the mapping.     
     66     """
---> 67     self.regions_to_db, self.db_rankings, self.total_regions = self.load_db(fname,
     68                                                       region_sets,
     69                                                       name,
     70                                                       fraction_overlap)

File ~/miniconda3/envs/scenicplus/lib/python3.8/site-packages/pycistarget/motif_enrichment_cistarget.py:131, in cisTargetDatabase.load_db(self, fname, region_sets, name, fraction_overlap)
    129 if prefix is not None:
    130     target_regions_in_db = [prefix + '__' + x for x in target_regions_in_db]
--> 131 target_regions_in_db = GeneSignature(name=name, gene2weight=target_regions_in_db)
    132 db_rankings = db.load(target_regions_in_db)
    133 if prefix is not None:

File <attrs generated init ctxcore.genesig.GeneSignature>:7, in __init__(self, name, gene2weight)
      5 if _config._run_validators is True:
      6     __attr_validator_name(self, __attr_name, self.name)
----> 7     __attr_validator_gene2weight(self, __attr_gene2weight, self.gene2weight)

File ~/miniconda3/envs/scenicplus/lib/python3.8/site-packages/ctxcore/genesig.py:172, in GeneSignature.gene2weight_validator(self, attribute, value)
    169 @gene2weight.validator
    170 def gene2weight_validator(self, attribute, value) -> None:
    171     if len(value) == 0:
--> 172         raise ValueError("A gene signature must have at least one gene.")

ValueError: A gene signature must have at least one gene.`

Please help me in solving this.

Thanks in advance.

SeppeDeWinter commented 11 months ago

Hi @Citugulia40

Thanks for opening an issue. It looks like the regions in your region sets are not overlapping with the cistarget database you are using.

Which database are you using? Did you create a custom one based on your dataset?

Can you show the output of:

region_sets

Best,

Seppe
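A quick way to answer this kind of question is to count the regions in every set rather than printing each table. The helper below is a minimal sketch, not part of SCENIC+ or pycisTarget: `summarize_region_sets` is a hypothetical name, and it assumes `region_sets` is a nested dictionary whose leaf objects support `len()`. The toy data uses plain lists as stand-ins for PyRanges objects.

```python
def summarize_region_sets(region_sets):
    """Return {group: {set_name: n_regions}} for a nested dict of region sets."""
    return {
        group: {name: len(regions) for name, regions in sets.items()}
        for group, sets in region_sets.items()
    }


# Toy stand-in: lists of (chrom, start, end) tuples instead of PyRanges objects.
toy_region_sets = {
    "topics_otsu": {"Topic1": [("chr1", 35192543, 35193043)]},
    "DARs": {},
}

summary = summarize_region_sets(toy_region_sets)
print(summary)
```

A group that maps to an empty dictionary in the summary contains no region sets at all.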

Citugulia40 commented 11 months ago

The `region_sets` output:

{'topics_otsu': {'Topic1':
+--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int64)   | (int64)   |
|--------------+-----------+-----------|
| chr1         | 35192543  | 35193043  |
| chr1         | 73997502  | 73998002  |
| chr1         | 9690509   | 9691009   |
| chr1         | 108149602 | 108150102 |
| ...          | ...       | ...       |
| chrX         | 51341387  | 51341887  |
| chrX         | 63760345  | 63760845  |
| chrX         | 148602031 | 148602531 |
| chrX         | 129577059 | 129577559 |
| chrY         | 19701689  | 19702189  |
+--------------+-----------+-----------+
Unstranded PyRanges object has 256 rows and 3 columns from 24 chromosomes.
For printing, the PyRanges was sorted on Chromosome.,
'Topic2':
+--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int64)   | (int64)   |
|--------------+-----------+-----------|
| chr1         | 15524141  | 15524641  |
| chr1         | 58783879  | 58784379  |
| chr1         | 161389869 | 161390369 |
| chr1         | 110338851 | 110339351 |
| ...          | ...       | ...       |
| chrX         | 17736981  | 17737481  |
| chrX         | 23743058  | 23743558  |
| chrX         | 46836672  | 46837172  |
| chrX         | 47217776  | 47218276  |
| chrY         | 13479857  | 13480357  |
| chrY         | 19567021  | 19567521  |
| chrY         | 2935735   | 2936235   |
+--------------+-----------+-----------+
Unstranded PyRanges object has 6,465 rows and 3 columns from 24 chromosomes.
For printing, the PyRanges was sorted on Chromosome.},
'topics_top_3': {'Topic1':
+--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int64)   | (int64)   |
|--------------+-----------+-----------|
| chr1         | 35192543  | 35193043  |
| chr1         | 73997502  | 73998002  |
| chr1         | 9690509   | 9691009   |
| chr1         | 108149602 | 108150102 |
| ...          | ...       | ...       |
| chrY         | 14839383  | 14839883  |
| chrY         | 14518523  | 14519023  |
| chrY         | 6601166   | 6601666   |
| chrY         | 20575611  | 20576111  |
+--------------+-----------+-----------+
Unstranded PyRanges object has 2,677 rows and 3 columns from 24 chromosomes.
For printing, the PyRanges was sorted on Chromosome.,
'Topic2':
+--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int64)   | (int64)   |
|--------------+-----------+-----------|
| chr1         | 15524141  | 15524641  |
| chr1         | 58783879  | 58784379  |
| chr1         | 161389869 | 161390369 |
| chr1         | 110338851 | 110339351 |
| ...          | ...       | ...       |
| chrX         | 16786133  | 16786633  |
| chrX         | 10014954  | 10015454  |
| chrX         | 18425128  | 18425628  |
| chrX         | 149505021 | 149505521 |
+--------------+-----------+-----------+
Unstranded PyRanges object has 2,829 rows and 3 columns from 23 chromosomes.
For printing, the PyRanges was sorted on Chromosome.},
'DARs': {}}

I am taking the databases from

https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/screen/mc_v10_clust/region_based/

and the .tbl file from

https://resources.aertslab.org/cistarget/motif2tf/

Thank you so much for your help.

SeppeDeWinter commented 11 months ago

Hi @Citugulia40

The reason for the error is that you don't have any regions for "DARs", see


'DARs': {}

Best,

Seppe
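One possible workaround, if the empty `DARs` entry is expected for your data, is to drop empty groups from `region_sets` before passing it to `run_pycistarget`. The sketch below is an assumption-laden illustration rather than an official fix: `drop_empty_region_sets` is a hypothetical helper, it assumes the leaf objects support `len()`, and plain lists stand in for PyRanges objects. It only skips the empty group; it does not explain why no DARs were found upstream.

```python
def drop_empty_region_sets(region_sets):
    """Return a copy of region_sets without empty sets or empty groups."""
    cleaned = {}
    for group, sets in region_sets.items():
        # Keep only the sets that contain at least one region.
        non_empty = {name: regions for name, regions in sets.items() if len(regions) > 0}
        if non_empty:
            cleaned[group] = non_empty
    return cleaned


# Toy stand-in: lists of (chrom, start, end) tuples instead of PyRanges objects.
toy_region_sets = {
    "topics_otsu": {"Topic1": [("chr1", 35192543, 35193043)], "Topic2": []},
    "topics_top_3": {"Topic1": [("chr1", 73997502, 73998002)]},
    "DARs": {},
}

cleaned = drop_empty_region_sets(toy_region_sets)
print(sorted(cleaned))
```

The cleaned dictionary can then be passed as the `region_sets` argument in place of the original one.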

skoturan commented 4 months ago

Hi Seppe, I have the same issue with the dataset I'm currently analyzing. I can't detect any DARs. I tweaked some of the QC filters to get some signal, but still get nothing. Does that simply mean that the chromatin accessibility landscape does not differ between groups? Do you have any thoughts on why this can happen?

SeppeDeWinter commented 4 months ago

Hi @skoturan

It's difficult to answer this question on a general basis without any more information. I might be able to help if you provide some more context (with example outputs etc).

All the best,

Seppe