aertslab / create_cisTarget_databases

Create cisTarget databases

Genes not found in cistarget database #21

Closed by daxus4 2 years ago

daxus4 commented 2 years ago

Hi!

I'm using the pySCENIC pipeline with a single-cell dataset from a paper. I created the cisTarget database myself and everything seemed to work.

There is only one problem. When I run pyscenic ctx with my own cisTarget database, I get this warning for some modules:

2022-04-11 10:56:29,454 - pyscenic.transform - WARNING - Less than 80% of the genes in Regulon for HBP1 could be mapped to Fetal_Brain_Chromatin_Atlas_GZ_GimmeMotifs_up3000_down3000_no_geschwind_genes.regions_vs_motifs.rankings. Skipping this module.

I checked and found that some genes in the single-cell dataset are not present in my cisTarget database. Since I could not find a file that would let me create the cisTarget database with all the genes, I thought of two solutions:

  1. Add those genes as the last-ranked genes for each motif in my cisTarget database.
  2. Add those genes, with the sequence "NNNNN...", to the FASTA file that I use to create the cisTarget database, and then rebuild the database.

Are these solutions valid? Which one is better?
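The check described above (which module genes are absent from the database, and whether a module falls under the 80% threshold quoted in the warning) can be sketched in plain Python. This is a minimal illustration, not pySCENIC's own code: the function names and gene names are hypothetical, and `db_genes` stands in for the list of gene names stored in the rankings database.

```python
def module_coverage(module_genes, db_genes):
    """Fraction of a module's genes that are present in the rankings database."""
    module_genes = set(module_genes)
    if not module_genes:
        return 0.0
    return len(module_genes & set(db_genes)) / len(module_genes)


def missing_genes(module_genes, db_genes):
    """Genes in the module that have no entry in the rankings database, sorted by name."""
    return sorted(set(module_genes) - set(db_genes))


# Toy data with hypothetical gene names (assumption: not taken from the real database).
db_genes = ["GENE_A", "GENE_B", "GENE_C"]
module = ["GENE_A", "GENE_B", "GENE_C", "GENE_X", "GENE_Y"]

coverage = module_coverage(module, db_genes)
absent = missing_genes(module, db_genes)

# 0.8 mirrors the "Less than 80%" wording of the warning above.
if coverage < 0.8:
    print(f"Module would be skipped: coverage {coverage:.0%}, missing {absent}")
```

With a real database, the gene names would have to be read from the rankings file itself; how to do that depends on the file format and is not shown here.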

Thank you!

ghuls commented 2 years ago

Those genes would never be enriched in the analysis, so it does not make much sense to add them (they won't be reported in later steps anyway). Both solutions would give the same result, which should also be similar to not adding those genes at all.
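A toy illustration of why genes appended at the bottom of a ranking cannot show up as enriched: they never enter the top of the list. This is only a sketch under assumptions. The `gene -> rank` dictionary, the function names, and the `top_k` cutoff are hypothetical stand-ins and do not reproduce the actual database layout or the enrichment computation.

```python
def append_at_bottom(ranking, extra_genes):
    """Return a copy of a gene -> rank mapping with extra genes given the worst ranks."""
    extended = dict(ranking)
    next_rank = max(ranking.values(), default=-1) + 1
    for gene in sorted(extra_genes):
        if gene not in extended:
            extended[gene] = next_rank
            next_rank += 1
    return extended


def top_genes(ranking, top_k):
    """Genes holding the top_k best (lowest) ranks, best first."""
    ordered = sorted(ranking.items(), key=lambda item: item[1])
    return [gene for gene, _ in ordered[:top_k]]


# Hypothetical ranking for a single motif (0 = best rank).
ranking = {"GENE_B": 0, "GENE_A": 1, "GENE_C": 2}
extended = append_at_bottom(ranking, ["GENE_X", "GENE_Y"])

# The top of the ranking is unchanged by the appended genes.
unchanged = top_genes(ranking, 2) == top_genes(extended, 2)
print(unchanged)
```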

daxus4 commented 2 years ago

Thank you for the quick and helpful answer, ghuls!