Closed chansigit closed 2 years ago
Could you also elaborate on what the genome-wide ranking of the motifs is? I read the SCENIC/pySCENIC papers and found few clues.
Thank you
The feather files are used for the motif enrichment analysis part (i.e. pruning step).
These files contain a matrix with one axis representing genes and the other axis representing motifs. The values in this matrix are rankings for each motif across the genes.
To generate this matrix, the genomic regions surrounding each gene are gathered first. To do so, all non-coding regions located in the neighbourhood of a gene are assigned to that gene. These regions include the promoter regions upstream and downstream of the transcription start site (TSS). The search space around each gene is set to 20 kb around the TSS, for human and mouse.
Next, each region is scored for motifs using cluster-buster (https://github.com/weng-lab/cluster-buster). Because each gene can have multiple regions (and thus multiple CRM-scores), we take the max of the CRM-scores over all regions linked to a gene and assign this max score as the motif score for that gene.
Finally, a ranking is generated for each motif across all genes based on these (max) CRM-scores.
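As a rough illustration of the last two steps (max CRM-score per gene, then a per-motif ranking across genes), here is a minimal sketch on toy data. The gene, region and motif names and all score values are made up, and the 0-based ranks and the tie-breaking rule are assumptions of this sketch, not necessarily what the actual database scripts do.

```python
import pandas as pd

# Hypothetical CRM-scores: one row per region, one column per motif.
# Each region has already been assigned to a gene.
scores = pd.DataFrame({
    "gene":   ["geneA", "geneA", "geneB", "geneC", "geneC"],
    "region": ["r1", "r2", "r3", "r4", "r5"],
    "motif1": [0.2, 3.1, 1.5, 0.0, 0.7],
    "motif2": [4.0, 0.1, 0.3, 2.2, 2.9],
})

# Step 1: a gene can have several regions, so keep the max CRM-score
# per gene for every motif.
gene_scores = scores.drop(columns="region").groupby("gene").max()

# Step 2: for each motif, rank all genes by that max score.
# Assumption: the best-scoring gene gets rank 0 and ties are broken
# by order of appearance.
rankings = gene_scores.rank(axis=0, ascending=False, method="first").astype(int) - 1

# Put motifs on the rows and genes on the columns.
rankings = rankings.T
print(rankings)
```

Each row of the resulting table is one motif, and the values in that row say where each gene ends up when all genes are sorted by their score for that motif.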
We have a separate GitHub repository which contains functions to generate such a database, see https://github.com/aertslab/create_cisTarget_databases
To read these files you have to use a Feather reader, for example pandas.read_feather. However, these files are quite big, so this can take a long time (and a lot of memory).
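A minimal sketch of reading such a file with pandas follows. The helper name load_rankings and its parameters are hypothetical, and the name of the column that holds the motif identifiers is passed in by the caller because it may differ between database versions. Reading Feather files with pandas requires pyarrow to be installed.

```python
import pandas as pd


def load_rankings(path, index_column, genes=None):
    """Load a rankings table from a Feather file (hypothetical helper).

    path:         location of the Feather file.
    index_column: name of the column holding the motif identifiers
                  (an assumption; inspect the file to find the real name).
    genes:        optional list of gene columns to load; restricting the
                  columns reduces memory use for large files.
    """
    columns = None if genes is None else [index_column, *genes]
    df = pd.read_feather(path, columns=columns)
    # Use the motif identifiers as the row index so that the remaining
    # columns are the per-gene rankings.
    return df.set_index(index_column)
```

Passing a short list of genes, for example load_rankings(path, index_column, genes=["GENE1", "GENE2"]) with placeholder gene names, loads only those columns, which can help when the full matrix does not fit comfortably in memory.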
For more info you can also read:
Imrichová,H., Hulselmans,G., Kalender Atak,Z., Potier,D. and Aerts,S. (2015) i-cisTarget 2015 update: generalized cis-regulatory enrichment analysis in human, mouse and fly. Nucleic Acids Res. doi: 10.1093/nar/gkv395
Herrmann,C., Van de Sande,B., Potier,D. and Aerts,S. (2012) i-cisTarget: an integrative genomics method for the prediction of regulatory features and cis-regulatory modules. Nucleic Acids Res. doi: 10.1093/nar/gks543
Does this answer your questions?
Closing issue due to inactivity. Feel free to open it again if you have further questions.
I am curious what is inside a feather file from the cisTarget database. Can you please explain what it contains and how we can read it?