integrateData.py failing to match geneIDs

SchulzLab / TEPIC

Annotation of genomic regions using transcription factor binding sites and epigenetic data

MIT License


integrateData.py failing to match geneIDs #20

Closed LRizzardi1 closed 7 years ago

LRizzardi1 commented 7 years ago

Just curious whether I've missed something. I'm trying to run the combined TEPIC/DYNAMITE pipeline and getting 0 overlapping gene IDs between the TF affinities and my expression file. I have provided ENSEMBL gene IDs in the expression file, and it appears that the Ratio_Affinities_group1_vs_group2.txt file has gene symbols (or whatever is in the JASPAR etc. databases). Is there another step somewhere to match these gene names appropriately? Thanks!

Florian411 commented 7 years ago

Hey, there are a few possible reasons: (1) The ENSEMBL gene IDs should be fine in general. Which genome annotation file are you using? Make sure that the ENSEMBL IDs there are in the same format as in the expression file, e.g. without the version suffix after the dot (ENSG00000186092 is fine, ENSG00000186092.3 is not). JASPAR is only used for the PSEMs, not for the gene annotation; there we do indeed use gene symbols for the TFs.

(2) Of course, it might be that there really is no overlap. Have you checked that manually?
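Both checks above can be done by hand in a few lines of Python. This is only a minimal sketch, not part of TEPIC: the helper names `strip_version` and `overlap` are hypothetical, and the two ID lists are illustrative stand-ins for the first column of the expression file and of the affinity file.

```python
def strip_version(gene_id: str) -> str:
    """Drop a trailing version suffix: ENSG00000186092.3 -> ENSG00000186092."""
    return gene_id.split(".", 1)[0]


def overlap(expression_ids, affinity_ids):
    """Return the gene IDs present in both collections, ignoring versions."""
    left = {strip_version(g.strip()) for g in expression_ids}
    right = {strip_version(g.strip()) for g in affinity_ids}
    return left & right


# Illustrative IDs only (assumption): replace with the IDs read from your files.
expression_ids = ["ENSG00000186092.3", "ENSG00000187634"]
affinity_ids = ["ENSG00000186092", "ENSG00000188976"]

shared = overlap(expression_ids, affinity_ids)
print(len(shared), sorted(shared))
```

If the number of shared IDs is still 0 after stripping versions, the mismatch is probably in the file layout rather than in the ID format.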

LRizzardi1 commented 7 years ago

Ah, resolved it! I had accidentally left the row names in the expression table. Thanks!
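A stray row-name column like this can be caught before running the pipeline. The following is a rough sketch under stated assumptions: it assumes a tab-delimited expression file with one header line and the gene ID in the first column, and it only recognises human ENSEMBL gene IDs. The function name `first_column_looks_like_gene_ids` is hypothetical and not part of TEPIC.

```python
import csv
import io
import re

# Human ENSEMBL gene ID, optionally followed by a version suffix (assumption:
# other species use different prefixes and would need a different pattern).
ENSEMBL_GENE = re.compile(r"^ENSG\d{11}(\.\d+)?$")


def first_column_looks_like_gene_ids(handle, delimiter="\t", skip_header=True):
    """Return True if every data row starts with an ENSEMBL-style gene ID."""
    reader = csv.reader(handle, delimiter=delimiter)
    if skip_header:
        next(reader, None)  # discard the header line
    rows = [row for row in reader if row]
    return bool(rows) and all(ENSEMBL_GENE.match(row[0]) for row in rows)


# Illustrative file contents: the first table still carries row names 1, 2.
with_rownames = "geneID\texpr\n1\tENSG00000186092\t0.5\n2\tENSG00000187634\t1.2\n"
clean = "geneID\texpr\nENSG00000186092\t0.5\nENSG00000187634\t1.2\n"

print(first_column_looks_like_gene_ids(io.StringIO(with_rownames)))  # False
print(first_column_looks_like_gene_ids(io.StringIO(clean)))  # True
```

If the table was exported from R, writing it with `row.names = FALSE` in `write.table` avoids the extra column in the first place.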

Florian411 commented 7 years ago

All right! Great. You are welcome.