FunctionLab / ExPecto

predicting expression effects of human genome variants ab initio from sequence

The representative TSS site #21

Open ivyhzau opened 4 years ago

ivyhzau commented 4 years ago

I'm interested in how you obtained the representative TSS described in "Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk". The paper says you "took the TSS position reported for the CAGE peak as the selected representative TSS for the gene". So I expected that, if a gene has only one transcript, the representative TSS would be the same as the gene start site. But that does not seem to be the case when I check the geneanno.csv file. Take "ENSG00000169717" as an example: the CAGE_representative_TSS (2938068) is different from the TSS (2938047). Maybe I have misunderstood something. @jzthree

jzthree commented 4 years ago

Thanks for the question. The representative TSS positions are determined from FANTOM CAGE data, which measures the 5' end more precisely, so yes, the representative TSS can differ from the gene start site even if the gene has only one transcript.
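To see how far the two positions are apart for a given gene, something like the following might help. It is only a minimal sketch: the column names `id`, `TSS`, and `CAGE_representative_TSS` are assumptions based on the names used in this thread, the helper `tss_offsets` is hypothetical and not part of ExPecto, and the inline sample stands in for the real geneanno.csv using the single row quoted above.

```python
import csv
import io

# Inline stand-in for geneanno.csv. The column names are assumptions taken
# from this thread; the real file may name or order its columns differently.
SAMPLE = """id,TSS,CAGE_representative_TSS
ENSG00000169717,2938047,2938068
"""


def tss_offsets(handle):
    """Map gene id -> (CAGE_representative_TSS - TSS) in bp.

    Rows with a missing column or a non-integer position are skipped.
    """
    offsets = {}
    for row in csv.DictReader(handle):
        try:
            cage = int(row["CAGE_representative_TSS"])
            annotated = int(row["TSS"])
            offsets[row["id"]] = cage - annotated
        except (KeyError, ValueError):
            continue
    return offsets


offsets = tss_offsets(io.StringIO(SAMPLE))
print(offsets["ENSG00000169717"])  # 21
```

For the gene in the question the offset is 21 bp. To run this on the actual file, pass an opened file handle instead of the `io.StringIO` sample, after checking the header row against the column names assumed here.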