Open yussufhajjaj opened 3 years ago
The key principle they used is to look for the most common TSS among the several transcripts of a gene; that position then becomes the TSS of the gene. I tried to do something similar with the same RefSeq version but could not reproduce 100% of what they have, taking chr22 as an example. Some entries are missed because there may not be a single most common TSS: two or even three can be tied. That is a big deal, because you have to choose one, and they did not mention how to break the tie. I hope my comment helps you a little.
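To make the "most common TSS" idea and the tie problem concrete, here is a minimal sketch. It is not the authors' code: the function names, the input tuple layout, and the choice to return every tied position are my own assumptions, and coordinates are assumed to be 1-based inclusive as in a GTF.

```python
from collections import Counter, defaultdict


def tss_of(start, end, strand):
    """Return the 5' end of a transcript: start on '+', end on '-'."""
    return start if strand == "+" else end


def most_common_tss(transcripts):
    """Map each gene to the sorted TSS positions tied for the highest count.

    `transcripts` is an iterable of (gene, start, end, strand) tuples
    (hypothetical layout). A list longer than one means the rule is ambiguous.
    """
    counts = defaultdict(Counter)
    for gene, start, end, strand in transcripts:
        counts[gene][tss_of(start, end, strand)] += 1

    result = {}
    for gene, counter in counts.items():
        top = max(counter.values())
        result[gene] = sorted(pos for pos, n in counter.items() if n == top)
    return result


# Made-up coordinates, for illustration only.
example = [
    ("GENE_A", 100, 500, "+"),
    ("GENE_A", 100, 800, "+"),
    ("GENE_A", 150, 800, "+"),
    ("GENE_B", 1000, 2000, "-"),
    ("GENE_B", 1200, 2100, "-"),
]
picked = most_common_tss(example)
ambiguous = [gene for gene, positions in picked.items() if len(positions) > 1]
```

In this toy input GENE_A has one clear winner, while GENE_B has two candidates with one transcript each, which is exactly the case where an extra, undocumented rule would be needed.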
Hi Thah,
I have already used the Ensembl canonical track, which provides one TSS per gene, chosen as the most conserved and most highly expressed transcript for the gene across different tissues. It works well for me, although you might need to remove some gene types such as pseudogenes. It was nice of you to remind me to drop a comment here; I totally forgot to update the issue. Thanks.
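For anyone who wants to reproduce this from a GTF rather than from the track, a rough sketch follows. It assumes the canonical transcript is the one carrying `tag "Ensembl_canonical"` on its `transcript` line and that the biotype is stored under `gene_biotype` (Ensembl GTF) or `gene_type` (GENCODE GTF). The function name `canonical_tss` and the sample lines are made up for illustration.

```python
import re

# Matches quoted GTF attributes such as: gene_id "ENSG..."; tag "basic";
ATTR_RE = re.compile(r'(\w+) "([^"]*)"')


def canonical_tss(gtf_lines):
    """Return {gene_id: (chrom, tss, strand)} for canonical, non-pseudogene transcripts."""
    tss = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        attrs = ATTR_RE.findall(fields[8])
        tags = {value for key, value in attrs if key == "tag"}
        if "Ensembl_canonical" not in tags:
            continue
        info = dict(attrs)
        # Assumption: biotype key differs between Ensembl and GENCODE files.
        biotype = info.get("gene_biotype") or info.get("gene_type", "")
        if "pseudogene" in biotype:
            continue
        start, end, strand = int(fields[3]), int(fields[4]), fields[6]
        # TSS is the 5' end: start on '+', end on '-'.
        tss[info["gene_id"]] = (fields[0], start if strand == "+" else end, strand)
    return tss


# Made-up records, not real annotations.
sample = [
    '22\thavana\ttranscript\t100\t500\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; gene_biotype "protein_coding"; tag "Ensembl_canonical";',
    '22\thavana\ttranscript\t50\t500\t.\t+\t.\tgene_id "G1"; transcript_id "T2"; gene_biotype "protein_coding"; tag "basic";',
    '22\thavana\ttranscript\t700\t900\t.\t+\t.\tgene_id "G2"; transcript_id "T3"; gene_biotype "processed_pseudogene"; tag "Ensembl_canonical";',
    '22\thavana\ttranscript\t1000\t2000\t.\t-\t.\tgene_id "G3"; transcript_id "T4"; gene_biotype "lncRNA"; tag "Ensembl_canonical";',
]
found = canonical_tss(sample)
```

Dropping every biotype that contains "pseudogene" is only one possible filter; which gene types to remove depends on the analysis.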
Hi, in the canonical track, is the TSS position the start position of the transcript labeled Ensembl_canonical in the GTF file?
Dear All,
I want to use the Ensembl gene list instead of RefSeq with the hg38 assembly. Is there a way to create a file similar to yours, in which the TSS is the largest region that contains all possible isoforms of a given gene? Thanks in advance.
Best Regards,
Yussuf
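One possible reading of "the largest region that contains all possible isoforms" is the span from the smallest transcript start to the largest transcript end of a gene, with the TSS taken as the 5' end of that span. The sketch below only illustrates that reading: the function name `gene_span_tss` and the input layout are hypothetical, and it assumes all transcripts of a gene lie on one chromosome and strand.

```python
def gene_span_tss(transcripts):
    """Collapse transcripts into one span per gene and report its 5' end.

    `transcripts` is an iterable of (gene, chrom, start, end, strand) tuples
    with 1-based inclusive coordinates (assumed layout).
    """
    spans = {}
    for gene, chrom, start, end, strand in transcripts:
        if gene in spans:
            _, lo, hi, _ = spans[gene]
            spans[gene] = (chrom, min(lo, start), max(hi, end), strand)
        else:
            spans[gene] = (chrom, start, end, strand)

    return {
        gene: {
            "chrom": chrom,
            "start": lo,
            "end": hi,
            "strand": strand,
            # 5' end of the merged span: start on '+', end on '-'.
            "tss": lo if strand == "+" else hi,
        }
        for gene, (chrom, lo, hi, strand) in spans.items()
    }


# Made-up coordinates, for illustration only.
isoforms = [
    ("GENE_A", "chr22", 100, 500, "+"),
    ("GENE_A", "chr22", 80, 450, "+"),
    ("GENE_B", "chr22", 1000, 2000, "-"),
    ("GENE_B", "chr22", 1200, 2100, "-"),
]
spans = gene_span_tss(isoforms)
```

With this definition the TSS is always the most upstream start among the isoforms, which can differ from both the most common TSS and the canonical transcript's TSS.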