Refseq gene list - Githubissues

broadinstitute / ABC-Enhancer-Gene-Prediction

Cell type specific enhancer-gene predictions using ABC model (Fulco, Nasser et al, Nature Genetics 2019)

MIT License


Refseq gene list #61

Open yussufhajjaj opened 3 years ago

yussufhajjaj commented 3 years ago

Dear All,

I want to use the Ensembl gene list instead of RefSeq with the hg38 assembly. Is there a way to create a file similar to yours, with the TSS being the largest region that contains all possible isoforms of a given gene? Thanks in advance.

Best Regards,

Yussuf

nttg8100 commented 2 years ago

The key principle they used is to find the most common TSS among the several transcripts of a gene and take that as the gene's TSS. I tried to do the same thing with the same RefSeq version, but could not reproduce 100% of what they have, using chr22 as an example. Some genes are missed because there may not be a single most common TSS; two or even three can tie. That is a big deal: you have to choose one, but they did not mention how. Hope my comment helps you a little.
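A minimal sketch of the "most common TSS" rule described above, including detection of ties. The input format (tuples of gene, strand, transcript start, transcript end), the gene names, and the function name `most_common_tss` are assumptions for illustration, not the ABC authors' code. Since the tie-break rule is not stated anywhere in this thread, tied genes are only reported, not resolved.

```python
from collections import Counter, defaultdict


def most_common_tss(transcripts):
    """Pick, per gene, the TSS shared by the most transcripts.

    `transcripts` is an iterable of (gene, strand, start, end) tuples
    (a hypothetical input format). Returns (chosen, tied):
    `chosen` maps gene -> TSS for genes with a unique winner,
    `tied` maps gene -> sorted list of equally common TSSs.
    """
    counts = defaultdict(Counter)
    for gene, strand, start, end in transcripts:
        # The TSS is the 5' end: start on '+', end on '-'.
        tss = start if strand == "+" else end
        counts[gene][tss] += 1

    chosen, tied = {}, {}
    for gene, counter in counts.items():
        top = max(counter.values())
        winners = sorted(t for t, n in counter.items() if n == top)
        if len(winners) == 1:
            chosen[gene] = winners[0]
        else:
            # No documented tie-break rule: leave the decision to the caller.
            tied[gene] = winners
    return chosen, tied


# Made-up transcripts: GENE_A has a clear winner, GENE_B has a tie.
example = [
    ("GENE_A", "+", 100, 900),
    ("GENE_A", "+", 100, 700),
    ("GENE_A", "+", 150, 900),
    ("GENE_B", "-", 200, 800),
    ("GENE_B", "-", 200, 850),
]
chosen, tied = most_common_tss(example)
```

Listing the tied genes separately makes it easy to see how many genes on a chromosome depend on an unstated tie-break before comparing against the published gene file.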

yussufhajjaj commented 2 years ago

Hi Thah,

I ended up using the Ensembl canonical track, which provides one TSS per gene, chosen as the most conserved and most highly expressed transcript of the gene across tissues. It works well for me; you might need to remove some gene types, such as pseudogenes. It was nice of you to remind me to drop a comment here, I had totally forgotten to update the issue. Thanks.
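For anyone wanting to try the same approach, here is a rough sketch of pulling one TSS per gene from an Ensembl GTF by keeping transcripts tagged `Ensembl_canonical` and skipping pseudogene biotypes. It assumes the GTF carries `gene_biotype` and `tag "Ensembl_canonical"` attributes on transcript lines (check your release); the function names and the two sample lines are made up for illustration and are not part of the ABC code.

```python
import re

# Matches quoted key/value pairs such as: gene_id "ENSG..."
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_attributes(field):
    """Split a GTF attribute column into (attributes dict, set of tags)."""
    attrs, tags = {}, set()
    for key, value in ATTR_RE.findall(field):
        if key == "tag":
            tags.add(value)  # 'tag' can appear several times per line
        else:
            attrs[key] = value
    return attrs, tags


def canonical_tss(gtf_lines):
    """Yield (chrom, tss, strand, gene_id, gene_name) per canonical transcript."""
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "transcript":
            continue
        attrs, tags = parse_attributes(cols[8])
        if "Ensembl_canonical" not in tags:
            continue
        if "pseudogene" in attrs.get("gene_biotype", ""):
            continue  # drops pseudogene, processed_pseudogene, etc.
        start, end, strand = int(cols[3]), int(cols[4]), cols[6]
        # GTF coordinates are 1-based; the TSS is the 5' end of the transcript.
        tss = start if strand == "+" else end
        yield cols[0], tss, strand, attrs.get("gene_id"), attrs.get("gene_name")


# Two made-up transcript lines: one protein-coding, one pseudogene.
sample = [
    "\t".join(["22", "havana", "transcript", "1000", "5000", ".", "-", ".",
               'gene_id "G1"; transcript_id "T1"; gene_name "FOO"; '
               'gene_biotype "protein_coding"; tag "basic"; tag "Ensembl_canonical";']),
    "\t".join(["22", "havana", "transcript", "7000", "7500", ".", "+", ".",
               'gene_id "G2"; transcript_id "T2"; gene_name "BARP1"; '
               'gene_biotype "processed_pseudogene"; tag "Ensembl_canonical";']),
]
records = list(canonical_tss(sample))
```

On a real file the lines would come from something like `open("annotation.gtf")` (a placeholder path). Note that converting the result to BED requires shifting to 0-based starts.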

jxcao98 commented 10 months ago

Hi, in the canonical track, is the TSS position the start position of the transcript labeled as Ensembl_canonical in the GTF file?