davidaknowles / leafcutter

Annotation-free quantification of RNA splicing. Yang I. Li, David A. Knowles, Jack Humphrey, Alvaro N. Barbeira, Scott P. Dickinson, Hae Kyung Im, Jonathan K. Pritchard
http://davidaknowles.github.io/leafcutter/
Apache License 2.0

ENSG id and gene name not matching #240

Open seyoun209 opened 1 year ago

seyoun209 commented 1 year ago

Dear Leafcutter team,

I noticed that in the xxx.RData output file from the LeafCutter differential analysis, the gene names and coordinates appear to be correct, but the ENSG IDs do not match them. The same ENSG ID is reported for genes on different chromosomes. For example:

91218 clu27247+ LIMCH1 ENSG00000064042.18 chr4 41551191 41551321 annotated 0.001
91219 clu27247+ LIMCH1 ENSG00000064042.18 chr4 41551395 41598920 annotated -0.010
111731 clu33468+ PTPN12 ENSG00000064042.18 chr7 77571186 77581427 annotated -0.005
111732 clu33468+ PTPN12 ENSG00000064042.18 chr7 77571186 77585543 novel annotated pair 0.004
111733 clu33468+ PTPN12 ENSG00000064042.18 chr7 77581503 77583555 annotated 0.002
77657 clu23215+ CD40 ENSG00000064042.18 chr20 46123219 46128138 annotated 0.022
77658 clu23215+ CD40 ENSG00000064042.18 chr20 46126701 46128138 annotated -0.022
77659 clu23215+ CD40 ENSG00000064042.18 chr20 46126741 46128138 cryptic_fiveprime -0.001
77660 clu23215+ CD40 ENSG00000064042.18 chr20 46127289 46128138 annotated 0.001
68841 clu20602- METTL8 ENSG00000064042.18 chr2 171325906 171326042 annotated 0.024
68842 clu20602- METTL8 ENSG00000064042.18 chr2 171325906 171330559 annotated -0.039
8791 clu3411+ MGST3 ENSG00000064042.18 chr1 165635822 165649841 annotated -0.002
119649 clu35805+ PTGS1 ENSG00000064042.18 chr9 122371272 122377899 annotated -0.044
119650 clu35805+ PTGS1 ENSG00000064042.18 chr9 122371834 122377899 annotated 0.049

I also checked the reference BED file used for LeafCutter: there, CD40 is ENSG00000101017.14, METTL8 is ENSG00000123600.19, MGST3 is ENSG00000143198.13, and PTGS1 is ENSG00000095303.17. So I don't think the reference file is the problem, but I'm not sure. Have you faced a similar situation, and if so, do you have any suggestions on how to address it? Thank you in advance for your help.
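One quick way to confirm the pattern described above is to group the rows by ENSG ID and list which gene names and chromosomes each ID is attached to. The following is only a minimal Python sketch: it assumes the rows have been exported as whitespace-separated text in the column order quoted above (row index, cluster, gene name, ENSG ID, chromosome, ...), and the function name `ids_shared_across_genes` is hypothetical, not part of LeafCutter.

```python
from collections import defaultdict


def ids_shared_across_genes(lines):
    """Return ENSG IDs that appear with more than one gene name.

    Assumes whitespace-separated rows where field 2 is the gene name,
    field 3 the ENSG ID and field 4 the chromosome (0-based indices).
    """
    seen = defaultdict(set)
    for line in lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        gene, ensg, chrom = fields[2], fields[3], fields[4]
        seen[ensg].add((gene, chrom))
    # Keep only IDs attached to more than one distinct gene name.
    return {
        ensg: sorted(pairs)
        for ensg, pairs in seen.items()
        if len({gene for gene, _ in pairs}) > 1
    }


# Three rows copied from the example above.
rows = [
    "91218 clu27247+ LIMCH1 ENSG00000064042.18 chr4 41551191 41551321 annotated 0.001",
    "77657 clu23215+ CD40 ENSG00000064042.18 chr20 46123219 46128138 annotated 0.022",
    "8791 clu3411+ MGST3 ENSG00000064042.18 chr1 165635822 165649841 annotated -0.002",
]
shared = ids_shared_across_genes(rows)
for ensg, pairs in shared.items():
    print(ensg, pairs)
```

Any ID returned here is assigned to several gene names, which is the symptom reported in this issue.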

jackhump commented 12 months ago

Hi there,

Try regenerating the annotations with gtf2leafcutter.pl from the same GTF you used for alignment.

viljabio commented 3 weeks ago

Hi,

I am facing this exact same issue. The ENSG IDs are incorrect, and a single ENSG ID is often assigned to multiple genes on different chromosomes. I also tried regenerating the annotations from the same GTF used for the alignment, but it did not fix the issue. I have also checked that the ENSG IDs, gene names and genomic locations in the BED reference files are correct.

@seyoun209, were you able to solve this issue? @jackhump, do you have any other ideas about what the root cause could be?

Thank you in advance for your help!

seyoun209 commented 3 weeks ago

I tried regenerating the annotations using gtf2leafcutter, but regeneration alone didn't fix the problem (I had been using gencode.v34). So I downloaded gencode.v45 and regenerated the annotations with the new reference, which fixed the problem. I'm not sure whether the version matters, but at least it helped in my case!
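To check whether a regenerated annotation actually resolved the mismatch, the reported gene name / ENSG ID pairs can be compared against the GTF itself. The sketch below is an assumption-laden illustration, not LeafCutter code: it assumes a GENCODE-style GTF whose ninth column carries `gene_id "..."` and `gene_name "..."` attributes, the helper names `name_to_id_from_gtf` and `mismatches` are hypothetical, and the coordinates in the sample GTF line are placeholders.

```python
import re

GENE_ID = re.compile(r'gene_id "([^"]+)"')
GENE_NAME = re.compile(r'gene_name "([^"]+)"')


def name_to_id_from_gtf(gtf_lines):
    """Build a gene name -> gene ID map from the 'gene' records of a GTF."""
    mapping = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # header/comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "gene":
            continue
        gid = GENE_ID.search(cols[8])
        gname = GENE_NAME.search(cols[8])
        if gid and gname:
            # If a gene name occurs more than once, the last record wins.
            mapping[gname.group(1)] = gid.group(1)
    return mapping


def mismatches(reported_pairs, mapping):
    """List (gene name, reported ID, GTF ID) where the two IDs disagree."""
    return [
        (name, reported, mapping[name])
        for name, reported in reported_pairs
        if name in mapping and mapping[name] != reported
    ]


# Sample GTF record; start/end (1, 1000) are placeholder coordinates.
gtf = [
    'chr20\tHAVANA\tgene\t1\t1000\t.\t+\t.\t'
    'gene_id "ENSG00000101017.14"; gene_type "protein_coding"; gene_name "CD40";'
]
mapping = name_to_id_from_gtf(gtf)
bad = mismatches([("CD40", "ENSG00000064042.18")], mapping)
print(bad)
```

An empty result from `mismatches` on the full output would suggest that names and IDs are consistent with the GTF in use.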