Open Jakob37 opened 1 day ago
Weird, perhaps something happened after that PR, I'll look into it today!
Ah but wait, exons did NOT show in MANE report, only on gene report, but I guess that's what you mean?
Not 100% sure what I mean 🤔
But I realized now when looking at some more genes that the exons are there and thus Chanjo2 seems to be doing its job:
Looks like I was just unlucky opening ACTB (screenshot above), which for some reason did not have exons:
Ah good! 😄 Have a great day!
Looks like something is off in my db, either on Ensembl's side or, maybe more likely, because my db is out of sync somehow:
MariaDB [chanjo2]> select * from transcripts where ensembl_id="ENST00000414620";
+--------+------------+---------+---------+-----------------+-------------+------------------+--------------+--------------------+---------------------------+-----------------+--------+
| id | chromosome | start | stop | ensembl_id | refseq_mrna | refseq_mrna_pred | refseq_ncrna | refseq_mane_select | refseq_mane_plus_clinical | ensembl_gene_id | build |
+--------+------------+---------+---------+-----------------+-------------+------------------+--------------+--------------------+---------------------------+-----------------+--------+
| 669677 | 7 | 5529282 | 5562790 | ENST00000414620 | NULL | NULL | NULL | NULL | NULL | ENSG00000075624 | GRCh38 |
+--------+------------+---------+---------+-----------------+-------------+------------------+--------------+--------------------+---------------------------+-----------------+--------+
1 row in set (0.030 sec)
MariaDB [chanjo2]> select * from exons WHERE ensembl_transcript_id="ENST00000414620";
Empty set (0.000 sec)
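To check whether this transcript is an isolated case, one could list every transcript that has no exon rows at all. The following is a minimal sketch, assuming only the two column names visible in the queries above (`transcripts.ensembl_id` and `exons.ensembl_transcript_id`). It runs against an in-memory SQLite database with toy rows as a stand-in for the real MariaDB instance, so the reduced schema is hypothetical and not Chanjo2's actual one.

```python
import sqlite3

# Toy stand-in for the chanjo2 database: only the columns used in the
# queries above are modelled; the real tables have more columns.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE transcripts (ensembl_id TEXT);
    CREATE TABLE exons (ensembl_transcript_id TEXT);
    """
)
conn.executemany(
    "INSERT INTO transcripts VALUES (?)",
    [("ENST00000414620",), ("ENST00000366578",)],
)
# Only the second transcript gets exon rows, mirroring the situation above.
conn.executemany(
    "INSERT INTO exons VALUES (?)",
    [("ENST00000366578",), ("ENST00000366578",)],
)

# Anti-join: keep transcripts for which the LEFT JOIN found no exon row.
MISSING_EXONS_SQL = """
    SELECT t.ensembl_id
    FROM transcripts AS t
    LEFT JOIN exons AS e ON e.ensembl_transcript_id = t.ensembl_id
    WHERE e.ensembl_transcript_id IS NULL
    ORDER BY t.ensembl_id
"""
missing = [row[0] for row in conn.execute(MISSING_EXONS_SQL)]
print(missing)
```

The `SELECT` statement itself uses only standard SQL, so it should also be usable from the MariaDB prompt. If several genome builds are loaded in the same database, the join would probably need to be restricted to one build.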
You too 😊
Actually, digging around into this a bit more, things still seem weird, but on a more upstream level (i.e. Schug or ENSEMBL).
Take the two transcripts I mentioned above.
First I thought I might have truncated the exon file which I loaded manually. So I reran the Schug download steps for exons and transcripts.
curl "localhost:8037/exons/ensembl_exons/?build=38" > ensembl_exons.tsv
curl "localhost:8037/transcripts/ensembl_transcripts/?build=38" > ensembl_transcripts.tsv
These completed fine, with the same md5sums as those I previously loaded into Chanjo2.
Now looking for the transcript IDs above:
$ grep ENST00000414620 ensembl_transcripts.tsv
7 ENSG00000075624 ENST00000414620 5529282 5562790
$ grep ENST00000414620 ensembl_exons.tsv
(no output)
$ grep ENST00000366578 ensembl_transcripts.tsv
1 ENSG00000077522 ENST00000366578 236686499 236764631 NM_001103 NM_001103.4
$ grep ENST00000366578 ensembl_exons.tsv
1 ENSG00000077522 ENST00000366578 ENSE00003612377 236718894 236719013 1 3
1 ENSG00000077522 ENST00000366578 ENSE00001820573 236686499 236686799 236686499 236686673 1 1
1 ENSG00000077522 ENST00000366578 ENSE00003611529 236717858 236717972 1 2
1 ENSG00000077522 ENST00000366578 ENSE00003535405 236720105 236720191 1 4
1 ENSG00000077522 ENST00000366578 ENSE00003553097 236725933 236726020 1 5
...
Next, I went to Ensembl and looked up these transcripts. There I find exons, with exon IDs, for both.
ENST00000414620
ENST00000366578
So it seems that there are exons, but I don't get them through schug? Could you check if you have exons for the corresponding transcript?
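The two `grep` checks above can be generalised to all transcripts at once. Below is a minimal sketch, assuming (based on the output shown above) that both files are tab-separated and carry the Ensembl transcript ID in the third column. The function names and the header line in the toy data are hypothetical.

```python
import csv
import io


def transcript_ids(handle, column=2):
    """Collect Ensembl transcript IDs from one column of a tab-separated file."""
    ids = set()
    for row in csv.reader(handle, delimiter="\t"):
        # Skip header lines and rows that are too short.
        if len(row) > column and row[column].startswith("ENST"):
            ids.add(row[column])
    return ids


def transcripts_without_exons(transcripts_handle, exons_handle):
    """Return transcript IDs present in the transcripts file but not in the exons file."""
    return sorted(transcript_ids(transcripts_handle) - transcript_ids(exons_handle))


# Toy data built from the rows shown above; the header line is illustrative.
transcripts_tsv = io.StringIO(
    "chrom\tgene\ttranscript\tstart\tend\n"
    "7\tENSG00000075624\tENST00000414620\t5529282\t5562790\n"
    "1\tENSG00000077522\tENST00000366578\t236686499\t236764631\n"
)
exons_tsv = io.StringIO(
    "1\tENSG00000077522\tENST00000366578\tENSE00003612377\t236718894\t236719013\n"
    "1\tENSG00000077522\tENST00000366578\tENSE00003611529\t236717858\t236717972\n"
)

missing = transcripts_without_exons(transcripts_tsv, exons_tsv)
print(missing)

# With the real downloads, the handles would come from open(), for example:
# with open("ensembl_transcripts.tsv") as t, open("ensembl_exons.tsv") as e:
#     print(len(transcripts_without_exons(t, e)))
```

Counting the result for build 37 and build 38 separately would show whether the gap affects a handful of transcripts or a larger set.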
Mmm I confirm that it might be a schug/Ensembl thing. The exons are available in build 37, but not 38:
Moving this issue to schug then. I'll look into it!
Thanks 🙏
I've sent the following email to Ensembl, let's see what they reply!
Hello, I'm trying to figure out the cause of a bug in our software, which downloads transcript data from Ensembl BioMart (human data).
Specifically, we are missing the 4 exons belonging to this transcript: ENST00000414620
The exons are there if you look at the web page: https://www.ensembl.org/Homo_sapiens/Transcript/Exons?db=core;g=ENSG00000075624;r=7:5529282-5562790;t=ENST00000414620
But they are not returned when downloading all exons via BioMart. The URL used for BioMart is the following:
https://www.ensembl.org/biomart/martservice?query=%3C?xml%20version=%221.0%22%20encoding=%22UTF-8%22?%3E%3C!DOCTYPE%20Query%3E%3CQuery%20%20virtualSchemaName%20=%20%22default%22%20formatter%20=%20%22TSV%22%20header%20=%20%221%22%20uniqueRows%20=%20%220%22%20count%20=%20%22%22%20datasetConfigVersion%20=%20%220.6%22%20completionStamp%20=%20%221%22%3E%3CDataset%20name%20=%20%22hsapiens_gene_ensembl%22%20interface%20=%20%22default%22%20%3E%3CFilter%20name%20=%20%22chromosome_name%22%20value%20=%20%221,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,X,Y,MT%22/%3E%3CAttribute%20name%20=%20%22chromosome_name%22%20/%3E%3CAttribute%20name%20=%20%22ensembl_gene_id%22%20/%3E%3CAttribute%20name%20=%20%22ensembl_transcript_id%22%20/%3E%3CAttribute%20name%20=%20%22ensembl_exon_id%22%20/%3E%3CAttribute%20name%20=%20%22exon_chrom_start%22%20/%3E%3CAttribute%20name%20=%20%22exon_chrom_end%22%20/%3E%3CAttribute%20name%20=%20%225_utr_start%22%20/%3E%3CAttribute%20name%20=%20%225_utr_end%22%20/%3E%3CAttribute%20name%20=%20%223_utr_start%22%20/%3E%3CAttribute%20name%20=%20%223_utr_end%22%20/%3E%3CAttribute%20name%20=%20%22strand%22%20/%3E%3CAttribute%20name%20=%20%22rank%22%20/%3E%3C/Dataset%3E%3C/Query%3E
Which is basically using the following attributes:
attributes = [
    "chromosome_name",
    "ensembl_gene_id",
    "ensembl_transcript_id",
    "ensembl_exon_id",
    "exon_chrom_start",
    "exon_chrom_end",
    "5_utr_start",
    "5_utr_end",
    "3_utr_start",
    "3_utr_end",
    "strand",
    "rank",
]
and all chromosomes as filters.
I noticed that when I include the Ensembl gene ID (ENSG00000075624) among the filters, the 4 exons are downloaded.
They are also downloaded when I use BioMart for genome build 37.
Thank you so much for your help!
Chiara
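For anyone who wants to reproduce the comparison described in the email, the query can be rebuilt programmatically so that the filters are easy to vary. This is a minimal sketch and not Schug's actual implementation: the function names are hypothetical, the query attributes are copied from the URL above, and using `ensembl_gene_id` as a filter name is an assumption based on the email's description. The code only builds the URL and makes no network request.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

BIOMART_URL = "https://www.ensembl.org/biomart/martservice"
ATTRIBUTES = [
    "chromosome_name", "ensembl_gene_id", "ensembl_transcript_id",
    "ensembl_exon_id", "exon_chrom_start", "exon_chrom_end",
    "5_utr_start", "5_utr_end", "3_utr_start", "3_utr_end",
    "strand", "rank",
]
CHROMOSOMES = [str(n) for n in range(1, 23)] + ["X", "Y", "MT"]


def build_query(filters, attributes=ATTRIBUTES):
    """Build the BioMart XML query string for the given filters and attributes."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "1",
        "uniqueRows": "0",
        "count": "",
        "datasetConfigVersion": "0.6",
        "completionStamp": "1",
    })
    dataset = ET.SubElement(
        query, "Dataset", {"name": "hsapiens_gene_ensembl", "interface": "default"}
    )
    for name, value in filters.items():
        ET.SubElement(dataset, "Filter", {"name": name, "value": value})
    for name in attributes:
        ET.SubElement(dataset, "Attribute", {"name": name})
    body = ET.tostring(query, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>' + body


def build_url(filters):
    """Percent-encode the XML query and attach it to the martservice endpoint."""
    return f"{BIOMART_URL}?query={quote(build_query(filters))}"


all_chromosomes = {"chromosome_name": ",".join(CHROMOSOMES)}
# Assumption: "ensembl_gene_id" is accepted as a filter name.
with_gene = {**all_chromosomes, "ensembl_gene_id": "ENSG00000075624"}

print(build_url(all_chromosomes))
print(build_url(with_gene))
```

Fetching both URLs and searching each response for ENST00000414620 would reproduce the two cases described in the email.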
Describe the bug
I am test running Chanjo2 in preparation for further demonstrating and discussing (finally, hopefully) putting it into production.
Reports look fine, and the MANE info looks good. But I no longer see any exon information:
Clicking into a gene, it says "no exons stats available for this transcript"
Looking in MariaDB, it does look like the exons are loaded:
This confuses me a bit, as it apparently worked for me back when reviewing this PR: https://github.com/Clinical-Genomics/chanjo2/pull/369, and I don't think much has changed since then 🤔
Additional context
I have tested this both in Chanjo v2.0 and v2.1. I am running this with Scout v4.90.1.
Might very well be something messed up on our side here. Unsure what though. Debugging pointers are welcome!