Open dianamosa opened 1 month ago
Hi @dianamosa
Thanks for reaching out! I tried here with the following results:
#provider: bed2gtf
#version: 1.9.2
#contact: https://github.com/alejandrogzi/bed2gtf
#date: 2024-7-12
mirAng1_1 bed2gtf gene 12119960 12159135 . - . gene_id "GeneA";
mirAng1_1 bed2gtf transcript 12119960 12159135 . - . gene_id "GeneA"; transcript_id "ENST00000381995.28";
mirAng1_1 bed2gtf exon 12119960 12120095 . - . gene_id "GeneA"; transcript_id "ENST00000381995.28"; exon_number "7"; exon_id "ENST00000381995.28.7";
mirAng1_1 bed2gtf CDS 12119960 12120095 . - 0 gene_id "GeneA"; transcript_id "ENST00000381995.28"; exon_number "7"; exon_id "ENST00000381995.28.7";
mirAng1_1 bed2gtf exon 12122789 12122946 . - . gene_id "GeneA"; transcript_id "ENST00000381995.28"; exon_number "6"; exon_id "ENST00000381995.28.6";
mirAng1_1 bed2gtf CDS 12122789 12122946 . - 2 gene_id "GeneA"; transcript_id "ENST00000381995.28"; exon_number "6"; exon_id "ENST00000381995.28.6";
mirAng1_1 bed2gtf exon 12123332 12123494 . - . gene_id "GeneA"; transcript_id "ENST00000381995.28"; exon_number "5"; exon_id "ENST00000381995.28.5";
mirAng1_1 bed2gtf CDS 12123332 12123494 . - 0 gene_id "GeneA"; transcript_id "ENST00000381995.28"; exon_number "5"; exon_id "ENST00000381995.28.5";
mirAng1_1 bed2gtf exon 12127340 12127485 . - . gene_id "GeneA"; transcript_id "ENST00000381995.28"; exon_number "4"; exon_id "ENST00000381995.28.4";
mirAng1_1 bed2gtf CDS 12127340 12127485 . - 2 gene_id "GeneA"; transcript_id "ENST00000381995.28"; exon_number "4"; exon_id "ENST00000381995.28.4";
mirAng1_1 bed2gtf exon 12131084 12131286 . - . gene_id "GeneA"; transcript_id "ENST00000381995.28"; exon_number "3"; exon_id "ENST00000381995.28.3";
mirAng1_1 bed2gtf CDS 12131084 12131286 . - 1 gene_id "GeneA"; transcript_id "ENST00000381995.28"; exon_number "3"; exon_id "ENST00000381995.28.3";
mirAng1_1 bed2gtf exon 12137432 12137564 . - . gene_id "GeneA"; transcript_id "ENST00000381995.28"; exon_number "2"; exon_id "ENST00000381995.28.2";
mirAng1_1 bed2gtf CDS 12137432 12137564 . - 2 gene_id "GeneA"; transcript_id "ENST00000381995.28"; exon_number "2"; exon_id "ENST00000381995.28.2";
mirAng1_1 bed2gtf exon 12159051 12159135 . - . gene_id "GeneA"; transcript_id "ENST00000381995.28"; exon_number "1"; exon_id "ENST00000381995.28.1";
mirAng1_1 bed2gtf CDS 12159051 12159135 . - 0 gene_id "GeneA"; transcript_id "ENST00000381995.28"; exon_number "1"; exon_id "ENST00000381995.28.1";
mirAng1_1 bed2gtf start_codon 12159133 12159135 . - 0 gene_id "GeneA"; transcript_id "ENST00000381995.28"; exon_number "1"; exon_id "ENST00000381995.28.1";
gene coordinates are ok. I am not quite sure why that is happening with your data.
Also, I noticed that not all the transcripts have a gene feature, could you please explain to me how the gene feature is assigned? I wonder if this is an issue with the pipeline or my evidence files.
This is totally unexpected. Could you please share your files so I can have a look and try to solve it here?
Additionally, there is this project I developed few months ago: https://github.com/alejandrogzi/postoga, pretty useful if you're working with TOGA and want to automate things and filtering, conversions and other stuff.
Alejandro
Hello Alejandro
Thank you for developing this tool to convert the TOGA bed output to GTF. I need a GTF file to perform several downstream analyses, so this tool is exactly what I need! I appreciate your effort for creating this. I annotated a mammal genome using TOGA and used your tool to get the GTF file, but I noticed the gene coordinates are weird. The exon and CDS coordinates are correct, but I observed that the gene feature coordinates are not assigned correctly see my example below:
The original bed file:
mirAng1_1 12119959 12159135 ENST00000381995.28 1000 - 12119959 12159135 0,0,200 7 136,158,163,146,203,133,85, 0,2829,3372,7380,11124,17472,39091,
The GTF converted file is:mirAng1_1 bed2gtf gene 9847 12159135 . - . gene_id "ENSG00000159128"; mirAng1_1 bed2gtf transcript 12119960 12159135 . - . gene_id "ENSG00000159128"; mirAng1_1 bed2gtf exon 12159051 12159135 . - . gene_id "ENSG00000159128"; exon_number "1"; transcript_id "ENST00000381995.28.1"; mirAng1_1 bed2gtf CDS 12159051 12159135 . - 0 gene_id "ENSG00000159128"; exon_number "1"; transcript_id "ENST00000381995.28.1"; mirAng1_1 bed2gtf start_codon 12159133 12159135 . - 0 gene_id "ENSG00000159128"; exon_number "1"; transcript_id "ENST00000381995.28.1"; mirAng1_1 bed2gtf exon 12137432 12137564 . - . gene_id "ENSG00000159128"; exon_number "2"; transcript_id "ENST00000381995.28.2"; mirAng1_1 bed2gtf CDS 12137432 12137564 . - 2 gene_id "ENSG00000159128"; exon_number "2"; transcript_id "ENST00000381995.28.2"; mirAng1_1 bed2gtf exon 12131084 12131286 . - . gene_id "ENSG00000159128"; exon_number "3"; transcript_id "ENST00000381995.28.3"; mirAng1_1 bed2gtf CDS 12131084 12131286 . - 1 gene_id "ENSG00000159128"; exon_number "3"; transcript_id "ENST00000381995.28.3"; mirAng1_1 bed2gtf exon 12127340 12127485 . - . gene_id "ENSG00000159128"; exon_number "4"; transcript_id "ENST00000381995.28.4"; mirAng1_1 bed2gtf CDS 12127340 12127485 . - 2 gene_id "ENSG00000159128"; exon_number "4"; transcript_id "ENST00000381995.28.4"; mirAng1_1 bed2gtf exon 12123332 12123494 . - . gene_id "ENSG00000159128"; exon_number "5"; transcript_id "ENST00000381995.28.5"; mirAng1_1 bed2gtf CDS 12123332 12123494 . - 0 gene_id "ENSG00000159128"; exon_number "5"; transcript_id "ENST00000381995.28.5"; mirAng1_1 bed2gtf exon 12122789 12122946 . - . gene_id "ENSG00000159128"; exon_number "6"; transcript_id "ENST00000381995.28.6"; mirAng1_1 bed2gtf CDS 12122789 12122946 . - 2 gene_id "ENSG00000159128"; exon_number "6"; transcript_id "ENST00000381995.28.6"; mirAng1_1 bed2gtf exon 12119960 12120095 . - . gene_id "ENSG00000159128"; exon_number "7"; transcript_id "ENST00000381995.28.7"; mirAng1_1 bed2gtf CDS 12119960 12120095 . - 0 gene_id "ENSG00000159128"; exon_number "7"; transcript_id "ENST00000381995.28.7"; mirAng1_1 bed2gtf transcript 12119960 12159135 . - . gene_id "ENSG00000159128"; mirAng1_1 bed2gtf exon 12159051 12159135 . - . gene_id "ENSG00000159128"; exon_number "1"; transcript_id "ENST00000290219.28.1"; mirAng1_1 bed2gtf CDS 12159051 12159135 . - 0 gene_id "ENSG00000159128"; exon_number "1"; transcript_id "ENST00000290219.28.1"; mirAng1_1 bed2gtf start_codon 12159133 12159135 . - 0 gene_id "ENSG00000159128"; exon_number "1"; transcript_id "ENST00000290219.28.1"; mirAng1_1 bed2gtf exon 12137432 12137564 . - . gene_id "ENSG00000159128"; exon_number "2"; transcript_id "ENST00000290219.28.2"; mirAng1_1 bed2gtf CDS 12137432 12137564 . - 2 gene_id "ENSG00000159128"; exon_number "2"; transcript_id "ENST00000290219.28.2"; mirAng1_1 bed2gtf exon 12131084 12131286 . - . gene_id "ENSG00000159128"; exon_number "3"; transcript_id "ENST00000290219.28.3"; mirAng1_1 bed2gtf CDS 12131084 12131286 . - 1 gene_id "ENSG00000159128"; exon_number "3"; transcript_id "ENST00000290219.28.3"; mirAng1_1 bed2gtf exon 12127340 12127485 . - . gene_id "ENSG00000159128"; exon_number "4"; transcript_id "ENST00000290219.28.4"; mirAng1_1 bed2gtf CDS 12127340 12127485 . - 2 gene_id "ENSG00000159128"; exon_number "4"; transcript_id "ENST00000290219.28.4"; mirAng1_1 bed2gtf exon 12123332 12123494 . - . gene_id "ENSG00000159128"; exon_number "5"; transcript_id "ENST00000290219.28.5"; mirAng1_1 bed2gtf CDS 12123332 12123494 . - 0 gene_id "ENSG00000159128"; exon_number "5"; transcript_id "ENST00000290219.28.5"; mirAng1_1 bed2gtf exon 12122789 12122946 . - . gene_id "ENSG00000159128"; exon_number "6"; transcript_id "ENST00000290219.28.6"; mirAng1_1 bed2gtf CDS 12122789 12122946 . - 2 gene_id "ENSG00000159128"; exon_number "6"; transcript_id "ENST00000290219.28.6"; mirAng1_1 bed2gtf exon 12119960 12120095 . - . gene_id "ENSG00000159128"; exon_number "7"; transcript_id "ENST00000290219.28.7"; mirAng1_1 bed2gtf CDS 12119960 12120095 . - 0 gene_id "ENSG00000159128"; exon_number "7"; transcript_id "ENST00000290219.28.7"; mirAng1_1 bed2gtf transcript 12137432 12137564 . - . gene_id "ENSG00000159128"; mirAng1_1 bed2gtf exon 12137432 12137564 . - . gene_id "ENSG00000159128"; exon_number "1"; transcript_id "ENST00000290219.30683.1"; mirAng1_1 bed2gtf CDS 12137432 12137564 . - 0 gene_id "ENSG00000159128"; exon_number "1"; transcript_id "ENST00000290219.30683.1"; mirAng1_1 bed2gtf start_codon 12137562 12137564 . - 0 gene_id "ENSG00000159128"; exon_number "1"; transcript_id "ENST00000290219.30683.1"; mirAng1_1 bed2gtf transcript 12137432 12137564 . - . gene_id "ENSG00000159128"; mirAng1_1 bed2gtf exon 12137432 12137564 . - . gene_id "ENSG00000159128"; exon_number "1"; transcript_id "ENST00000381995.30683.1"; mirAng1_1 bed2gtf CDS 12137432 12137564 . - 0 gene_id "ENSG00000159128"; exon_number "1"; transcript_id "ENST00000381995.30683.1"; mirAng1_1 bed2gtf start_codon 12137562 12137564 . - 0 gene_id "ENSG00000159128"; exon_number "1"; transcript_id "ENST00000381995.30683.1";
As you can see the gene feature coordinates goes from 9847-12159135, have you seen this error before?
Also, I noticed that not all the transcripts have a gene feature, could you please explain to me how the gene feature is assigned? I wonder if this is an issue with the pipeline or my evidence files.
I followed your suggestion to create the isoform file in TOGA issue #91.