Open xo2003 opened 1 month ago
Same here.
Decorating UTRs via `--addUTR=on` might be a solution for this issue.
But even though I successfully ran `test7.sh` (#831) with the toy data, I still encountered an error with real data.
Since setting `--addUTR=on` is equivalent to running GUSHR directly (#506), I added UTRs using GUSHR with Java 8.
`gushr.py` needs to be fixed according to https://github.com/Gaius-Augustus/GUSHR/issues/5.
The exon lines of `gushr.gtf` can be restored with `rename_gtf.py` (exon position = CDS) in TSEBRA or `gtf2gff.pl` (exon position including UTR) in Augustus.
This way, the UTR features seem more reliable, but they still need to be examined carefully.
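To make the two exon conventions above concrete ("exon position = CDS" versus "exon position including UTR"), here is a minimal Python sketch that derives exon intervals for one transcript by merging CDS and UTR intervals that touch or overlap. This is not the actual logic of `rename_gtf.py` or `gtf2gff.pl`; the function name `merge_into_exons` and the toy coordinates are assumptions made purely for illustration.

```python
def merge_into_exons(intervals):
    """Merge overlapping or directly adjacent (start, end) intervals.

    Coordinates are 1-based and inclusive, as in GTF, so (100, 199) and
    (200, 300) are adjacent and collapse into a single exon (100, 300).
    """
    exons = []
    for start, end in sorted(intervals):
        if exons and start <= exons[-1][1] + 1:
            # Touches or overlaps the previous exon: extend it.
            exons[-1] = (exons[-1][0], max(exons[-1][1], end))
        else:
            exons.append((start, end))
    return exons


# Toy forward-strand transcript (hypothetical coordinates).
cds = [(200, 300), (500, 600)]
utr = [(100, 199), (601, 700)]

exons_cds_only = merge_into_exons(cds)        # exon position = CDS
exons_with_utr = merge_into_exons(cds + utr)  # exon position including UTR
print(exons_cds_only)  # [(200, 300), (500, 600)]
print(exons_with_utr)  # [(100, 300), (500, 700)]
```

With well-formed input, every UTR ends up attached to a terminal exon. A UTR that does not touch any CDS would surface as an extra exon, which is worth checking by hand.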
Hi, since a UTR is part of the exon feature in molecular biology, I attempted to extend the start coordinate of the first exon (for forward-strand transcripts, 5' -> 3') or the end coordinate of the first exon (for reverse-strand transcripts, 3' -> 5') of each transcript that has a UTR feature. After converting the GTF file to GFF3 format and running the `gff_QC` tool from GFF3toolkit, I discovered that `stringtie2utr.py` created internal UTR features, which caused the START position to be greater than the END position after adjusting the exon coordinates. The image below, cropped from the original GTF after adding UTRs, shows internal 5' UTR features. Internal 3' UTR features were also found. Approximately 100 transcripts are affected by this issue. Additionally, some exon features from StringTie remain in the GTF.
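A minimal Python sketch of how such internal UTRs could be listed before any exon coordinates are adjusted. It flags UTR features that lie strictly inside the CDS span of their transcript. The function name `find_internal_utrs`, the helper `gtf_row`, and the toy rows are hypothetical, and matching any feature type containing "utr" is an assumption, since the exact feature names written by `stringtie2utr.py` are not stated here.

```python
import re
from collections import defaultdict

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def find_internal_utrs(gtf_lines):
    """Return {transcript_id: [(start, end, feature), ...]} for UTR features
    that lie strictly inside the CDS span of their transcript."""
    cds = defaultdict(list)
    utrs = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        match = TRANSCRIPT_ID.search(cols[8])
        if match is None:
            continue
        tid = match.group(1)
        feature, start, end = cols[2], int(cols[3]), int(cols[4])
        if feature == "CDS":
            cds[tid].append((start, end))
        elif "utr" in feature.lower():  # assumption: UTR types contain "utr"
            utrs[tid].append((start, end, feature))

    internal = {}
    for tid, features in utrs.items():
        if tid not in cds:
            continue  # no CDS to compare against
        cds_start = min(s for s, _ in cds[tid])
        cds_end = max(e for _, e in cds[tid])
        hits = [f for f in features if f[0] > cds_start and f[1] < cds_end]
        if hits:
            internal[tid] = hits
    return internal


def gtf_row(feature, start, end, tid, strand="+"):
    """Build one tab-separated GTF line (hypothetical toy data)."""
    attrs = f'gene_id "g1"; transcript_id "{tid}";'
    return "\t".join(
        ["chr1", "toy", feature, str(start), str(end), ".", strand, ".", attrs]
    )


toy = [
    gtf_row("five_prime_UTR", 100, 199, "t1"),
    gtf_row("CDS", 200, 300, "t1"),
    gtf_row("five_prime_UTR", 350, 400, "t1"),  # sits between two CDS segments
    gtf_row("CDS", 500, 600, "t1"),
    gtf_row("three_prime_UTR", 601, 700, "t1"),
]
print(find_internal_utrs(toy))
# {'t1': [(350, 400, 'five_prime_UTR')]}
```

In this toy case, setting the start of the first exon (200-300) to the start of the internal UTR (350) would give START 350 > END 300, which is one way the error described above could arise.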
After applying `stringtie2utr.py` to two of our genomes, both were found to have internal UTRs. The StringTie GFF used for decorating UTRs is from `${BRAKER3_OUT}/GeneMark-ETP/rnaseq/stringtie/transcripts_merged.gff`. Here is the RNA library information for the two genomes: [Genome1] 7 RNA-seq libraries: 3 unstranded and 4 reverse-stranded. [Genome2] 49 RNA-seq libraries: all reverse-stranded.
I would appreciate any suggestions to resolve these internal UTR issues. Thank you!