Open xo2003 opened 3 months ago
Besides the duplicated transcript IDs, the gene models predicted by GALBA look odd. When I tried to fix the GXF file with AGAT, I got the following warnings:
Warning: g13506.t1 stop codon not adjacent to the CDS
Warning: g15748.t1 stop codon not adjacent to the CDS
Warning: g1760.t2 stop codon not adjacent to the CDS
Warning: g200.t1 has several stop_codon
Warning: g201.t1 has several stop_codon
Warning: g203.t1 has several stop_codon
Warning: g206.t1 has several stop_codon
Warning: g207.t1 has several stop_codon
Warning: g2165.t1 stop codon not adjacent to the CDS
Warning: g2546.t1 stop codon not adjacent to the CDS
Warning: g2616.t1 stop codon not adjacent to the CDS
Warning: g2616.t2 stop codon not adjacent to the CDS
Warning: g2616.t3 stop codon not adjacent to the CDS
Warning: g425.t3 stop codon not adjacent to the CDS
Warning: g5487.t2 stop codon not adjacent to the CDS
Warning: g5873.t1 stop codon not adjacent to the CDS
Warning: g7418.t1 stop codon not adjacent to the CDS
14706 CDS extended to include the stop_codon
Looking at the 'stop codon not adjacent to the CDS' cases, they all seem to show the same symptom.
Here is one example from the list (screenshot: https://github.com/Gaius-Augustus/GALBA/assets/136870182/6d5655d5-9d1f-454e-9d1c-cd544f77cf47). The stop codon in the GALBA gene model is not in the same reading frame as the CDS. I am not sure how to describe it, but there seems to be a conflict between the predicted gene model and the predicted CDS. As a result, the stop codon is not adjacent to the CDS.
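For anyone who wants to list the affected transcripts without running AGAT, here is a minimal sketch of the same kind of check. It is not AGAT's implementation. It assumes an AUGUSTUS-style GTF with `transcript_id "..."` attributes on the `CDS` and `stop_codon` lines, and the function name `check_stop_codons` is made up for illustration. A stop codon legitimately split across an intron also shows up as two `stop_codon` lines, so "several" is only a hint to look closer.

```python
import re
from collections import defaultdict

# Assumption: attributes look like  transcript_id "g1.t1"; gene_id "g1";
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def check_stop_codons(gtf_lines):
    """Return {transcript_id: problem} for transcripts with a suspicious stop codon."""
    cds = defaultdict(list)    # transcript -> [(start, end), ...]
    stops = defaultdict(list)  # transcript -> [(start, end), ...]
    strand_of = {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] not in ("CDS", "stop_codon"):
            continue
        match = TRANSCRIPT_ID.search(cols[8])
        if match is None:
            continue
        tid = match.group(1)
        strand_of[tid] = cols[6]
        target = cds if cols[2] == "CDS" else stops
        target[tid].append((int(cols[3]), int(cols[4])))

    problems = {}
    for tid, stop_list in stops.items():
        if len(stop_list) > 1:
            problems[tid] = "several stop_codon features"
            continue
        cds_list = cds.get(tid)
        if not cds_list:
            continue
        stop_start, stop_end = stop_list[0]
        if strand_of[tid] == "+":
            # The stop codon should directly follow the last CDS base,
            # or be the last three bases of the CDS if the CDS includes it.
            cds_end = max(end for _, end in cds_list)
            ok = stop_start == cds_end + 1 or stop_end == cds_end
        else:
            # On the minus strand the stop codon sits at the lowest coordinates.
            cds_start = min(start for start, _ in cds_list)
            ok = stop_end == cds_start - 1 or stop_start == cds_start
        if not ok:
            problems[tid] = "stop codon not adjacent to the CDS"
    return problems
```

It can be called with an open file handle, e.g. `check_stop_codons(open("galba.gtf"))`, where `galba.gtf` is a placeholder file name.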
Since the problem is complicated to fix and may be a bug in the prediction step, I decided not to merge the BRAKER3 and GALBA annotations. The BRAKER3 prediction seems more reliable. Do you have any suggestions? Thank you!
It is caused by Pygustus. I currently have no time to fix it (neither in Pygustus nor in GALBA), but I will look into it eventually, most likely in the fall.
Hi,
I am planning to merge the GALBA result with BRAKER3 using TSEBRA. However, while running the standalone version of GALBA v1.0.11, I encountered five duplicated transcript IDs, all in the same pair of scaffolds (scaffold26:18.6Mbp and scaffold30:17.5Mbp).
A BLAST comparison of these two scaffolds found 254 hits. The longest hit is about 6 kbp with 99.4% identity; however, this region does not cover the positions of the duplicated transcript IDs. The other hits are shorter than 1 kbp.
Since the duplication causes an error when running TSEBRA, I would appreciate advice on how to resolve this issue.
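In case it helps with reproducing this, below is a minimal sketch of how the duplicates can be listed. It assumes the GALBA output is a GTF with `transcript_id "..."` attributes, and it treats an ID as duplicated when it appears on more than one sequence, which matches what I see here. The function name `find_duplicated_transcript_ids` is just for illustration, and lines without a `transcript_id` key are skipped.

```python
import re
from collections import defaultdict

# Assumption: attributes look like  transcript_id "g1.t1"; gene_id "g1";
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def find_duplicated_transcript_ids(gtf_lines):
    """Map each transcript ID seen on more than one sequence to those sequences."""
    seqs_by_tid = defaultdict(set)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        match = TRANSCRIPT_ID.search(cols[8])
        if match:
            # Column 1 is the sequence (scaffold) name.
            seqs_by_tid[match.group(1)].add(cols[0])
    return {
        tid: sorted(seqs)
        for tid, seqs in seqs_by_tid.items()
        if len(seqs) > 1
    }
```

Calling it on an open file handle returns a dictionary such as `{"<transcript ID>": ["scaffold26", "scaffold30"]}` for each ID that occurs on both scaffolds.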
Thank you!