Open sanyalab opened 3 years ago
Sounds like a good task for AGAT. We will think about it.
Thanks Jacques. The problem with the GTF2hints scripts is that there are several of them, and one has to tie them together using another join script in the Augustus scripts directory. A one-stop-shop gff2hints script (or something like that) would be more suitable.
Thanks for looking into this. Abhijit
Hi Jacques,
I was wondering if this request can still be worked upon.
Regards Abhijit
I would like to develop this but do not have time. It would help if you could start by listing all the hint types used by Augustus, what they represent, and how they can be generated, e.g. intron corresponds to the regions in a gene between exons.
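To make the intron example concrete, here is a minimal Python sketch of deriving introns as the gaps between consecutive exons of one transcript. The function name `introns_from_exons` and the 1-based inclusive coordinate convention are assumptions for illustration; this is not how AGAT implements it.

```python
def introns_from_exons(exons):
    """Return introns as the gaps between consecutive exons.

    `exons` is a list of (start, end) tuples for ONE transcript,
    1-based and inclusive (the GFF convention, assumed here).
    """
    ordered = sorted(exons)
    introns = []
    # Walk over neighbouring exon pairs and keep any gap between them.
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start > prev_end + 1:
            introns.append((prev_end + 1, next_start - 1))
    return introns


exons = [(100, 200), (301, 400), (551, 600)]
print(introns_from_exons(exons))  # [(201, 300), (401, 550)]
```

Directly adjacent or overlapping exons yield no intron in this sketch.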
Hi there, I'd also be interested in this. I know very little about Perl, so while I'd typically be interested in contributing, I don't think I'd do it justice. Per your last question @Juke34, here are all 16 hint types that @sanyalab mentioned, from the Augustus documentation that was linked above. Direct deeplink below:
I worked on that a while ago but forgot to push the work in progress. You can find it in the Augustus branch. The script is called agat_sp_create_augustus_hints.pl.
Start and stop should be in the file by default; if not, you should use the agat_sp_add_start_and_stop.pl script. Then I should add a function to rename the start_codon and stop_codon feature types to start and stop accordingly.
For UTRs I should add a function to detect synonyms and call all of them UTR.
For irpart, the features are currently called intergenic_region by the script. They should be renamed.
For nonexonpart, it should be a copy of irpart and intronpart.
Intronpart should be a copy of intron feature?
Exonpart should be a copy of the exon feature?
Some other stuff to decipher... UTRpart, CDS part ...
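The renaming steps listed above could be sketched as a simple lookup table. This is a hypothetical Python illustration, not the code in agat_sp_create_augustus_hints.pl; the table name `FEATURE_TO_HINT`, the helper `to_hint_type`, and the particular UTR synonyms are assumptions.

```python
# Hypothetical mapping from GFF feature types to Augustus hint types.
# Keys are lower-cased so the lookup is case-insensitive.
FEATURE_TO_HINT = {
    "start_codon": "start",
    "stop_codon": "stop",
    "five_prime_utr": "UTR",        # UTR synonyms all collapse to UTR
    "three_prime_utr": "UTR",
    "utr": "UTR",
    "intergenic_region": "irpart",  # current script output, to be renamed
    "exon": "exon",
    "intron": "intron",
    "cds": "CDS",
}


def to_hint_type(feature_type):
    """Return the hint type for a GFF feature type, or None if unmapped."""
    return FEATURE_TO_HINT.get(feature_type.lower())


for feature in ["start_codon", "five_prime_UTR", "intergenic_region", "gene"]:
    print(feature, "->", to_hint_type(feature))
```

Feature types with no obvious hint equivalent (gene in this example) return None so the caller can decide whether to skip or report them.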
Hi Jacques,
The hintsfile in Augustus (https://github.com/Gaius-Augustus/Augustus/blob/master/docs/RUNNING-AUGUSTUS.md#using-hints) is a useful aid for predicting genes. However, the existing scripts convert BAM (http://manpages.ubuntu.com/manpages/bionic/man1/bam2hints.1.html) and GTF files to the hintsfile (https://github.com/Gaius-Augustus/Augustus/tree/master/scripts). There is no script to process GFF files to get a single hintsfile with all 16 hints for direct use.
Can this be taken up as an enhancement objective for AGAT? I am typically thinking of a GFF file with features (exon, intron, CDS, five_prime_utr, three_prime_utr) that can provide info for (translation start, translation stop, acceptor splice site, donor splice site, exact exon, part of exon, exact intron in CDS/UTR, part of an intron, CDS, CDSpart, UTR, UTRpart).
Thanks Abhijit
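For illustration, a minimal Python sketch of what writing one hint line might look like. It assumes the hintsfile is a nine-column, tab-separated GFF whose last column carries a `src=` source identifier, as described in the Augustus documentation linked above; the helper name `format_hint`, the `AGAT` value in the source column, and the default `src=M` are assumptions, not part of any existing script.

```python
def format_hint(seqid, hint_type, start, end, strand=".", frame=".",
                src="M", tool="AGAT"):
    """Build one tab-separated hint line (assumed GFF-style layout)."""
    columns = [
        seqid,        # sequence name
        tool,         # column 2: program that produced the hint (hypothetical)
        hint_type,    # e.g. intron, exon, CDS, start, stop
        str(start),
        str(end),
        ".",          # score left empty in this sketch
        strand,
        frame,
        f"src={src}",  # source identifier (assumed attribute name)
    ]
    return "\t".join(columns)


print(format_hint("chr1", "intron", 201, 300, strand="+"))
```

Which `src` letter to use depends on the extrinsic configuration given to Augustus, so it is left as a parameter here.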