NBISweden / AGAT

Another Gtf/Gff Analysis Toolkit
GNU General Public License v3.0

Creating a Hints file for use in Augustus from a gff file #177

Open sanyalab opened 3 years ago

sanyalab commented 3 years ago

Hi Jacques,

The hints file in Augustus (https://github.com/Gaius-Augustus/Augustus/blob/master/docs/RUNNING-AUGUSTUS.md#using-hints) is a useful aid for gene prediction. Scripts exist to convert BAM (http://manpages.ubuntu.com/manpages/bionic/man1/bam2hints.1.html) and GTF files to a hints file (https://github.com/Gaius-Augustus/Augustus/tree/master/scripts). However, there is no script that processes GFF files into a single hints file with all 16 hint types for direct use.

Could this be taken up as an enhancement for AGAT? I am typically thinking of a GFF file with features (exon, intron, CDS, five_prime_utr, three_prime_utr) that can provide information for the hint types (translation start, translation stop, acceptor splice site, donor splice site, exact exon, part of exon, exact intron in CDS/UTR, part of an intron, CDS, CDSpart, UTR, UTRpart).

Thanks Abhijit

Juke34 commented 3 years ago

Sounds like a good task for AGAT. We will think about it.

sanyalab commented 3 years ago

Thanks Jacques. The problem with the GTF2hints scripts is that there are several of them, and one has to tie them together using another join script in the Augustus scripts directory. A one-stop-shop gff2hints script (or something like that) would be more suitable.

Thanks for looking into this. Abhijit

sanyalab commented 1 year ago

Hi Jacques,

I was wondering whether this request can still be worked on.

Regards Abhijit

Juke34 commented 1 year ago

I would like to develop this but do not have the time. It would help if you could start by listing all the hint types used by Augustus, what they represent, and how they can be generated (e.g. an intron corresponds to the region of a gene between two exons).
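To make the intron example concrete, here is a minimal Python sketch of how intron intervals could be derived from the exons of a single transcript. The function name `introns_from_exons` and the use of 1-based inclusive `(start, end)` tuples (as in GFF) are assumptions for illustration; this is not code from AGAT or Augustus.

```python
def introns_from_exons(exons):
    """Return the gaps between the exons of ONE transcript.

    exons: list of (start, end) tuples, 1-based inclusive as in GFF.
    Hypothetical helper for illustration only.
    """
    if not exons:
        return []
    ordered = sorted(exons)
    introns = []
    prev_end = ordered[0][1]
    for start, end in ordered[1:]:
        # A gap of at least one base between two exons is an intron.
        if start > prev_end + 1:
            introns.append((prev_end + 1, start - 1))
        # Track the furthest end seen, so nested or overlapping exons
        # do not create a spurious intron.
        prev_end = max(prev_end, end)
    return introns


if __name__ == "__main__":
    print(introns_from_exons([(100, 200), (301, 400), (500, 600)]))
```

The same gap logic would presumably apply to intergenic regions if genes were passed in place of exons, but that is not confirmed anywhere in this thread.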

photocyte commented 6 months ago

Hi there, I'd also be interested in this. I know very little about Perl, so while I'd typically be interested in contributing, I don't think I'd do it justice. Per your last question, @Juke34, here are all 16 hint types that @sanyalab mentioned, from the Augustus documentation linked above. Direct deep link below:

https://github.com/Gaius-Augustus/Augustus/blob/master/docs/RUNNING-AUGUSTUS.md#using-hints:~:text=Setting%20the%20bonus%20to%201.0%20disables%20the%20boni.
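For quick reference, the 16 hint type names from that section of the Augustus documentation can be written out as a small Python lookup table. The type names are the ones Augustus uses; the one-line descriptions are my own paraphrase, and the variable name `AUGUSTUS_HINT_TYPES` is just an illustrative choice, so please check the linked documentation for the exact definitions.

```python
# Augustus hint type -> short paraphrased description (not official wording).
AUGUSTUS_HINT_TYPES = {
    "start": "translation start (start codon)",
    "stop": "translation end (stop codon)",
    "tss": "transcription start site",
    "tts": "transcription termination site",
    "ass": "acceptor (3') splice site",
    "dss": "donor (5') splice site",
    "exonpart": "part of an exon",
    "exon": "exact exon",
    "intronpart": "part of an intron",
    "intron": "exact intron",
    "CDSpart": "part of the coding part of an exon",
    "CDS": "exact coding part of an exon",
    "UTRpart": "part of the untranslated part of an exon",
    "UTR": "exact untranslated part of an exon",
    "irpart": "part of an intergenic region",
    "nonexonpart": "part of an intron or intergenic region",
}

if __name__ == "__main__":
    for name, description in AUGUSTUS_HINT_TYPES.items():
        print(f"{name}\t{description}")
```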

Juke34 commented 6 months ago

I worked on that a while ago but forgot to push the work in progress. You can find it in the Augustus branch. The script is called agat_sp_create_augustus_hints.pl.

Start and stop should be in the file by default; if not, you should use the agat_sp_add_start_and_stop.pl script. I should then add a function to rename the start_codon and stop_codon feature types to start and stop, respectively.
For UTRs, I should add a function to detect synonyms and call all of them UTR.
For irpart, the script currently calls them intergenic_region. It should be renamed.
For nonexonpart, it should be a copy of irpart and intronpart.
Should intronpart be a copy of the intron feature?
Should exonpart be a copy of the exon feature? There is some other stuff to decipher: UTRpart, CDSpart ...
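The renaming steps listed above could look roughly like the following Python sketch. It is not the code in agat_sp_create_augustus_hints.pl; the names `FEATURE_TO_HINT`, `UTR_SYNONYMS`, `hint_type`, and `gff_to_hints` are hypothetical, the list of UTR synonyms is an assumption, and the default `src=M` attribute is only one possible choice of hint source. The open questions (intronpart, exonpart, UTRpart, CDSpart) are deliberately left out.

```python
# Hypothetical mapping from GFF feature types to Augustus hint types.
FEATURE_TO_HINT = {
    "start_codon": "start",
    "stop_codon": "stop",
    "intergenic_region": "irpart",
    "exon": "exon",
    "intron": "intron",
    "CDS": "CDS",
}

# Assumed UTR synonyms, compared case-insensitively.
UTR_SYNONYMS = {"utr", "five_prime_utr", "three_prime_utr", "5'utr", "3'utr"}


def hint_type(feature_type):
    """Return the Augustus hint type for a GFF feature type, or None."""
    if feature_type.lower() in UTR_SYNONYMS:
        return "UTR"
    return FEATURE_TO_HINT.get(feature_type)


def gff_to_hints(gff_line, src="M"):
    """Convert one GFF line into zero or more hint lines (GFF format)."""
    cols = gff_line.rstrip("\n").split("\t")
    if gff_line.startswith("#") or len(cols) != 9:
        return []
    hint = hint_type(cols[2])
    if hint is None:
        return []
    types = [hint]
    # nonexonpart is described above as a copy of irpart (and intronpart);
    # only the irpart copy is sketched here.
    if hint == "irpart":
        types.append("nonexonpart")
    return ["\t".join(cols[:2] + [t] + cols[3:8] + [f"src={src}"]) for t in types]


if __name__ == "__main__":
    line = "chr1\tAGAT\tstart_codon\t100\t102\t.\t+\t0\tID=sc1"
    for hint_line in gff_to_hints(line):
        print(hint_line)
```

Keeping the mapping in a plain dictionary makes it easy to add the remaining hint types once it is decided how they should be derived.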