NBISweden / pipelines-nextflow

A set of workflows written in Nextflow for Genome Annotation.
GNU General Public License v3.0

[New pipeline] EvidenceAlignment #35

Open Juke34 opened 4 years ago

Juke34 commented 4 years ago

See #17 for the general picture.

The purpose of this pipeline is to generate GFF alignments from protein or transcript FASTA files. These GFF files must be formatted in match/match_part style (see the AGAT script agat_sp_alignment_output_style.pl for that purpose, if the tools producing the GFF output do not do it by default).
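As an illustration of the formatting requirement above, here is a minimal sketch of a check that a GFF file only contains match/match_part features. It is not part of the pipeline or of AGAT: the function name `unexpected_feature_types` and the example records are hypothetical, and it assumes the feature types are literally `match` and `match_part` in column 3.

```python
import io

# Assumption: the expected alignment style uses exactly these two feature types.
ALIGNMENT_TYPES = {"match", "match_part"}


def unexpected_feature_types(gff_handle):
    """Return the set of feature types (GFF column 3) that are not match/match_part."""
    unexpected = set()
    for line in gff_handle:
        line = line.rstrip("\n")
        # Skip blank lines, directives and comments.
        if not line or line.startswith("#"):
            continue
        columns = line.split("\t")
        # A GFF feature line has 9 tab-separated columns; ignore anything else.
        if len(columns) != 9:
            continue
        if columns[2] not in ALIGNMENT_TYPES:
            unexpected.add(columns[2])
    return unexpected


# Hypothetical example records, for illustration only.
example = io.StringIO(
    "##gff-version 3\n"
    "chr1\texonerate\tmatch\t100\t500\t.\t+\t.\tID=aln1\n"
    "chr1\texonerate\tmatch_part\t100\t200\t.\t+\t.\tID=aln1.p1;Parent=aln1\n"
    "chr1\texonerate\texon\t300\t500\t.\t+\t.\tID=e1\n"
)
print(unexpected_feature_types(example))
```

An empty result would suggest the file is already in the expected style; anything else would be a candidate for conversion with agat_sp_alignment_output_style.pl.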

Two types of input: a protein FASTA file and/or a nucleotide FASTA file. For both types of alignment we could offer an option to select which tool to use, since many tools exist for this task. It would be nice to allow several choices (e.g. for splice-aware protein alignment: genomethreader, exonerate, gmap, etc.).

For protein alignment: diamond or blastx for raw alignment, and exonerate, scipio, spaln or genomethreader for polished (splice-aware) alignment. => Priority is to implement diamond, blastx and exonerate.

For transcript alignment: => Minimap2. => We should also implement the MAKER method in two steps: 1) raw alignment with tblastx for related-species data, or blastn for species-specific data; 2) exonerate for polished alignment.
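To make the tool-selection idea concrete, here is a rough sketch of how the choices discussed in this issue could be validated. It is only an illustration in Python, not the Nextflow implementation. The names `ALLOWED_TOOLS`, `select_tool` and `maker_style_raw_aligner`, the stage labels, and the "first tool is the default" rule are all assumptions, and only the tools flagged as priority above are listed.

```python
# Hypothetical lookup table: (input type, stage) -> tools one could offer.
# Only the priority tools named in this issue are included.
ALLOWED_TOOLS = {
    ("protein", "raw"): ["diamond", "blastx"],
    ("protein", "polished"): ["exonerate"],
    ("transcript", "direct"): ["minimap2"],
    ("transcript", "raw"): ["tblastx", "blastn"],
    ("transcript", "polished"): ["exonerate"],
}


def select_tool(input_type, stage, requested=None):
    """Return the tool to run, or raise ValueError for an unsupported choice."""
    allowed = ALLOWED_TOOLS.get((input_type, stage))
    if allowed is None:
        raise ValueError(f"unsupported combination: {input_type}/{stage}")
    if requested is None:
        # Assumption: the first listed tool acts as the default.
        return allowed[0]
    if requested not in allowed:
        raise ValueError(f"{requested} is not offered for {input_type}/{stage}")
    return requested


def maker_style_raw_aligner(same_species):
    """MAKER-style step 1: blastn for species-specific data, tblastx for related species."""
    return "blastn" if same_species else "tblastx"


print(select_tool("protein", "raw"))
print(select_tool("transcript", "raw", maker_style_raw_aligner(same_species=False)))
```

In the pipeline itself the same choice would presumably be exposed as a parameter per input type, with the polished step (exonerate) always following the raw alignment in the two-step method.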