[New pipeline] AnnotationToENA

See #17 for the general picture.

Here is a description of the AnnotationToENA pipeline we need:

Input files (2): the GFF file along with the FASTA file
Tools needed: AGAT and EMBLmyGFF3, both available through Bioconda, and webin-cli-.jar from https://github.com/enasequence/webin-cli (they provide a Docker image; we could create a Bioconda recipe)
Output file (1): EMBL flat file
Required parameters (all for EMBLmyGFF3):

LOCUS_TAG (default "XXX")
PROJECT (default "XXX")
MOLECULE (default "genomic DNA")
TABLE (default 1)
TOPOLOGIE (default linear)
SPECIES (Latin name, e.g. "Drosophila melanogaster", or taxid; no default value)
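
A minimal sketch of how these defaults could be applied in a shell wrapper, assuming the parameters arrive as environment variables. The function name `set_embl_defaults` is hypothetical and not part of AGAT or EMBLmyGFF3:

```shell
# Hypothetical helper: fill in the defaults listed above and
# refuse to continue when SPECIES (which has no default) is missing.
set_embl_defaults() {
    LOCUS_TAG="${LOCUS_TAG:-XXX}"
    PROJECT="${PROJECT:-XXX}"
    MOLECULE="${MOLECULE:-genomic DNA}"
    TABLE="${TABLE:-1}"
    TOPOLOGIE="${TOPOLOGIE:-linear}"
    if [ -z "${SPECIES:-}" ]; then
        echo "SPECIES is required (Latin name or taxid)" >&2
        return 1
    fi
}
```

Calling `set_embl_defaults` before Step3 would leave any value the user already exported untouched and only fill the gaps.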

Step1: agat_sp_flag_short_intron.pl --gff annotation.gff -o annotation_short_intron_flagged.gff
Step2: agat_sp_fix_features_locations_duplicated.pl --gff annotation_short_intron_flagged.gff -o annotation_short_intron_flagged_duplicated_location_fixed.gff
Step3:

EMBLmyGFF3 --expose_translations # to get the JSON files locally
code: find a way to add "remove": true after the line "exon": { in the local translation_gff_feature_to_embl_feature.json file
example: EMBLmyGFF3 -I "$LOCUS_TAG" -p "$PROJECT" -m "$MOLECULE" -r "$TABLE" -t "$TOPOLOGIE" -s "$SPECIES" -o annotation.embl annotation.gff genome.fa
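
One possible way to do the JSON edit is sketched below with awk. It assumes the "exon" entry opens on its own line ("exon": {) and already contains at least one key, so the added trailing comma keeps the file valid JSON. The function name `flag_exon_for_removal` is hypothetical:

```shell
# Hypothetical helper: print a copy of the translation file with
# "remove": true inserted right after the line that opens the "exon" entry.
# Lines that do not match are printed unchanged.
flag_exon_for_removal() {
    awk '
        { print }
        /"exon"[ \t]*:[ \t]*[{][ \t\r]*$/ { print "    \"remove\": true," }
    ' "$1"
}

# Usage (write to a temporary file, then replace the original):
# flag_exon_for_removal translation_gff_feature_to_embl_feature.json > tmp.json &&
#     mv tmp.json translation_gff_feature_to_embl_feature.json
```

A JSON-aware tool would be more robust if the layout of the file differs from this assumption; the awk version only mirrors the "add a line after" wording of the step.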

Step4: validate the output with Webin-CLI, the command-line submission program, which supports validation through its -validate option; see https://github.com/enasequence/webin-cli
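
A rough sketch of what the validation call could look like. `WEBIN_JAR`, `WEBIN_USER` and `WEBIN_PASS` are assumed environment variables, the function name is hypothetical, and the `genome` context and a ready-made manifest file pointing at the EMBL flat file are assumptions about how the submission would be set up:

```shell
# Hypothetical wrapper around the Webin-CLI jar.
# $1: path to a manifest file (assumed to exist and to reference annotation.embl).
validate_with_webin() {
    manifest="$1"
    java -jar "$WEBIN_JAR" \
        -context genome \
        -manifest "$manifest" \
        -userName "$WEBIN_USER" \
        -password "$WEBIN_PASS" \
        -validate
}
```

With `-validate` only, nothing is submitted, so the pipeline could run this as a final check and report the Webin-CLI output to the user.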
