Here is a description of the AnnotationToENA pipeline we need:
Input files: 2 => the GFF file along with the FASTA file
Tools needed: AGAT and EMBLmyGFF3, both available from Bioconda, and webin-cli-.jar from https://github.com/enasequence/webin-cli (they provide a Docker image; we can create a Bioconda recipe)
Output file: 1 => EMBL flat file
Required parameters (all for EMBLmyGFF3):
LOCUS_TAG (default "XXX")
PROJECT (default "XXX")
MOLECULE (default "genomic DNA")
TABLE (default 1)
TOPOLOGIE (default linear)
SPECIES (Latin name, e.g. "Drosophila melanogaster", or taxid; no default value)
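The defaults listed above could be wired into a wrapper script with plain shell parameter expansion. This is only a sketch: the variable names follow the list, while `check_species` is a hypothetical helper introduced here to enforce that SPECIES has no default.

```shell
# Defaults taken from the parameter list; each can be overridden from the environment.
LOCUS_TAG="${LOCUS_TAG:-XXX}"
PROJECT="${PROJECT:-XXX}"
MOLECULE="${MOLECULE:-genomic DNA}"
TABLE="${TABLE:-1}"
TOPOLOGIE="${TOPOLOGIE:-linear}"

# Hypothetical helper: SPECIES has no default, so fail early when it is missing.
check_species() {
  if [ -z "${SPECIES:-}" ]; then
    echo "SPECIES is required (Latin name or taxid)" >&2
    return 1
  fi
}
```

Calling `check_species || exit 1` before the EMBLmyGFF3 step would stop the pipeline before any file is written.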
See #17 for the general picture.
Step1: agat_sp_flag_short_intron.pl --gff annotation.gff -o annotation_short_intron_flagged.gff
Step2: agat_sp_fix_features_locations_duplicated.pl --gff annotation_short_intron_flagged.gff -o annotation_short_intron_flagged_duplicated_location_fixed.gff
Step3: EMBLmyGFF3 --expose_translations # to get the JSON files locally
Then add the line
"remove": true
after the line
"exon": {
in the local translation_gff_feature_to_embl_feature.json file, and run:
EMBLmyGFF3 -I "$LOCUS_TAG" -p "$PROJECT" -m "$MOLECULE" -r "$TABLE" -t "$TOPOLOGIE" -s "$SPECIES" -o annotation.embl annotation.gff genome.fa
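The manual JSON edit in Step3 could be scripted so the pipeline runs unattended. The sketch below is an assumption, not part of EMBLmyGFF3: `disable_exon` is a hypothetical helper, and the trailing comma assumes the "exon" object already contains at least one other key (otherwise the comma must be dropped to keep the JSON valid).

```shell
# Hypothetical helper: insert '"remove": true,' right after the '"exon": {' line
# of the exposed translation file, so exon features are left out of the EMBL output.
disable_exon() {
  json="$1"
  # Print every line; after the line containing "exon": { also print the new key.
  awk '{ print }
       index($0, "\"exon\": {") { print "    \"remove\": true," }' "$json" > "$json.tmp" &&
    mv "$json.tmp" "$json"
}
```

It would be called as `disable_exon translation_gff_feature_to_embl_feature.json` between the `--expose_translations` call and the final EMBLmyGFF3 command.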
Step4: validate the EMBL flat file with Webin-CLI, the command-line submission program, which supports validation through the -validate option: see https://github.com/enasequence/webin-cli
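A possible shape for the Step4 call is sketched below. Several things here are assumptions rather than settled design: `webin_validate` is a hypothetical wrapper, `WEBIN_JAR`, `WEBIN_USER` and `WEBIN_PASS` are made-up environment variable names, the `genome` context is a guess at what this pipeline needs, and the manifest file (which would point to the EMBL flat file) is expected to exist already.

```shell
# Hypothetical wrapper around the Webin-CLI validation step.
# With DRY_RUN=1 it only prints the command (password masked) instead of running it.
webin_validate() {
  manifest="$1"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "java -jar $WEBIN_JAR -context genome -manifest $manifest -userName $WEBIN_USER -password **** -validate"
  else
    java -jar "$WEBIN_JAR" -context genome -manifest "$manifest" \
      -userName "$WEBIN_USER" -password "$WEBIN_PASS" -validate
  fi
}
```

Running it first with `DRY_RUN=1 webin_validate manifest.txt` makes it easy to check the assembled command before contacting ENA.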