GMOD / Apollo

Genome annotation editor with a Java Server backend and a Javascript client that runs in a web browser as a JBrowse plugin.
http://genomearchitect.readthedocs.io/

Index Error [-1] on GFFs with Shine_Dalgarno features of length 3 #2607

Closed BeaverThing closed 2 years ago

BeaverThing commented 3 years ago

Our team has encountered an error with a Python script we wrote to send GFF files directly to the annotation track of an Apollo organism. The GFF consists of gene features, each with an mRNA subfeature; the mRNA in turn has CDS and Shine_Dalgarno_sequence features as subfeatures. The script initializes a webapollo object wa via get_apollo_instance(), then passes the GFF file to wa.annotations.load_gff3([arguments]).

When the GFF file contains a Shine_Dalgarno_sequence feature of length 3, the command fails with the message "ERROR:root:Error returned by Apollo while loading data: String index out of range: -1". If we filter the data to keep only Shine_Dalgarno_sequence features of length 4 or greater, the tracks populate successfully into the annotations of the given organism.
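
For reference, here is a minimal sketch of the filtering workaround described above. It is not our production script: the function names and the threshold constant are made up for illustration, it assumes tab-delimited GFF3 lines, and it matches the feature type by the prefix "Shine_Dalgarno" so that variant spellings of the type are also caught.

```python
# Hypothetical helper names; only the length-4 threshold comes from this issue.
MIN_SD_LENGTH = 4  # shorter Shine-Dalgarno features triggered the error


def is_short_sd(line, min_length=MIN_SD_LENGTH):
    """Return True if a GFF3 line is a Shine-Dalgarno feature shorter than min_length."""
    # Skip directives, comments, and blank lines.
    if line.startswith("#") or not line.strip():
        return False
    cols = line.rstrip("\n").split("\t")
    # A GFF3 feature line has 9 tab-separated columns; column 3 is the type.
    if len(cols) < 9 or not cols[2].startswith("Shine_Dalgarno"):
        return False
    start, end = int(cols[3]), int(cols[4])
    # GFF3 coordinates are 1-based and inclusive, so length = end - start + 1.
    return end - start + 1 < min_length


def filter_gff3(lines, min_length=MIN_SD_LENGTH):
    """Return the input lines with short Shine-Dalgarno features removed."""
    return [ln for ln in lines if not is_short_sd(ln, min_length)]
```

The filtered lines can then be written to a temporary file and passed to the loader in place of the original GFF. Note that this drops the short features instead of fixing the underlying error.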

Our server is running Apollo 2.6.2-SNAPSHOT, and the script uses python-apollo 4.2.10, installed from the conda package.

nathandunn commented 3 years ago

Note: sample GFF3 follows. The issue is the Shine-Dalgarno (SD) feature of length 3 (coordinates 589 to 591):

##gff-version 3
##sequence-region AY216660.2 1 48836
AY216660.2  GbkToGff    gene    40  576 .   +   .   locus_tag=CPT-T1_001;ID=CPT-T1_001.gene;
AY216660.2  GbkToGff    mRNA    40  576 .   +   .   locus_tag=CPT-T1_001;Notes=mRNA feature automatically generated by Gbk to GFF conversion;ID=CPT-T1_001.mRNA;Parent=CPT-T1_001.gene;
AY216660.2  GbkToGff    Shine_Dalgarno_seqeunce 40  43  .   +   .   locus_tag=CPT-T1_001;regulatory_class=ribosome_binding_site;ID=CPT-T1_001.Shine_Dalgarno_seqeunce.1;Parent=CPT-T1_001.mRNA;
AY216660.2  GbkToGff    CDS 52  576 .   +   0   locus_tag=CPT-T1_001;codon_start=1;transl_table=11;product=terminase small subunit;translation=MSEPKNAPVVQGGNFKELYKKKFGTVLAKNRAMTPEQLFDLSVKYFEWAEDNAIKASESASFQGGVYESLVHKPRVFTWTGYRLFIGASEAAIIKWKREEEYSEVMEFVESVINEQKFQLAANGVINASFIGKDLGIDKPASINIENSSASASTVVATTEDAMKEAVNSILDML;note=Orf no. 54 see PMID: 14972552;ID=CPT-T1_001.CDS.1;Parent=CPT-T1_001.mRNA;
AY216660.2  GbkToGff    gene    589 2184    .   +   .   locus_tag=CPT-T1_002;ID=CPT-T1_002.gene;
AY216660.2  GbkToGff    mRNA    589 2184    .   +   .   locus_tag=CPT-T1_002;Notes=mRNA feature automatically generated by Gbk to GFF conversion;ID=CPT-T1_002.mRNA;Parent=CPT-T1_002.gene;
AY216660.2  GbkToGff    Shine_Dalgarno_seqeunce 589 591 .   +   .   locus_tag=CPT-T1_002;regulatory_class=ribosome_binding_site;ID=CPT-T1_002.Shine_Dalgarno_seqeunce.1;Parent=CPT-T1_002.mRNA;
AY216660.2  GbkToGff    CDS 601 2184    .   +   0   locus_tag=CPT-T1_002;codon_start=1;transl_table=11;product=terminase large subunit;translation=MGDLIMIQWEDLNATQKLAIKKMSEANFEKMIRIWFQLMQAQQFQPNWHHLYLCHEVEEIIAGRRGNTIFNVTPGSGKTEVFSIHLPVYAMLKCKKVRNLNVSFADSLVKRNSKRVREIISSNEFQELWPCKFGTSKDEEMQVLNEDGKVWFELISAAAGGRITGSRGGYMTPGFSGMVMLDDIDKPDDMFSKVKRERTHMLLKNTIRSRRMHNETPIIAIQQRLHAQDSTWFMMNGGMGIEFDQISIPALVTEEYGKTLPDWLQPYFERDVLSSEYVELDGVKHYSFWPSKESVHDLLALREADQYTFDSQYQQKPIALGGSVFNSEWWTYYGSSLDADEPDPGKYDYRFITADTAQKTGELNDYTVFCLWGKKNDKVYFIDGIRGKWEAPDMERQFTAFVNQAWRHNKSMGVLRKIYVEDKASGTGLIQNLRKKTPISITPLQRNKDKVTRAMDAQPVIKAGRVVLPEEHPMLAEIIAEHSAFTYDDTHPHDDIVDNFMDAANIELLTIDDPIERMKRLAGMVKR;note=Orf no. 53 see PMID: 14972552;ID=CPT-T1_002.CDS.1;Parent=CPT-T1_002.mRNA;
AY216660.2  GbkToGff    gene    2230    3522    .   +   .   locus_tag=CPT-T1_003;ID=CPT-T1_003.gene;
AY216660.2  GbkToGff    mRNA    2230    23794   .   +   .   locus_tag=CPT-T1_003;Notes=mRNA feature automatically generated by Gbk to GFF conversion;ID=CPT-T1_003.mRNA;Parent=CPT-T1_003.gene;
AY216660.2  GbkToGff    Shine_Dalgarno_seqeunce 2230    2233    .   +   .   locus_tag=CPT-T1_003;regulatory_class=ribosome_binding_site;ID=CPT-T1_003.Shine_Dalgarno_seqeunce.2;Parent=CPT-T1_003.mRNA;
AY216660.2  GbkToGff    CDS 2239    3522    .   +   0   locus_tag=CPT-T1_003;note=HHPred predicted structural similarity at 99%25 probability to phage T4 portal protein gp20 Protein Data Bank entry 3JA7 over most of protein%3B Orf no. 52 see PMID: 14972552;codon_start=1;transl_table=11;product=portal protein;translation=MKIVKHDGYNDIFNGGADGSPKPFFMSDASYHVGSFYNDNATAKRIVDVIPEEMVTAGFKMSGVKDEKEFKSLWDSYKLDSSLVDLLCWARLYGGAAMVAIIKDNRMLTSQAKPGAKLEGVRVYDRFAITVEKRVTNARSPRYGEPEIYKVSPGDNMQPYLIHHSRVFIADGERVAQQARKQNQGWGASVLNKSLIDAICDYDYCESLATQILRRKQQAVWKVKGLAEMCDDDDAQYAARLRLAQVDDNSGVGRAIGIDAETEEYDVLNSDISGVPEFLSSKMDRIVSLSGIHEIIIKNKNVGGVSASQNTALETFYKLVDRKREEDYRPLLEFLLPFIVDEEEWSIEFEPLSVPSKKEESEITKNNVESVTKAITEQIIDLEEARDTLRSIAPEFKLKDGNNINIREPEETTEPEPGLGEKLEDEN;ID=CPT-T1_003.CDS.1;Parent=CPT-T1_003.mRNA;
nathandunn commented 3 years ago

@BeaverThing this might have been handled as part of #2515, a fix that the version you are running may predate.

Can you move to the develop branch (or at least 2.6.3) and see whether that fixes it?

BeaverThing commented 2 years ago

After testing with the latest version of Apollo installed, Shine_Dalgarno features of length 3 can now be uploaded.