enasequence / sequencetools

Webin sequence validation API.
Apache License 2.0

Entry features ignored / skipped #30

Open jjkoehorst opened 6 years ago

jjkoehorst commented 6 years ago

I am trying to parse the salmon genome from NCBI and noticed that certain mRNA / CDS features are ignored (skipped) when the file is read.

A slimmed-down test file is available here: http://fungen.wur.nl/~jasperk/sequencetools/out21.test

The parser detects the source feature and an assembly gap elsewhere in the file, as well as the gene in this excerpt:

     gene            56716563..56758743
                     /gene="LOC106582566"
                     /note="Derived by automated computational analysis using
                     gene prediction method: Gnomon."
                     /db_xref="GeneID:106582566"
     mRNA            join(56716563..56716682,56719177..>56719254,
                     <56742712..56742807,56743880..56743972,56745317..56745442,
                     56745673..56745737,56749029..56749092,56749349..56749495,
                     56751273..56751343,56751508..56751595,56752027..56752179,
                     56754778..56754942,56755341..56755402,56757044..56757155,
                     56757360..56757459,56757978..56758083,56758205..56758743)
                     /gene="LOC106582566"
                     /product="ATP-dependent 6-phosphofructokinase, liver
                     type-like"
                     /inference="similar to RNA sequence (same
                     species):INSD:GBRB01064866.1"
                     /exception="annotated by transcript or proteomic data"
                     /note="The sequence of the model RefSeq transcript was
                     modified relative to this genomic sequence to represent
                     the inferred CDS: added 510 bases not found in genome
                     assembly; Derived by automated computational analysis
                     using gene prediction method: Gnomon. Supporting evidence
                     includes similarity to: 1 EST, 16 Proteins, and 98%
                     coverage of the annotated genomic feature by RNAseq
                     alignments"
                     /transcript_id="XM_014165779.1"
                     /db_xref="GeneID:106582566"

However, the mRNA feature is missing from the parsed feature list:

0 = {SourceFeature@3038} "uk.ac.ebi.embl.api.entry.feature.SourceFeature@31e32ea2[id=<null>,name=source,locations=uk.ac.ebi.embl.api.entry.location.Join@1473b8c0[locations=[uk.ac.ebi.embl.api.entry.location.LocalRange@5b5c0057[beginPosition=1,endPosition=58021487,complement=false]]],qualifiers=[uk.ac.ebi.embl.api.entry.qualifier.OrganismQualifier@749f539e[id=<null>,name=organism,value=<null>], uk.ac.ebi.embl.api.entry.qualifier.Qualifier@5ca1f591[id=<null>,name=isolate,value=Sally], uk.ac.ebi.embl.api.entry.qualifier.Qualifier@551de37d[id=<null>,name=chromosome,value=ssa21], uk.ac.ebi.embl.api.entry.qualifier.Qualifier@6ef81f31[id=<null>,name=sex,value=female], uk.ac.ebi.embl.api.entry.qualifier.Qualifier@6075b2d3[id=<null>,name=tissue_type,value=muscle], uk.ac.ebi.embl.api.entry.qualifier.Qualifier@33abde31[id=<null>,name=dev_stage,value=adult], uk.ac.ebi.embl.api.entry.qualifier.Qualifier@997d532[id=<null>,name=breed,value=double haploid]],xRefs=[]]"
1 = {Feature@3039} "uk.ac.ebi.embl.api.entry.feature.Feature@273842a6[id=<null>,name=assembly_gap,locations=uk.ac.ebi.embl.api.entry.location.Join@6a969fb8[locations=[uk.ac.ebi.embl.api.entry.location.LocalRange@7a18e8d[beginPosition=18692,endPosition=18791,complement=false]]],qualifiers=[uk.ac.ebi.embl.api.entry.qualifier.Qualifier@3028e50e[id=<null>,name=estimated_length,value=unknown], uk.ac.ebi.embl.api.entry.qualifier.Qualifier@5560bcdf[id=<null>,name=gap_type,value=between scaffolds]],xRefs=[]]"
2 = {Feature@3040} "uk.ac.ebi.embl.api.entry.feature.Feature@b558294[id=<null>,name=gene,locations=uk.ac.ebi.embl.api.entry.location.Join@bb095[locations=[uk.ac.ebi.embl.api.entry.location.LocalRange@777c350f[beginPosition=56716563,endPosition=56758743,complement=false]]],qualifiers=[uk.ac.ebi.embl.api.entry.qualifier.Qualifier@27aae97b[id=<null>,name=gene,value=LOC106582566], uk.ac.ebi.embl.api.entry.qualifier.Qualifier@4c9e38[id=<null>,name=note,value=Derived by automated computational analysis using gene prediction method: Gnomon.]],xRefs=[uk.ac.ebi.embl.api.entry.XRef@5d1e09bc[database=GeneID,primaryAccession=106582566,secondaryAccession=<null>]]]"
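To see exactly which features the reader dropped, it can help to list the feature keys straight from the flat file and compare them with the parsed list. The sketch below is a minimal Python illustration, not part of sequencetools. The helper name `list_features` is hypothetical, and it assumes the GenBank layout shown above: feature key indented by 5 spaces, location and qualifiers indented by 21 spaces, no EMBL-style `FT` line prefix, and the table ending at the next unindented section such as `ORIGIN`.

```python
import re

# Hypothetical helper, not part of sequencetools.
# In GenBank-style feature tables a feature line has the key indented by
# 5 spaces; qualifier lines and location continuation lines are indented
# by 21 spaces.
KEY_LINE = re.compile(r"^ {5}(\S+)\s+(\S.*)$")


def list_features(lines):
    """Return (key, location) pairs from the FEATURES block of a GenBank file."""
    features = []
    in_table = False      # inside the FEATURES block?
    in_location = False   # still reading the current feature's location?
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("FEATURES"):
            in_table = True
            continue
        if not in_table:
            continue
        if line and not line[0].isspace():
            break  # next top-level section, e.g. ORIGIN or CONTIG
        match = KEY_LINE.match(line)
        if match:
            features.append([match.group(1), match.group(2).strip()])
            in_location = True
        elif line.startswith(" " * 21) and features:
            text = line.strip()
            if text.startswith("/"):
                in_location = False  # first qualifier ends the location
            elif in_location:
                features[-1][1] += text  # multi-line location continues
    return [tuple(f) for f in features]
```

Passing an open file handle for out21.test and counting the returned keys (for example with `collections.Counter`) would show which feature types are present in the file, so the counts can be set against the parser's output.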

I am not sure whether this mRNA location conforms to the accepted standards, but this is how the record is distributed online. The parser does detect the mRNA when I remove most of the positions from the join, so I suspect the problem is related to the location.
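One hypothesis, not confirmed anywhere in this issue, is that the trigger is not the number of ranges but the partial markers inside the join: `56719177..>56719254` ends with `>` and `<56742712..56742807` starts with `<`, although neither is the first or last range (this may relate to the note that 510 bases not found in the genome assembly were added). The Python sketch below lists such interior partial ranges. The helper name `interior_partials` is hypothetical, and it only handles a flat `join(...)` of simple ranges, not `complement(...)` or nested operators.

```python
import re


def interior_partials(location):
    """Return ranges of a flat join(...) that carry a partial marker
    ('<' or '>') on an interior boundary.

    Markers on the start of the first range or the end of the last range
    mark an ordinary partial feature and are not reported.
    """
    match = re.fullmatch(r"\s*join\((.*)\)\s*", location, re.DOTALL)
    if not match:
        return []  # not a join, or wrapped in complement(...): not handled
    # Strip whitespace/newlines left over from multi-line locations.
    ranges = [r.strip() for r in match.group(1).split(",")]
    flagged = []
    for i, rng in enumerate(ranges):
        start, _, end = rng.partition("..")
        if "<" in start and i != 0:
            flagged.append(rng)  # '<' on a range that is not the first
        elif ">" in end and i != len(ranges) - 1:
            flagged.append(rng)  # '>' on a range that is not the last
    return flagged
```

For the mRNA above it flags `56719177..>56719254` and `<56742712..56742807`. If removing only these two markers, while keeping every range, makes the mRNA reappear in the parsed list, that would point at the interior partials rather than the length of the join.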