enasequence / sequencetools

Webin sequence validation API.
Apache License 2.0

CDS entries dealing with AH lines #48

Open · nbuso opened this issue 6 years ago

nbuso commented 6 years ago

Coding (CDS) entries can contain AH lines even when the data class is Standard (STD). Reading them fails with errors like:

Unknown line type: AH LOCAL_SPAN PRIMARY_IDENTIFIER PRIMARY_SPAN COMP
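The error suggests that the reader's table of known line types has no entry for the assembly header (AH) and assembly (AS) lines when it processes CDS files. Here is a minimal sketch of the idea, assuming a simple lookup keyed on the two-letter line code. `KNOWN_LINE_TYPES` and `classify_line` are hypothetical names and are not the sequencetools API, which is written in Java. The set of codes is illustrative: it covers the line types used in the example entry below plus a few common ones.

```python
# Hypothetical sketch, not the sequencetools reader:
# classify EMBL flat-file lines by their two-letter code, with AH/AS accepted.
KNOWN_LINE_TYPES = {
    "ID", "PA", "AC", "PR", "DT", "DE", "KW", "OS", "OC", "OG",
    "RN", "RC", "RP", "RX", "RG", "RA", "RT", "RL", "DR", "CC",
    "AH", "AS",  # assembly header and assembly span lines
    "FH", "FT", "SQ", "CO", "XX",
}


def classify_line(line: str) -> str:
    """Return the line type of one flat-file line, or raise ValueError."""
    if line.startswith("//"):
        return "//"   # entry terminator
    if line.startswith("     "):
        return "SEQ"  # sequence data lines are indented by five spaces
    code = line[:2]
    if code not in KNOWN_LINE_TYPES:
        raise ValueError(f"Unknown line type: {line}")
    return code


print(classify_line("AH   LOCAL_SPAN          PRIMARY_IDENTIFIER PRIMARY_SPAN        COMP"))  # → AH
```

With AH and AS in the table, these lines are recognised rather than rejected, and a genuinely unknown code still fails loudly.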

Example entry from the file std/cum_std_inv_01_r137.cds:

ID   DAB41761; SV 1; linear; genomic DNA; STD; INV; 1533 BP.
XX
PA   BK010358.1
XX
DT   16-JUL-2018 (Rel. 137, Created)
DT   16-JUL-2018 (Rel. 137, Last updated, Version 1)
XX
DE   Heliconius melpomene (postman butterfly) cytochrome P450 CYP405A4
XX
KW   TPA; TPA:inferential.
XX
OS   Heliconius melpomene (postman butterfly)
OC   Eukaryota; Metazoa; Ecdysozoa; Arthropoda; Hexapoda; Insecta; Pterygota;
OC   Neoptera; Holometabola; Lepidoptera; Glossata; Ditrysia; Papilionoidea;
OC   Nymphalidae; Heliconiinae; Heliconiini; Heliconius.
XX
RN   [1]
RC   Publication Status: Online-Only
RP   1-5039
RX   DOI; 10.1186/1471-2164-10-574.
RX   PUBMED; 19954531.
RA   Zagrobelny M., Scheibye-Alsing K., Jensen N.B., Moller B.L., Gorodkin J.,
RA   Bak S.;
RT   "454 pyrosequencing based transcriptome analysis of Zygaena filipendulae
RT   with focus on genes involved in biosynthesis of cyanogenic glucosides";
RL   BMC Genomics 10:574-574(2009).
XX
RN   [2]
RC   Publication Status: Available-Online prior to print
RP   1-5039
RX   DOI; .1007/s00239-018-9854-8.
RX   PUBMED; 29974176.
RA   Zagrobelny M., Jensen M.K., Vogel H., Feyereisen R., Bak S.;
RT   "Evolution of the Biosynthetic Pathway for Cyanogenic Glucosides in
RT   Lepidoptera";
RL   J. Mol. Evol. 86(6):379-394(2018).
XX
RN   [3]
RP   1-5039
RA   Zagrobelny M., Jensen M.K., Vogel H., Feyereisen R., Bak S.;
RT   ;
RL   Submitted (17-NOV-2017) to the INSDC.
RL   Department of Plant and Environmental Sciences, University of Copenhagen,
RL   Thorvaldsensvej 40, Frederiksberg C, Copenhagen 1871, Denmark
XX
DR   MD5; 0fdc51a4288be7834c8c4de6bbb12cab.
XX
AH   LOCAL_SPAN          PRIMARY_IDENTIFIER PRIMARY_SPAN        COMP
AS   1-160               CAEZ01010737.1     92003-92162         c
AS   161-270             CAEZ01010737.1     92669-92778         c
AS   271-364             CAEZ01010737.1     93023-93116         c
AS   365-550             CAEZ01010737.1     94792-94977         c
AS   551-620             CAEZ01010737.1     95076-95145         c
AS   621-747             CAEZ01010737.1     95230-95356         c
AS   748-957             CAEZ01010737.1     95438-95647         c
AS   958-1201            CAEZ01010737.1     95870-96113         c
AS   1202-1372           CAEZ01010737.1     96335-96505         c
AS   1373-1533           CAEZ01010737.1     96881-97041         c
XX
FH   Key             Location/Qualifiers
FH
FT   source          1..1533
FT                   /organism="Heliconius melpomene"
FT                   /mol_type="genomic DNA"
FT                   /db_xref="taxon:34740"
FT   CDS             join(BK010358.1:1..160,BK010358.1:667..776,
FT                   BK010358.1:1021..1114,BK010358.1:2790..2975,
FT                   BK010358.1:3074..3143,BK010358.1:3228..3354,
FT                   BK010358.1:3436..3645,BK010358.1:3868..4111,
FT                   BK010358.1:4333..4503,BK010358.1:4879..5039)
FT                   /codon_start=1
FT                   /gene="CYP405A4"
FT                   /product="cytochrome P450 CYP405A4"
FT                   /protein_id="DAB41761.1"
FT                   /translation="MFFLLLVLLVISLILIQDFVKKRGYKWKKLSEFPGDRPLPLVGNG
FT                   LDIGFDADEASFRLIKMWEKHGKQNFRLSVGSEDWLMLCEPEDIKVLLNDNFELSKPLE
FT                   RNAAMKPFFGYSVSTSEGERWKSVRKLMTPSFHFKALDRAADDFGKHTQTLFELIDSYE
FT                   GKSPVNIYTYLKPYMLDMLSVNLFGVEKNYLKNRDHPYIKSSSKMIKIITYNYFSYWRN
FT                   NSYLVRFTPLHTEMQNIIRDVKQNSSDIITQRRKILNKMIEETKSNNKNIDLNYDTFIE
FT                   DKLAGGCLLDRFILSVSPNGDPEKDAIINEEITLTLFTGHLTTSMTMCNAMYLMSLYPE
FT                   VQQKVLDEQKAIFGKDLNRQATTQDLNDMKYLEALIKETIRFIPTIPRIGRQLQKDLKL
FT                   SDGRVAPAGTSVIVFFNAAARNPRTYTEPEKFMPERFFDTTMHPFAFVPFSAGPRNCVA
FT                   FRYAWIVLKATLSNIIRRYEILPGPEPKFAFRLITESTNGLHLHYKKRDICA"
XX
SQ   Sequence 1533 BP; 510 A; 288 C; 302 G; 433 T; 0 other;
     atgtttttct tactgttagt tctgctcgtt atatctttaa tattgatcca agattttgtc        60
     aagaaacgtg ggtacaaatg gaagaagctt tcagaatttc ccggtgacag accgttacca       120
     ttggtcggga atggcctaga catcggattc gatgcagacg aagcgtcttt tagattgatt       180
     aaaatgtggg aaaaacatgg aaaacaaaat ttccgtctgt ctgttggatc cgaggactgg       240
     ctcatgctgt gcgaacctga agacattaag gtacttctaa acgacaattt tgagctttcg       300
     aaacctctag agagaaatgc agccatgaaa ccattctttg gctactccgt atccacttct       360
     gagggagaaa gatggaaatc agtgagaaag ctgatgactc caagttttca ctttaaagct       420
     ttggaccgag ctgctgacga tttcggcaaa cacactcaaa cactctttga gctcatagat       480
     tcttatgaag gcaagagtcc agtcaatata tacacatatt tgaagccgta catgctcgat       540
     atgttaagcg ttaatctttt tggagttgaa aagaattacc ttaaaaaccg tgatcatcca       600
     tatattaagt ccagcagcaa aatgattaag attataacat acaattactt ttcctactgg       660
     agaaataata gttacttagt aagatttact ccattgcaca ctgaaatgca gaatattatt       720
     agagacgtca aacaaaacag ttcagatata attacgcaaa gaagaaaaat tttgaacaaa       780
     atgatagaag aaaccaagag caacaataag aacatcgact tgaactatga tactttcata       840
     gaagataagc ttgctggtgg ttgcttgttg gacagattta tattaagcgt gtcgccaaat       900
     ggtgatcctg agaaggatgc gatcattaat gaagaaataa cattaacctt atttactgga       960
     catctaacga catcaatgac gatgtgcaac gcaatgtacc tgatgtcatt gtacccagaa      1020
     gtacaacaga aagttctaga tgaacagaag gctatatttg gtaaagactt gaacagacaa      1080
     gcgactacac aagatttaaa tgacatgaaa tatcttgaag cgctaataaa ggagactata      1140
     aggttcattc ctacaattcc cagaattggg agacaactac aaaaagactt aaagttatct      1200
     gatggtcgtg tagctcctgc tggaacgtct gtaattgttt tcttcaatgc cgctgccaga      1260
     aatcccagga cttacacaga accagaaaaa tttatgcctg aacgattctt tgacacaaca      1320
     atgcatccat ttgcttttgt tcctttcagt gccggaccta ggaactgcgt agctttccgc      1380
     tatgcgtgga ttgttctgaa ggcgacttta tcgaacatta tacggagata tgaaattctt      1440
     cctggtcctg aacccaaatt tgccttccgt ctgatcacgg aatctaccaa tggattgcat      1500
     ctacactaca agaagagaga tatatgtgct taa                                   1533
//
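Once the AH/AS lines are read instead of rejected, the assembly block in this entry can be sanity-checked. The sketch below uses only the Python standard library. It parses the AS lines and checks that the local spans tile 1-1533, the length on the ID line. It also checks that each local span has the same length as its primary span on CAEZ01010737.1. Finally, it compares those lengths with the segments of the CDS join(...) on BK010358.1. The names `AsSpan`, `parse_as_line`, `check_assembly` and `join_segment_lengths` are hypothetical. The column layout (local span, primary identifier, primary span, optional COMP flag) is read off the AH header in this entry, not taken from the format specification.

```python
import re
from dataclasses import dataclass

# AS lines and CDS location copied from the example entry (DAB41761).
AS_BLOCK = [
    "AS   1-160               CAEZ01010737.1     92003-92162         c",
    "AS   161-270             CAEZ01010737.1     92669-92778         c",
    "AS   271-364             CAEZ01010737.1     93023-93116         c",
    "AS   365-550             CAEZ01010737.1     94792-94977         c",
    "AS   551-620             CAEZ01010737.1     95076-95145         c",
    "AS   621-747             CAEZ01010737.1     95230-95356         c",
    "AS   748-957             CAEZ01010737.1     95438-95647         c",
    "AS   958-1201            CAEZ01010737.1     95870-96113         c",
    "AS   1202-1372           CAEZ01010737.1     96335-96505         c",
    "AS   1373-1533           CAEZ01010737.1     96881-97041         c",
]
CDS_LOCATION = (
    "join(BK010358.1:1..160,BK010358.1:667..776,BK010358.1:1021..1114,"
    "BK010358.1:2790..2975,BK010358.1:3074..3143,BK010358.1:3228..3354,"
    "BK010358.1:3436..3645,BK010358.1:3868..4111,BK010358.1:4333..4503,"
    "BK010358.1:4879..5039)"
)


@dataclass
class AsSpan:
    local_start: int
    local_end: int
    primary_id: str
    primary_start: int
    primary_end: int
    complement: bool  # True when the optional COMP column holds "c"


def _span(text: str) -> tuple[int, int]:
    start, end = text.split("-")
    return int(start), int(end)


def parse_as_line(line: str) -> AsSpan:
    """Split 'AS   <local span> <primary id> <primary span> [c]' into fields."""
    fields = line[5:].split()
    local_start, local_end = _span(fields[0])
    primary_start, primary_end = _span(fields[2])
    complement = len(fields) > 3 and fields[3] == "c"
    return AsSpan(local_start, local_end, fields[1], primary_start, primary_end, complement)


def check_assembly(lines: list[str], entry_length: int) -> list[str]:
    """Return problems found in the AS block; an empty list means consistent."""
    problems = []
    expected_start = 1
    for span in map(parse_as_line, lines):
        if span.local_start != expected_start:
            problems.append(f"gap or overlap before local position {span.local_start}")
        local_len = span.local_end - span.local_start + 1
        primary_len = span.primary_end - span.primary_start + 1
        if local_len != primary_len:
            problems.append(f"length mismatch at local span {span.local_start}-{span.local_end}")
        expected_start = span.local_end + 1
    if expected_start - 1 != entry_length:
        problems.append(f"AS spans end at {expected_start - 1}, ID line says {entry_length} BP")
    return problems


def join_segment_lengths(location: str) -> list[int]:
    """Lengths of the 'accession:start..end' segments in a join(...) location."""
    return [int(end) - int(start) + 1 for start, end in re.findall(r":(\d+)\.\.(\d+)", location)]


print(check_assembly(AS_BLOCK, 1533))  # → []
local_lengths = [s.local_end - s.local_start + 1 for s in map(parse_as_line, AS_BLOCK)]
print(join_segment_lengths(CDS_LOCATION) == local_lengths)  # → True
```

For this record both checks pass. The ten AS spans and the ten CDS segments have identical lengths: 160, 110, 94, 186, 70, 127, 210, 244, 171 and 161 bp, which sum to the 1533 BP on the ID and SQ lines.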