ANHIG / IPDKIR

GitHub repository for files currently published in the IPD-KIR FTP directory hosted at the European Bioinformatics Institute
http://www.ebi.ac.uk/ipd/kir/

Feature annotation not matching sequence length #24

Closed: sklas closed this issue 5 years ago

sklas commented 5 years ago

Hi,

There are three alleles in the 2.8.0 release whose feature annotation does not match the sequence length: KIR2DL3*017 (exon 9 ends at 1050, but the sequence is only 1026 bp long), KIR2DL3*01801 (exon 9 ends at 981, sequence length 957) and KIR2DL3*01802 (exon 9 ends at 1050, sequence length 1026). This is the KIR2DL3*017 entry:

ID   KIR00655; SV 2; standard; DNA; HUM; 1026 BP.
XX
AC   KIR00655;
XX
SV   KIR00655.2
XX
DT   20-APR-2010 (Rel. 2.2.0, Created, Version 1)
DT   16-AUG-2010 (Rel. 2.3.0, Sequence Updated, Version 2)
DT   30-NOV-2018 (Rel. 2.8.0, Current Release, Version 2)
XX
DE   KIR2DL3*017, Human Killer-cell Immunoglobulin-like Receptor (partial)
XX
KW   Human Killer-cell Immunoglobulin-like Receptor; KIR2DL3*017;
XX
OS   Homo Sapiens (human)
OC   Eukaryota; Metazoa; Chordata; Vertebrata; Mammalia; Eutheria; Primates;
OC   Catarrhini; Hominidae; Homo.
XX
CC   --------------------------------------------------------------------------
CC   IPD-KIR Release Version 2.8.0
CC   --------------------------------------------------------------------------
CC   Copyrighted by the IPD Database, Distributed under the Creative Commons
CC   Attribution-NoDerivs License, see;
CC   http://www.ebi.ac.uk/ipd/licence.html for further details.
CC   --------------------------------------------------------------------------
XX
RN   [1]
RP   1-1026
RX   PUBMED; 20875478.
RA   Hardie RA, Czarnecki C, Blake Ball T, Plummer FA, Luo M;
RT   "Identification of four novel KIR2DL2 alleles and two novel KIR2DL3
RT   alleles in an East African population.";
RL   Human Immunology 71:1251-1254(2010).
XX
CC   --------------------------------------------------------------------------
CC   The sequence below is the official allele sequence as approved by the
CC   KIR Nomenclature Committee.
CC   Any cross references may differ from the sequence shown below.
CC   --------------------------------------------------------------------------
XX
DR   EMBL; FJ188692; FJ188692.0.
XX
FH   Key             Location/Qualifiers
FH
FT   source          1..1026
FT                   /organism="Homo sapiens"
FT                   /mol_type="genomic DNA"
FT                   /db_xref="taxon:9606"
FT                   /ethnic="Unknown"
FT                   /cell_line="ML1782"
FT   CDS             join(<1..34,35..70,71..370,371..664,665..715,716..820,
FT                   821..873,874..1050)
FT                   /codon_start=1
FT                   /partial
FT                   /gene="KIR2DL3"
FT                   /allele="KIR2DL3*017"
FT                   /product="KIR2DL3 Killer-cell Immunoglobulin-like Receptor"
FT                   /translation="MSLMVVSMACVGFFLLQGAWPHEGVHRKPSLLAHPGPLVKSEETV
FT                   ILQCWSDVRFEHFLLHREGKFKDTLRLIGEHHDGVSKANFSIGPMMQDLAGTYRCYGSV
FT                   THSPYQLSAPSDPLDIVITGLYEKPSLSAQPGPTVLAGESVTLSCSSRSSYDMYHLSRE
FT                   GEAHERRFSAGPKVNGTFQADFPLGPATHGGTYRCFGSFRDSPYEWSNSSDPLLVSVTG
FT                   NPSNSWPSPTEPSSETGNPRHLHVLIGTSVVIILFILLLFFLLHRWCCNKKNAVVMDQE
FT                   PAGNRTVNREDSDEQDPQEVTYAQLNHCVFTQRKITRPSQRPKTPPTDIIVYTELPNAE
FT                   PRSKVVSCP"
FT   exon            1..34
FT                   /number="1"
FT   exon            35..70
FT                   /number="2"
FT   exon            71..370
FT                   /number="4"
FT   exon            371..664
FT                   /number="5"
FT   exon            665..715
FT                   /number="6"
FT   exon            716..820
FT                   /number="7"
FT   exon            821..873
FT                   /number="8"
FT   exon            874..1050
FT                   /number="9"
SQ   Sequence 1026 BP; 242 A; 306 C; 245 G; 233 T; 0 other;
     atgtcgctca tggtcgtcag catggcgtgt gttgggttct tcttgctgca gggggcctgg        60
     ccacatgagg gagtccacag aaaaccttcc ctcctggccc acccaggtcc cctggtgaaa       120
     tcagaagaga cagtcatcct gcaatgttgg tcagatgtca ggtttgagca cttccttctg       180
     cacagagaag ggaagtttaa ggacactttg cgcctcattg gagagcacca tgatggggtc       240
     tccaaggcca acttctccat cggtcccatg atgcaagacc ttgcagggac ctacagatgc       300
     tacggttctg ttactcactc cccctatcag ttgtcagctc ccagtgaccc tctggacatc       360
     gtcatcacag gtctatatga gaaaccttct ctctcagccc agccgggccc cacggttctg       420
     gcaggagaga gcgtgacctt gtcctgcagc tcccggagct cctatgacat gtaccatcta       480
     tccagggagg gggaggccca tgaacgtagg ttctctgcag ggcccaaggt caacggaaca       540
     ttccaggccg actttcctct gggccctgcc acccacggag ggacctacag atgcttcggc       600
     tctttccgtg actctccata cgagtggtca aactcgagtg acccactgct tgtttctgtc       660
     acaggaaacc cttcaaatag ttggccttca cccactgaac caagctccga aaccggtaac       720
     cccagacacc tgcatgttct gattgggacc tcagtggtca tcatcctctt catcctcctc       780
     ctcttctttc tccttcatcg ctggtgctgc aacaaaaaaa atgctgttgt aatggaccaa       840
     gagcctgcag ggaacagaac agtgaacagg gaggactctg atgaacaaga ccctcaggag       900
     gtgacatatg cacagttgaa tcactgcgtt ttcacacaga gaaaaatcac tcgcccttct       960
     cagaggccca agacaccccc aacagatatc atcgtgtaca cggaacttcc aaatgctgag      1020
     cccaga                                                                 1026
//

Thanks, Steffen
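A minimal sketch of the check behind this report, in Python. The helper names `parse_ranges` and `overhang` are hypothetical and not part of any IPD-KIR tooling. It takes the `start..end` ranges from a feature location, ignores the `<`/`>` partial markers, and compares the last coordinate with the length given on the `SQ` line. For the KIR2DL3*017 entry above, the joined exons span 1050 bp against a 1026 bp sequence, an overhang of 24 bp. It assumes simple `start..end` ranges only (no single-base positions, `complement(...)` or remote references).

```python
import re

# "start..end" ranges, tolerating the EMBL partial markers "<" and ">".
RANGE_RE = re.compile(r"<?(\d+)\.\.>?(\d+)")

def parse_ranges(location: str) -> list[tuple[int, int]]:
    """Return (start, end) pairs from a location such as join(<1..34,35..70)."""
    return [(int(a), int(b)) for a, b in RANGE_RE.findall(location)]

def overhang(location: str, seq_length: int) -> int:
    """Bases by which the feature runs past the end of the sequence (0 if none)."""
    end = max(b for _, b in parse_ranges(location))
    return max(0, end - seq_length)

# CDS location of KIR2DL3*017 as given in the record above.
cds = ("join(<1..34,35..70,71..370,371..664,665..715,716..820,"
       "821..873,874..1050)")
cds_length = sum(b - a + 1 for a, b in parse_ranges(cds))
print(cds_length)           # 1050
print(overhang(cds, 1026))  # 24
```

The same 24 bp difference appears for KIR2DL3*01801 (981 vs. 957).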

sklas commented 5 years ago

This also occurs in KIR3DL1, namely in KIR3DL1*05902, KIR3DL1*060, KIR3DL1*061 and KIR3DL1*098.
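Because the problem affects more than one gene, a whole-file scan is one way to list every affected allele. The following is a rough sketch, not a full EMBL parser. It assumes the line layout of the record above: `FT` feature lines, an `/allele="..."` qualifier, an `SQ   Sequence N BP;` line, and `//` between records. Coordinates are taken from feature-key and location-continuation lines only. A multi-line qualifier such as a `/note` containing `N..M` text could produce false hits. The function name `find_overhangs` is hypothetical.

```python
import re
from typing import Iterable, Iterator, Optional, Tuple

# "start..end" ranges, tolerating the EMBL partial markers "<" and ">".
RANGE_RE = re.compile(r"<?(\d+)\.\.>?(\d+)")
# Sequence length from a line such as "SQ   Sequence 1026 BP; 242 A; ..."
SQ_RE = re.compile(r"^SQ\s+Sequence\s+(\d+)\s+BP;")
ALLELE_RE = re.compile(r'/allele="([^"]+)"')

def find_overhangs(lines: Iterable[str]) -> Iterator[Tuple[Optional[str], int, int]]:
    """Yield (allele, max_feature_end, sequence_length) for every record whose
    feature locations run past the end of its sequence."""
    allele, max_end, seq_len = None, 0, None
    for line in lines:
        if line.startswith("FT"):
            body = line[2:].strip()
            if match := ALLELE_RE.search(body):
                allele = match.group(1)
            elif not body.startswith("/"):
                # Feature key + location, or a location continuation line.
                # /translation continuation lines hold only letters, so they never match.
                for _, end in RANGE_RE.findall(body):
                    max_end = max(max_end, int(end))
        elif match := SQ_RE.match(line):
            seq_len = int(match.group(1))
        elif line.startswith("//"):
            # End of record: report it if any feature ends beyond the sequence.
            if seq_len is not None and max_end > seq_len:
                yield allele, max_end, seq_len
            allele, max_end, seq_len = None, 0, None
```

Since a file handle iterates over lines, `find_overhangs(open("kir.dat"))` would yield one tuple per affected record, for example `("KIR2DL3*017", 1050, 1026)` for the entry above.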

jrob119 commented 5 years ago

These are caused by extended CDS sequences with alternative stop codons. We are currently looking into a patch for the kir.dat file.
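The record itself is consistent with this explanation. Its `/translation` has 349 residues. Adding a stop codon gives 350 codons = 1050 bp, exactly the annotated CDS end. The 1026 bp sequence holds only 342 complete codons, ending in `...PNAEPR`, so the final 7 residues (`SKVVSCP`) and the stop codon are encoded by bases that are not in the entry. The sketch below reproduces that arithmetic. The function name is hypothetical. It assumes the CDS is one contiguous run from base 1, which holds here because the exons abut each other, and that the CDS location includes the stop codon.

```python
# /translation of KIR2DL3*017, copied from the FT lines of the record above.
TRANSLATION = (
    "MSLMVVSMACVGFFLLQGAWPHEGVHRKPSLLAHPGPLVKSEETV"
    "ILQCWSDVRFEHFLLHREGKFKDTLRLIGEHHDGVSKANFSIGPMMQDLAGTYRCYGSV"
    "THSPYQLSAPSDPLDIVITGLYEKPSLSAQPGPTVLAGESVTLSCSSRSSYDMYHLSRE"
    "GEAHERRFSAGPKVNGTFQADFPLGPATHGGTYRCFGSFRDSPYEWSNSSDPLLVSVTG"
    "NPSNSWPSPTEPSSETGNPRHLHVLIGTSVVIILFILLLFFLLHRWCCNKKNAVVMDQE"
    "PAGNRTVNREDSDEQDPQEVTYAQLNHCVFTQRKITRPSQRPKTPPTDIIVYTELPNAE"
    "PRSKVVSCP"
)

def residues_beyond_sequence(translation: str, seq_length: int, codon_start: int = 1) -> int:
    """Count residues in /translation that the sequenced bases cannot encode.

    Assumes a contiguous CDS beginning at base 1 (offset by codon_start - 1);
    records with introns would need the spliced CDS length instead.
    """
    codons_available = (seq_length - (codon_start - 1)) // 3
    return max(0, len(translation) - codons_available)

extra = residues_beyond_sequence(TRANSLATION, 1026)  # 349 - 342 complete codons
missing_bp = 3 * (extra + 1)  # + 1 codon for the stop included in 1..1050
print(len(TRANSLATION), extra, missing_bp)  # 349 7 24
```

The 24 missing bases match the gap between the exon 9 end (1050) and the sequence length (1026).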

jrob119 commented 5 years ago

An update has been applied for these alleles.