ANHIG / IMGTHLA

Github for files currently published in the IPD-IMGT/HLA FTP Directory hosted at the European Bioinformatics Institute
http://www.ebi.ac.uk/ipd/imgt/hla/

A*01:01:02 #327

Closed AlsoATraveler closed 1 year ago

AlsoATraveler commented 1 year ago

Hello, I have a question about A*01:01:02 in release 3.40. In the A_nuc.fasta file (that is, the CDS sequence), the sequence for A*01:01:02 ends with AAAGTGTGA, but the A_gen.fasta file contains no matching sequence; it has AAAGGTGAG instead. What is the reason?

AlsoATraveler commented 1 year ago

The same applies to another stretch of A*01:01:02: TGGAGAACGGGAAGGAGACGCTGCAGCGCACGGA in A_nuc.fasta appears as TGGAGAACGGGAAGGAGACGCTGCAGCGCACGGG in A_gen.fasta.

dominicbarkerAN commented 1 year ago

Hello, I have reviewed the sequence you mentioned and found no issue with it. The sequence in 3.40.0 is the same as the sequence in the latest release, which is correct. The issue is that you are not splitting the CDS sequence in the A_nuc.fasta file into exons before searching for it in A_gen.fasta, which contains both exons and introns.

For example, you say that the A_nuc.fasta sequence ends with AAAGTGTGA, which does not appear in A_gen.fasta. That is because the sequence you are searching for spans two exons: the end of exon 7 and exon 8. It does not appear in A_gen.fasta because the intron 7 sequence lies between them. The exon 7 and exon 8 sequences of A*01:01:02 are:

exon 7: GCAGTGACAGTGCCCAGGGCTCTGATGTGTCTCTCACAGCTTGTAAAG
exon 8: TGTGA

The same is true for your second comment, which contains sequence crossing exons 3 and 4.
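To illustrate the point, here is a minimal Python sketch of why a CDS substring that crosses an exon boundary cannot be found in the genomic sequence. The exon 7 and exon 8 strings are the ones quoted above. The intron is a placeholder and not the real intron 7: only its first five bases (GTGAG) are taken from the AAAGGTGAG observed in A_gen.fasta, and the rest is filler. The names `spliced`, `genomic`, and `INTRON7_PLACEHOLDER` are made up for this example.

```python
# Exon sequences for A*01:01:02 as quoted in this thread.
EXON7 = "GCAGTGACAGTGCCCAGGGCTCTGATGTGTCTCTCACAGCTTGTAAAG"
EXON8 = "TGTGA"

# ASSUMPTION: placeholder, not the real intron 7 sequence.
# "GTGAG" matches the bases seen after AAAG in A_gen.fasta; the Ns are filler.
INTRON7_PLACEHOLDER = "GTGAG" + "N" * 20


def spliced(exons):
    """Join exons only, as in the CDS (A_nuc.fasta-style) sequence."""
    return "".join(exons)


def genomic(exons, introns):
    """Interleave exons with introns, as in the genomic (A_gen.fasta-style) sequence."""
    parts = []
    for i, exon in enumerate(exons):
        parts.append(exon)
        if i < len(introns):
            parts.append(introns[i])
    return "".join(parts)


cds_tail = spliced([EXON7, EXON8])
gen_tail = genomic([EXON7, EXON8], [INTRON7_PLACEHOLDER])

query = "AAAGTGTGA"  # end of exon 7 (AAAG) joined directly to exon 8 (TGTGA)

print(query in cds_tail)        # True: the exons are adjacent in the CDS
print(query in gen_tail)        # False: the intron separates them
print("AAAGGTGAG" in gen_tail)  # True: end of exon 7 followed by the intron start
print(EXON7 in gen_tail and EXON8 in gen_tail)  # True: each exon matches on its own
```

Searching for each exon separately, rather than for a stretch of CDS that crosses a boundary, is what makes the two files comparable.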

AlsoATraveler commented 1 year ago

Oh, thanks.