IEDB / MRO

The MHC Restriction Ontology

Update HLA sequences #112

Closed beckyjackson closed 3 years ago

beckyjackson commented 3 years ago

Added a new task, refresh-hla-seqs, that only overwrites existing HLA sequences, then ran this task to update them.
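For readers unfamiliar with the distinction, the following is a minimal sketch of what "only overwrites existing" could look like. It is not the script from this PR: the function name refresh_existing and the column names "Label" and "Sequence" are hypothetical, and the rows are assumed to be already loaded as dictionaries (for example from chain-sequences.tsv).

```python
def refresh_existing(rows, latest, key="Label", seq="Sequence"):
    """Return copies of `rows` with sequences refreshed from `latest`.

    `rows` is a list of dicts (one per table row) and `latest` maps an
    allele label to its most recent sequence. Both the column names and
    this function are illustrative assumptions, not the PR's actual code.

    Only rows whose label is already present are overwritten; labels that
    appear in `latest` but not in `rows` are deliberately NOT added.
    """
    refreshed = []
    changed = 0
    for row in rows:
        new_row = dict(row)  # copy so the caller's rows are not mutated
        new_seq = latest.get(row[key])
        if new_seq is not None and new_seq != row[seq]:
            new_row[seq] = new_seq
            changed += 1
        refreshed.append(new_row)
    return refreshed, changed
```

Returning the number of changed rows alongside the refreshed table makes it easy to report how many sequences an update touched before anyone reviews the diff.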

beckyjackson commented 3 years ago

Sorry for the ugly diff on the script - I ran black and it looks like it changed the indentation from 2 spaces to 4. The only relevant change is here: https://github.com/IEDB/MRO/pull/112/files#diff-7340e905b42a20af311fcabb11fc04a81ded8acccf8afe11f8db4c5649fdd828R58

jamesaoverton commented 3 years ago

@rvita Do you want to review the changes to chain-sequences.tsv or not? There are a few hundred.

rvita commented 3 years ago

I do not understand the new seqs: all of them seem to be much shorter than the old seqs. Are the new seqs just the G domains? If so, they should go into a new column; they should not replace the existing full-length seq. Is this coming from IMGT? It does not match what I see on their website. For example, https://www.ebi.ac.uk/cgi-bin/ipd/imgt/hla/get_allele.cgi?DRB1*13:227 shows what is listed as the Old Sequence.

beckyjackson commented 3 years ago

Sorry, my mistake! The "old" and "new" columns in the file that I sent you should be switched; I did the diff backwards. So you are correct that https://www.ebi.ac.uk/cgi-bin/ipd/imgt/hla/get_allele.cgi?DRB1*13:227 should show the "old" sequence, because that is actually the "new" sequence. Sorry again, I should have caught that before I sent it to you.
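One way to make a swapped comparison like this easier to spot is to build the review table with explicitly named arguments and include a length delta per allele. The sketch below is an illustration only, not the diff procedure used for this PR; the function name sequence_changes and the dictionary layout are assumptions.

```python
def sequence_changes(old, new):
    """List alleles whose sequence differs between two label -> sequence maps.

    `old` is the table before the update and `new` is the table after it.
    Labels present in only one of the maps are ignored, matching an update
    that only overwrites existing entries. (Hypothetical helper.)
    """
    changes = []
    for label in sorted(old.keys() & new.keys()):
        if old[label] != new[label]:
            changes.append(
                {
                    "label": label,
                    "old": old[label],
                    "new": new[label],
                    # Negative means the sequence got shorter after the update.
                    "length_delta": len(new[label]) - len(old[label]),
                }
            )
    return changes
```

If nearly every row has a negative length_delta, as in the file described above, that is a hint either that the source changed what it reports or that the two inputs were passed in the wrong order.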

rvita commented 3 years ago

Well, in that case, it looks good. I think you were playing a trick on me.