eppic-team / eppic

Evolutionary protein-protein interface classifier
http://eppic-web.org

Using SIFTS for the alignment of SEQRES to UniProt sequence #188

Closed lafita closed 5 years ago

lafita commented 7 years ago

I have been using SIFTS recently and realised that we could use the SEQRES-to-UniProt residue mapping it provides as our alignment, rather than computing the alignment ourselves.

This is particularly important for handling special cases, such as artificially designed proteins. One example is 4DOU, a fusion of three chains. Our alignment is only partly correct: we map one of the three chains to the UniProt sequence correctly, but the other two are mapped incorrectly. If we used the SIFTS alignment (ftp://ftp.ebi.ac.uk/pub/databases/msd/sifts/xml/4dou.xml.gz), all the SEQRES residues could be matched to UniProt residues, and the evolutionary score could be computed more reliably.
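To illustrate the idea, here is a minimal Python sketch of reading the residue-level mapping out of a SIFTS per-entry XML file. It is not EPPIC code. It assumes the SIFTS layout as I understand it: `entity` elements containing `residue` elements numbered in SEQRES order via `dbResNum`, each with optional `crossRefDb` children where `dbSource="UniProt"`. The function name `seqres_to_uniprot` and the toy entry with accession `P00000` are made up for the example.

```python
import io
import xml.etree.ElementTree as ET
from collections import defaultdict


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def seqres_to_uniprot(xml_source):
    """Return {entity_id: [(seqres_pos, uniprot_acc, uniprot_pos), ...]}.

    xml_source is a path or a file object with uncompressed SIFTS XML.
    Residues without a UniProt cross-reference are skipped.
    """
    mapping = defaultdict(list)
    root = ET.parse(xml_source).getroot()
    for entity in root.iter():
        if local_name(entity.tag) != "entity":
            continue
        entity_id = entity.get("entityId")
        for residue in entity.iter():
            if local_name(residue.tag) != "residue":
                continue
            seqres_pos = int(residue.get("dbResNum"))
            for xref in residue:
                if (local_name(xref.tag) == "crossRefDb"
                        and xref.get("dbSource") == "UniProt"):
                    mapping[entity_id].append(
                        (seqres_pos,
                         xref.get("dbAccessionId"),
                         int(xref.get("dbResNum")))
                    )
    return dict(mapping)


# Toy input that mimics the assumed layout (not a real SIFTS entry).
SAMPLE = """<entry xmlns="http://www.ebi.ac.uk/pdbe/docs/sifts/eFamily.xsd">
  <entity type="protein" entityId="A">
    <segment segId="toy_A_1_2" start="1" end="2">
      <listResidue>
        <residue dbSource="PDBe" dbResNum="1" dbResName="MET">
          <crossRefDb dbSource="UniProt" dbAccessionId="P00000"
                      dbResNum="10" dbResName="M"/>
        </residue>
        <residue dbSource="PDBe" dbResNum="2" dbResName="GLY"/>
      </listResidue>
    </segment>
  </entity>
</entry>"""

result = seqres_to_uniprot(io.BytesIO(SAMPLE.encode("utf-8")))
print(result)
```

For a downloaded file, something like `with gzip.open("4dou.xml.gz") as fh: seqres_to_uniprot(fh)` should work the same way. Because each tuple carries its own accession, a chain whose residues map to more than one UniProt entry would not need special handling at this level.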

josemduarte commented 7 years ago

We do use SIFTS to get the mapped UniProt ID and region, but then we compute our own alignment.

I can see that the alignment is not very good for the 4DOU case. That is partly due to an issue introduced in 3, related to some issues in BioJava.

In principle I agree that we could just use the SIFTS alignment as given. The problem is that for user input (non-deposited files) we still have to compute the alignment ourselves. So SIFTS only solves part of the problem.
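A rough Python sketch of that two-path logic: take the SIFTS pairs when they exist (deposited entries), otherwise fall back to aligning the sequences. The fallback is a plain Needleman-Wunsch with arbitrary identity scoring, standing in for whatever EPPIC actually does through BioJava. The names `global_align_map` and `residue_mapping` are hypothetical.

```python
def global_align_map(seqres, uniprot, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment.

    Returns 1-based (seqres_pos, uniprot_pos) pairs for aligned columns;
    positions aligned to a gap are left out.
    """
    n, m = len(seqres), len(uniprot)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        # Substitution score for seqres[i-1] against uniprot[j-1].
        return match if seqres[i - 1] == uniprot[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner, preferring the diagonal.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if score[i][j] == score[i - 1][j - 1] + sub(i, j):
            pairs.append((i, j))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return pairs


def residue_mapping(seqres, uniprot, sifts_pairs=None):
    """Use SIFTS pairs when available, otherwise align the sequences."""
    if sifts_pairs:
        return sifts_pairs
    return global_align_map(seqres, uniprot)


# User-supplied structure: no SIFTS record, so the fallback runs.
print(residue_mapping("ACE", "ACDE"))
```

A global alignment with end-gap penalties is used here only to keep the example short. When the SEQRES covers a fragment of a longer UniProt sequence, a local or semi-global alignment would probably be the better fallback.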

lafita commented 7 years ago

OK, I see. This only happens in a very small number of cases, and engineered proteins are not that interesting for EPPIC anyway. I just submitted the issue because it occurred to me.