Closed lafita closed 5 years ago
We do use SIFTS to get the mapped UniProt ID and region, but then we compute our own alignment.
I can see that the alignment is not very good for the 4DOU case. That is in part a problem introduced in 3, related to some issues in BioJava.
In principle I agree that we could just use the SIFTS alignment as it is given. The problem is that for user input (non-deposited files) we still have to align ourselves. So SIFTS only solves part of the problem.
OK, I see. This happens in only a very small number of cases, and engineered proteins are not that interesting for EPPIC. I just submitted the issue because it occurred to me.
I have been using SIFTS recently and I have realised that we could use the residue mapping between the SEQRES and UniProt sequences that it provides as the alignment, rather than computing it ourselves.
This is particularly important for handling special cases, such as artificially designed proteins. One example is 4DOU, a fusion of three chains. Our alignment is only partly correct: we map one of the three chains to the UniProt sequence correctly, but the other two are mapped incorrectly. If we used the SIFTS alignment (ftp://ftp.ebi.ac.uk/pub/databases/msd/sifts/xml/4dou.xml.gz), all the residues of the SEQRES could be matched to UniProt residues, and the evolutionary score could be computed more reliably.
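
To illustrate the idea, here is a minimal Python sketch of reading a per-residue SEQRES to UniProt mapping out of a SIFTS XML file. It is not EPPIC code. The element and attribute names (`entity`, `residue`, `crossRefDb`, `dbSource`, `dbResNum`, `dbAccessionId`) are my reading of the SIFTS XML layout and should be checked against a real file such as the 4DOU one. The function name `seqres_to_uniprot`, the inline sample, and the accession `P00001` are made up for the example.

```python
import xml.etree.ElementTree as ET

# Tiny hand-written sample in the assumed SIFTS layout (not real 4DOU data).
SAMPLE_XML = """<entry xmlns="http://www.ebi.ac.uk/pdbe/docs/sifts/eFamily.xsd">
  <entity type="protein" entityId="A">
    <segment segId="sample_A_1_3" start="1" end="3">
      <listResidue>
        <residue dbSource="PDBe" dbResNum="1" dbResName="MET">
          <crossRefDb dbSource="UniProt" dbAccessionId="P00001" dbResNum="10" dbResName="M"/>
        </residue>
        <residue dbSource="PDBe" dbResNum="2" dbResName="ALA">
          <crossRefDb dbSource="PDB" dbAccessionId="sample" dbResNum="2" dbResName="ALA"/>
          <crossRefDb dbSource="UniProt" dbAccessionId="P00001" dbResNum="11" dbResName="A"/>
        </residue>
        <residue dbSource="PDBe" dbResNum="3" dbResName="GLY">
          <crossRefDb dbSource="PDB" dbAccessionId="sample" dbResNum="3" dbResName="GLY"/>
        </residue>
      </listResidue>
    </segment>
  </entity>
</entry>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def seqres_to_uniprot(xml_text, chain_id):
    """Map SEQRES position -> (UniProt accession, UniProt position) for one chain.

    Residues without a UniProt cross-reference (e.g. tags or linkers) are
    left out of the returned dictionary.
    """
    root = ET.fromstring(xml_text)
    mapping = {}
    for entity in root.iter():
        if local_name(entity.tag) != "entity" or entity.get("entityId") != chain_id:
            continue
        for residue in entity.iter():
            if local_name(residue.tag) != "residue":
                continue
            seqres_pos = int(residue.get("dbResNum"))
            for xref in residue:
                if local_name(xref.tag) == "crossRefDb" and xref.get("dbSource") == "UniProt":
                    mapping[seqres_pos] = (xref.get("dbAccessionId"), int(xref.get("dbResNum")))
    return mapping


if __name__ == "__main__":
    print(seqres_to_uniprot(SAMPLE_XML, "A"))
```

Because the accession is stored per residue, a fusion construct like 4DOU would come out as one chain whose residues point at whatever UniProt entries SIFTS assigns to each segment, and residues with no UniProt cross-reference simply stay unmapped.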