Closed ChrisMoth closed 2 years ago
Selenocysteine, one-letter code U, is outside the core IUPAC 20 amino acids in Biopython; selenomethionine, PDB residue code MSE, is a related selenium-containing residue.
However, we sometimes see U both in UniParc transcript sequences and in ENSEMBL transcripts. Here is an example:
$ transcript_to_AAseq.pl ENST00000611653 | grep U
MCASRDDWRCARSMHEFSAKDIDGHMVNLDKYRGFVCIVTNVASQUGKTEVNYTQLVDLHARYAECGLRILAFPCNQFGKQEPGSNEEIKEFAAGYNVKFDMFSKICVNGDDAHPLWKWMKIQPKGKGILGNAIKWNFTKFLIDKNGCVVKRYGPMEEPLVIEKDLPHYF
The fix is to allow U in the transcript sequences and to match it to MSE on the structural side.
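The fix described above might look roughly like the following sketch. It is not the pipeline's actual code: the names `ALLOWED_TRANSCRIPT_LETTERS`, `U_STRUCTURE_RESNAMES`, `invalid_letters`, and `residue_matches` are hypothetical. Only the U to MSE pairing comes from the text above; accepting SEC as well is an assumption, since SEC is the PDB residue name for selenocysteine.

```python
# Minimal sketch, standard library only. All names here are hypothetical.

# The 20 standard residues, three-letter PDB name -> one-letter code.
STANDARD_THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}

# Transcript alphabet: the core 20 letters plus U.
ALLOWED_TRANSCRIPT_LETTERS = set(STANDARD_THREE_TO_ONE.values()) | {"U"}

# Structure residue names that a transcript U may be matched to.
# MSE follows the issue text; SEC is an added assumption.
U_STRUCTURE_RESNAMES = {"MSE", "SEC"}


def invalid_letters(transcript_seq):
    """Return the sorted letters in the sequence that are not allowed."""
    return sorted(set(transcript_seq.upper()) - ALLOWED_TRANSCRIPT_LETTERS)


def residue_matches(transcript_letter, structure_resname):
    """True if a transcript letter is compatible with a structure residue name."""
    letter = transcript_letter.upper()
    resname = structure_resname.strip().upper()
    if letter == "U":
        return resname in U_STRUCTURE_RESNAMES
    return STANDARD_THREE_TO_ONE.get(resname) == letter
```

With this in place, a sequence fragment such as `VASQUGKTEV` from the example transcript passes validation, and a U position is accepted when the aligned structural residue is named MSE.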
Additionally, the pipeline must do a better job of naming H_ hetero residues generally. This is accomplished by looking more deeply into the mmCIF dictionary than before, down to the mon_nstd_flag item (_chem_comp.mon_nstd_flag).
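One way the flag lookup could work is sketched below. It assumes the mmCIF file has already been read into a dictionary whose values are lists, which is the shape Biopython's `MMCIF2Dict` is expected to produce; here a hand-written dictionary stands in for it. The function names `standard_monomer_flags` and `hetero_field` are hypothetical, and treating every value other than `y`/`yes` (including `.` and missing entries) as non-standard is an assumption. Water, which Biopython labels `W` rather than `H_`, is not handled.

```python
# Minimal sketch, standard library only. Function names are hypothetical.

def standard_monomer_flags(mmcif_dict):
    """Map each _chem_comp.id to True if its mon_nstd_flag marks it standard."""
    ids = mmcif_dict.get("_chem_comp.id", [])
    flags = mmcif_dict.get("_chem_comp.mon_nstd_flag", [])
    return {
        comp_id.upper(): flag.lower() in ("y", "yes")
        for comp_id, flag in zip(ids, flags)
    }


def hetero_field(resname, is_standard):
    """Return ' ' for a standard monomer, otherwise an 'H_<resname>' label."""
    resname = resname.strip().upper()
    # Assumption: residues missing from the table count as non-standard.
    if is_standard.get(resname, False):
        return " "
    return "H_" + resname


# Hand-written stand-in for a parsed chem_comp category.
example = {
    "_chem_comp.id": ["MET", "MSE", "GLY"],
    "_chem_comp.mon_nstd_flag": ["y", "n", "y"],
}
flags = standard_monomer_flags(example)
labels = [hetero_field(name, flags) for name in ("MET", "MSE", "GLY")]
```

In this example MET and GLY get the blank hetero field and MSE is labelled `H_MSE`.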