facebookresearch / esm

Evolutionary Scale Modeling (esm): Pretrained language models for proteins
MIT License

PDB files not readable because chemical element symbol in columns 77-78 is incorrectly left-justified #616

Open elmar-k opened 1 year ago

elmar-k commented 1 year ago

Bug description: The PDB format requires the chemical element symbol to be right-justified in columns 77-78. https://www.wwpdb.org/documentation/file-format-content/format33/sect9.html#ATOM
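To make the column rule concrete, here is a minimal Python sketch that checks whether the element field of a single record is right-justified. The helper names `element_field` and `is_right_justified` are hypothetical and exist only for this illustration. The sketch assumes plain fixed-width text in which columns 77-78 (1-indexed) correspond to the slice `[76:78]`.

```python
def element_field(line: str) -> str:
    """Return the raw two-character element field (columns 77-78, 1-indexed)."""
    # Pad with spaces so that short lines still yield a two-character field.
    return line.rstrip("\r\n").ljust(80)[76:78]


def is_right_justified(line: str) -> bool:
    """Return True if the element symbol in columns 77-78 is right-justified."""
    field = element_field(line)
    symbol = field.strip()
    # Assumption of this sketch: an empty element field counts as acceptable.
    return field == symbol.rjust(2)
```

A one-letter symbol such as `N` passes only when it sits in column 78. A two-letter symbol such as `FE` fills both columns and always passes.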

At least this file (and probably others) has it left-justified, which prevents many applications from parsing the file correctly: https://esmatlas.com/resources/detail/MGYP001377353409 (Note that the incorrect placement of the atom names in columns 13-16 has already been reported in #431.)

Wrong:
ATOM      1  N   MET A   1     -26.771 -22.044  27.283  1.00  0.94          N
ATOM      2  CA  MET A   1     -25.933 -21.694  26.139  1.00  0.94          C
ATOM      3  C   MET A   1     -25.200 -20.379  26.383  1.00  0.94          C

Correct:
ATOM      1  N   MET A   1     -26.771 -22.044  27.283  1.00  0.94           N
ATOM      2  CA  MET A   1     -25.933 -21.694  26.139  1.00  0.94           C
ATOM      3  C   MET A   1     -25.200 -20.379  26.383  1.00  0.94           C
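
As a possible workaround until the files are fixed, affected records could be repaired after download. The following Python sketch right-justifies the element symbol in columns 77-78 of ATOM and HETATM records. The function name `fix_element_justification` is hypothetical, and the sketch assumes every other column is already in its correct position. It does not address the atom name placement reported in #431.

```python
def fix_element_justification(line: str) -> str:
    """Right-justify the element symbol in columns 77-78 of an ATOM/HETATM record."""
    if not line.startswith(("ATOM", "HETATM")):
        return line  # leave all other record types untouched

    body = line.rstrip("\r\n")
    line_ending = line[len(body):]   # preserve the original line ending
    padded = body.ljust(80)          # make sure columns 77-80 exist

    symbol = padded[76:78].strip()   # columns 77-78, 1-indexed
    if not symbol:
        return line                  # no element symbol present, nothing to fix

    fixed = padded[:76] + symbol.rjust(2) + padded[78:]
    # Keep the original line width, but at least 78 columns so the symbol fits.
    return fixed[:max(len(body), 78)] + line_ending
```

It could be applied line by line, for example `fixed = [fix_element_justification(l) for l in open(path)]`, where `path` points to a downloaded PDB file. Records that are already right-justified pass through unchanged.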