MDAnalysis / mdanalysis

MDAnalysis is a Python library to analyze molecular dynamics simulations.
https://mdanalysis.org

PDBWriter writes some C as Ca and some N as Na #2732

Closed RMeli closed 3 years ago

RMeli commented 4 years ago

Expected behavior

I expect PDBWriter to correctly write C and N atoms.

Actual behavior

Some C atoms are written as Ca and some N atoms are written as Na. I think the writer is using the old atom types instead of the new elements attribute (see #2647 and #2648).

Code to reproduce the behavior

The following PDB file is the ligand in 3MXF:

HETATM 1076  CAA JQ1 A   1      29.899  18.962   1.724  1.00  7.28           C  
HETATM 1077  CAB JQ1 A   1      25.204  15.116   1.393  1.00  9.95           C  
HETATM 1078  CAC JQ1 A   1      25.682  14.930  -1.694  1.00 10.30           C  
HETATM 1079  CAD JQ1 A   1      29.233  13.118  -5.465  1.00 15.92           C  
HETATM 1080  CAE JQ1 A   1      30.649  11.124  -4.687  1.00 18.36           C  
HETATM 1081  CAF JQ1 A   1      28.543  11.742  -3.510  1.00 15.62           C  
HETATM 1082  OAG JQ1 A   1      32.109  13.864  -4.714  1.00 20.67           O  
HETATM 1083 CLAH JQ1 A   1      24.824  18.476  -6.770  1.00  9.02          CL  
HETATM 1084  CAI JQ1 A   1      25.532  17.964  -4.219  1.00  8.34           C  
HETATM 1085  CAJ JQ1 A   1      27.032  17.110  -5.868  1.00  6.71           C  
HETATM 1086  CAK JQ1 A   1      26.351  17.447  -3.210  1.00  6.66           C  
HETATM 1087  CAL JQ1 A   1      27.857  16.609  -4.867  1.00  6.48           C  
HETATM 1088  CAM JQ1 A   1      31.898  14.964  -2.576  1.00  8.11           C  
HETATM 1089  NAN JQ1 A   1      29.662  16.042  -2.828  1.00  7.69           N  
HETATM 1090  NAO JQ1 A   1      31.817  18.128   0.283  1.00  8.44           N  
HETATM 1091  NAP JQ1 A   1      32.044  17.250  -0.625  1.00  8.68           N  
HETATM 1092  OAQ JQ1 A   1      30.400  13.247  -3.384  1.00 10.41           O  
HETATM 1093  SAR JQ1 A   1      27.810  16.331   1.456  1.00  8.19           S  
HETATM 1094  CAS JQ1 A   1      31.475  13.972  -3.665  1.00 13.62           C  
HETATM 1095  CAT JQ1 A   1      28.375  16.220  -2.489  1.00  7.13           C  
HETATM 1096  CAU JQ1 A   1      25.880  17.822  -5.542  1.00  8.01           C  
HETATM 1097  CAV JQ1 A   1      30.534  18.050   0.671  1.00  7.28           C  
HETATM 1098  CAW JQ1 A   1      27.522  16.765  -3.512  1.00  6.19           C  
HETATM 1099  CAX JQ1 A   1      26.460  15.625   0.646  1.00  7.78           C  
HETATM 1100  CAY JQ1 A   1      26.696  15.557  -0.722  1.00  7.26           C  
HETATM 1101  CAZ JQ1 A   1      30.894  16.581  -0.833  1.00  7.50           C  
HETATM 1102  CBA JQ1 A   1      27.931  16.125  -1.137  1.00  8.00           C  
HETATM 1103  CBB JQ1 A   1      28.660  16.602  -0.030  1.00  6.74           C  
HETATM 1104  CBC JQ1 A   1      30.646  15.463  -1.866  1.00  9.05           C  
HETATM 1105  NBD JQ1 A   1      29.932  17.088  -0.061  1.00  7.77           N  
HETATM 1106  CBE JQ1 A   1      29.731  12.302  -4.299  1.00 14.82           C  

Some C atoms have an atom type that contains CA and some N atoms have an atom type that contains NA (see #1808).

The PDB file written out by the following code is wrong:

In [1]: import MDAnalysis as mda

In [2]: mda.__version__
Out[2]: '0.20.2-dev0'

In [3]: u = mda.Universe("3MXF.pdb")

In [4]: sel = u.select_atoms("not protein and not resname HOH")

In [5]: sel
Out[5]: <AtomGroup with 31 atoms>

In [6]: sel.write("3MXF_out.pdb")

Wrong output (CAL is written with element CA; NAN, NAO and NAP with NA; CBE with BE):

HEADER    
TITLE     MDANALYSIS FRAME 0: Created by PDBWriter
CRYST1    1.000    1.000    1.000  90.00  90.00  90.00 P 1           1
REMARK     285 UNITARY VALUES FOR THE UNIT CELL AUTOMATICALLY SET
REMARK     285 BY MDANALYSIS PDBWRITER BECAUSE UNIT CELL INFORMATION
REMARK     285 WAS MISSING.
REMARK     285 PROTEIN DATA BANK CONVENTIONS REQUIRE THAT
REMARK     285 CRYST1 RECORD IS INCLUDED, BUT THE VALUES ON
REMARK     285 THIS RECORD ARE MEANINGLESS.
ATOM      1  CAA JQ1 A   1      29.899  18.962   1.724  1.00  7.28      A    C
ATOM      2  CAB JQ1 A   1      25.204  15.116   1.393  1.00  9.95      A    C
ATOM      3  CAC JQ1 A   1      25.682  14.930  -1.694  1.00 10.30      A    C
ATOM      4  CAD JQ1 A   1      29.233  13.118  -5.465  1.00 15.92      A    C
ATOM      5  CAE JQ1 A   1      30.649  11.124  -4.687  1.00 18.36      A    C
ATOM      6  CAF JQ1 A   1      28.543  11.742  -3.510  1.00 15.62      A    C
ATOM      7  OAG JQ1 A   1      32.109  13.864  -4.714  1.00 20.67      A    O
ATOM      8 CLAH JQ1 A   1      24.824  18.476  -6.770  1.00  9.02      A   CL
ATOM      9  CAI JQ1 A   1      25.532  17.964  -4.219  1.00  8.34      A    C
ATOM     10  CAJ JQ1 A   1      27.032  17.110  -5.868  1.00  6.71      A    C
ATOM     11  CAK JQ1 A   1      26.351  17.447  -3.210  1.00  6.66      A    C
ATOM     12  CAL JQ1 A   1      27.857  16.609  -4.867  1.00  6.48      A   CA
ATOM     13  CAM JQ1 A   1      31.898  14.964  -2.576  1.00  8.11      A    C
ATOM     14  NAN JQ1 A   1      29.662  16.042  -2.828  1.00  7.69      A   NA
ATOM     15  NAO JQ1 A   1      31.817  18.128   0.283  1.00  8.44      A   NA
ATOM     16  NAP JQ1 A   1      32.044  17.250  -0.625  1.00  8.68      A   NA
ATOM     17  OAQ JQ1 A   1      30.400  13.247  -3.384  1.00 10.41      A    O
ATOM     18  SAR JQ1 A   1      27.810  16.331   1.456  1.00  8.19      A    S
ATOM     19  CAS JQ1 A   1      31.475  13.972  -3.665  1.00 13.62      A    C
ATOM     20  CAT JQ1 A   1      28.375  16.220  -2.489  1.00  7.13      A    C
ATOM     21  CAU JQ1 A   1      25.880  17.822  -5.542  1.00  8.01      A    C
ATOM     22  CAV JQ1 A   1      30.534  18.050   0.671  1.00  7.28      A    C
ATOM     23  CAW JQ1 A   1      27.522  16.765  -3.512  1.00  6.19      A    C
ATOM     24  CAX JQ1 A   1      26.460  15.625   0.646  1.00  7.78      A    C
ATOM     25  CAY JQ1 A   1      26.696  15.557  -0.722  1.00  7.26      A    C
ATOM     26  CAZ JQ1 A   1      30.894  16.581  -0.833  1.00  7.50      A    C
ATOM     27  CBA JQ1 A   1      27.931  16.125  -1.137  1.00  8.00      A    C
ATOM     28  CBB JQ1 A   1      28.660  16.602  -0.030  1.00  6.74      A    C
ATOM     29  CBC JQ1 A   1      30.646  15.463  -1.866  1.00  9.05      A    C
ATOM     30  NBD JQ1 A   1      29.932  17.088  -0.061  1.00  7.77      A    N
ATOM     31  CBE JQ1 A   1      29.731  12.302  -4.299  1.00 14.82      A   BE
END
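
To spot the affected atoms without reading the two files by eye, one can compare the element column (columns 77-78 of the PDB format) of the original and the written records. The following is only a minimal sketch using the standard library: the helper names (atom_name, element_symbol, element_mismatches) are hypothetical and not part of MDAnalysis, and it assumes atom names are unique within the selection, as they are for this ligand. The records are a subset copied from the input and output above.

```python
# Hypothetical helpers (not part of MDAnalysis) that compare the element
# column of PDB records before and after writing.

ORIGINAL = [
    "HETATM 1086  CAK JQ1 A   1      26.351  17.447  -3.210  1.00  6.66           C  ",
    "HETATM 1087  CAL JQ1 A   1      27.857  16.609  -4.867  1.00  6.48           C  ",
    "HETATM 1089  NAN JQ1 A   1      29.662  16.042  -2.828  1.00  7.69           N  ",
    "HETATM 1105  NBD JQ1 A   1      29.932  17.088  -0.061  1.00  7.77           N  ",
    "HETATM 1106  CBE JQ1 A   1      29.731  12.302  -4.299  1.00 14.82           C  ",
]

WRITTEN = [
    "ATOM     11  CAK JQ1 A   1      26.351  17.447  -3.210  1.00  6.66      A    C",
    "ATOM     12  CAL JQ1 A   1      27.857  16.609  -4.867  1.00  6.48      A   CA",
    "ATOM     14  NAN JQ1 A   1      29.662  16.042  -2.828  1.00  7.69      A   NA",
    "ATOM     30  NBD JQ1 A   1      29.932  17.088  -0.061  1.00  7.77      A    N",
    "ATOM     31  CBE JQ1 A   1      29.731  12.302  -4.299  1.00 14.82      A   BE",
]


def is_atom_record(record):
    """True for ATOM/HETATM records, False for HEADER, REMARK, END, ..."""
    return record.startswith(("ATOM", "HETATM"))


def atom_name(record):
    """Atom name, PDB columns 13-16."""
    return record[12:16].strip()


def element_symbol(record):
    """Element symbol, PDB columns 77-78, upper-cased for comparison."""
    return record[76:78].strip().upper()


def element_mismatches(original, written):
    """Return (atom name, original element, written element) for each
    written atom whose element differs from the original record.

    Assumes atom names are unique within the records being compared.
    """
    expected = {atom_name(r): element_symbol(r)
                for r in original if is_atom_record(r)}
    mismatches = []
    for record in written:
        if not is_atom_record(record):
            continue
        name = atom_name(record)
        if name in expected and element_symbol(record) != expected[name]:
            mismatches.append((name, expected[name], element_symbol(record)))
    return mismatches


if __name__ == "__main__":
    for name, old, new in element_mismatches(ORIGINAL, WRITTEN):
        print(f"{name}: {old} -> {new}")
```

Matching on atom name rather than serial number is deliberate here, because the writer renumbers the atoms starting from 1.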

Current version of MDAnalysis: 0.20.2-dev0

RMeli commented 4 years ago

Fix hopefully coming soon since I really need this working. =)

RMeli commented 4 years ago

I just realised this is a duplicate of #2423 and that there is already an open PR (#2442), sorry.

orbeckst commented 4 years ago

Then you might have to reopen #2423?

RMeli commented 4 years ago

#2423 and #2442 are still open, pending discussion on how to treat empty fields for element symbols.

orbeckst commented 3 years ago

cc @kaceyreidy – you might also be interested in the problem of correct element handling in MDAnalysis (see also https://github.com/MDAnalysis/mdanalysis/projects/9)

IAlibay commented 3 years ago

@RMeli this should be fixed now with #3001, so I'm going to close. Please do re-open if needed.