JuliaMolSim / Molly.jl

Molecular simulation in Julia
Other
393 stars 53 forks source link

Automatic conversion to terminal N-terminal hydrogens #114

Closed axsk closed 1 year ago

axsk commented 1 year ago

Hey, first and foremost, thank you for the great package! As disclaimer, I am super new to the whole MD thing, so this might be totally on my end :>

I was trying to load a .pdb file of an Alanine-Dipeptide but got the error message that no "H" key was found. Digging deeper I then found out that this was due to the default rename_terminal_res=true in System. This replaced the ALA with NALA, but my .pdb only supplied an H atom, not the H1, H2, H3 required in the force field for NALA.

If I understand correctly this is a feature, not a bug. However with only the error message that no key was found I was pretty lost to start with. I cannot judge how sane the default rename_terminal_res setting is, but maybe there should be some kind of check whether ALA -> NALA is actually possible, or kind of a better error message?

It's fine for me now (just setting rename_terminal_res=false), but I thought I'd leave this here at least for others stepping into the same trap :)

jgreener64 commented 1 year ago

Yes this is a pain point as it stands. Currently we require exact matches of residue names to templates and atom names to atoms in the templates, rather than searching for the best template by topology and matching the atoms as appropriate. This means the terminal residues need to be renamed and require the exact correct atoms (H1, H2 and H3 for the N-terminus). Improving system setup will hopefully get more attention this year.

Preparing PDBs and adding missing hydrogens/terminal atoms is a longer term goal too. As a side note, in your case you should make sure that only having H is actually what you want on the terminus - you may want to add more hydrogens using other software.

axsk commented 1 year ago

You were right, I indeed want 3 hydrogens on the terminus, and these are even given (named HH31, HH32, HH33). Looking now (as I learned a bit more) at it, it seems as if Molly changed ALA to NALA, even though it is not a terminal sequence (which I understand are ACE and NME?). I am using the alanine-dipeptide-nowater.pdb:

CRYST1   27.222   27.222   27.222  90.00  90.00  90.00 P 1           1
ATOM      1 HH31 ACE X   1       3.225  27.427   2.566  1.00  0.00            
ATOM      2  CH3 ACE X   1       3.720  26.570   2.110  1.00  0.00            
ATOM      3 HH32 ACE X   1       4.088  25.905   2.891  1.00  0.00            
ATOM      4 HH33 ACE X   1       4.557  26.914   1.502  1.00  0.00            
ATOM      5  C   ACE X   1       2.770  25.800   1.230  1.00  0.00            
ATOM      6  O   ACE X   1       1.600  26.150   1.090  1.00  0.00            
ATOM      7  N   ALA X   2       3.270  24.640   0.690  1.00  0.00            
ATOM      8  H   ALA X   2       4.259  24.471   0.810  1.00  0.00            
ATOM      9  CA  ALA X   2       2.480  23.690  -0.190  1.00  0.00            
ATOM     10  HA  ALA X   2       1.733  24.315  -0.679  1.00  0.00            
ATOM     11  CB  ALA X   2       3.470  23.160  -1.270  1.00  0.00            
ATOM     12  HB1 ALA X   2       4.219  22.525  -0.797  1.00  0.00            
ATOM     13  HB2 ALA X   2       2.922  22.582  -2.014  1.00  0.00            
ATOM     14  HB3 ALA X   2       3.963  24.002  -1.756  1.00  0.00            
ATOM     15  C   ALA X   2       1.730  22.590   0.490  1.00  0.00            
ATOM     16  O   ALA X   2       2.340  21.880   1.280  1.00  0.00            
ATOM     17  N   NME X   3       0.400  22.430   0.210  1.00  0.00            
ATOM     18  H   NME X   3      -0.008  23.118  -0.407  1.00  0.00            
ATOM     19  CH3 NME X   3      -0.470  21.350   0.730  1.00  0.00            
ATOM     20 HH31 NME X   3       0.112  20.693   1.376  1.00  0.00            
ATOM     21 HH32 NME X   3      -1.290  21.786   1.300  1.00  0.00            
ATOM     22 HH33 NME X   3      -0.873  20.775  -0.103  1.00  0.00            
END

I have to say I am not completely sure though and am lacking the time right now to look into this, sorry :( Feel free to close the issue however, as it is "resolved" for me by using the rename_terminal_res option :)

jgreener64 commented 1 year ago

You have run into another pain point in MD, which is confusion over alanine dipeptide. It took me more than a year to realise that it wasn't the molecule I thought it was, i.e. Ala-Ala :laughing:. If it was, the conversion to NALA would be correct at the N-terminus as for standard proteins.

However it is an alanine with an acetyl group on the N-terminus and a methyl amide group on the C-terminus, as per your PDB file. These don't require renaming so setting rename_terminal_res to false is the way to go.

I'll add mention of rename_terminal_res to the docs and relevant docstring to hopefully make some of this clearer.

jgreener64 commented 1 year ago

rename_terminal_res is now mentioned in the docs and relevant docstring.