gmxdump2pdb.pl correctly writes all bonds except the last residue of a given protein chain

jlotthammer commented 3 years ago

I am having a similar problem as described in issue #46 with the gmxdump2pdb.pl script in LOOS. More specifically, it appears my issue is with the parsing of the n-1 residue, where n is the total number of residues in a given chain. E.g., in a system where there's one chain, I have only one missing bond; however, in a larger system, with >500,000 atoms and 6 protein chains I have 6 missing bonds all of which correspond to the bond between residues n-1 and n.

gmx dump -s md.tpr | ./gmxdump2pdb.pl --constraints md.fake

The expected behavior of the script would be to output bond information between the n-1 and nth residues of the respective chain.

I am using the most recent version of the LOOS gmxdump2pdb.pl script on github. Additionally, my issue is with all-atom simulation files generated from gromacs version 2021 on linux mint ulyana (20.04).

The example I attach below is with only 1 protein chain, not the more complex system I describe above with 6 chains. files.zip

agrossfield commented 3 years ago

Could you also attach the relevant top file? I have an idea what might be going on, but looking at the top file would help.

agrossfield commented 3 years ago

I need more clarification about what you're seeing as the problem. The way I read your complaint, the problem was the linkage of the last residue of the protein chain to the residue before it, which in this case would be the bond between the C in residue 775 (atom index 13402) and the N in residue 776 (atom index 13404).

However, that bond isn't missing, at least not in the psf.

Similarly, I'm not seeing a problem when I load the pdb you gave us into the psf. What is the problem you're observing?

One thing the we might be struggling with is some funky atoms you have with names starting with M, eg atoms 1 and 2 are MN1 and MN2. I'm not sure what they are, but they're not bonded to anything that I can see, and they confuse loos' splitByMolecule. However, if I get rid of them, loos correctly thinks there's 1 chain present:


In [2]: a = loos.createSystem("md.fake.psf")

In [3]: achain = loos.selectAtoms(a, 'segid=="PCHA" && !(name =~"^M")')

In [4]: mols = achain.splitByMolecule()

In [5]: len(mols)
Out[5]: 1

In [6]: len(mols[0])
Out[6]: 12425

agrossfield commented 3 years ago

Per our off-line conversation, I'm going to close this issue.

GrossfieldLab / loos

gmxdump2pdb.pl correctly writes all bonds except the last residue of a given protein chain #47