Segmentation fault after simulation of 1lmb example

yihengwuKP commented 2 years ago

Describe the bug A segmentation fault is always thrown after the 1lmb example simulation finishes on my local CPU platform. This seems harmless, since the simulation has already completed, but if I comment out forces.update({force_name:force}) (line 179) in the Protein-DNA section, the segmentation fault is thrown before the simulation starts. Oddly, I can't reproduce this second behaviour in my local environment, but I can on the cluster, on both the CPU and CUDA platforms.

To Reproduce Steps to reproduce the behavior:

Go to 'open3spn2/example/Protein_DNA'
run jupyter nbconvert --to python Protein_DNA_example_CPU.ipynb
run python Protein_DNA_example_CPU.py

Or, additionally, comment out line 179:

    174 #Add DNA-protein interaction forces
    175 for force_name in open3SPN2.protein_dna_forces:
    176     print(force_name)
    177     force = open3SPN2.protein_dna_forces[force_name](dna,protein)
    178     s.addForce(force)
    179     #forces.update({force_name: force})

run python Protein_DNA_example_CPU.py

Expected behavior The original Protein_DNA_example_CPU.py should run without any errors. Commenting out the line that updates the forces dictionary should have no effect on the simulation, because forces is only used to store the names of the potentials and their force objects. (I commented it out because I was trying to get rid of forces entirely, but then found that the segfault was thrown earlier, and that the critical component is the Protein-DNA force update.) In practice I'll just keep using forces, which works, but it's an interesting (and slightly weird) bug, so I am reporting it. As I said, I can't reproduce the early segfault in my local environment (even if I directly download the buggy script), but it does occur on the cluster.
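
One possible reason the dictionary line could matter, offered only as a hypothesis and not confirmed anywhere in this thread, is object lifetime: storing each force in forces keeps the Python object alive, while dropping it lets CPython free the object as soon as the loop variable is rebound or goes out of scope. The sketch below illustrates only that Python-level effect. FakeForce and build are hypothetical stand-ins, not open3SPN2 or OpenMM classes, and the sketch assumes CPython's reference counting.

```python
import weakref


class FakeForce:
    """Hypothetical stand-in for a Python object wrapping a native force."""

    def __init__(self, name):
        self.name = name


def build(names, keep_references):
    """Create one FakeForce per name, optionally storing it in a dict."""
    forces = {}
    refs = []
    for name in names:
        force = FakeForce(name)
        # A weak reference lets us observe the object without keeping it alive.
        refs.append(weakref.ref(force))
        if keep_references:
            forces.update({name: force})
    return forces, refs


names = ["ExclusionProteinDNA", "ElectrostaticsProteinDNA"]
kept, kept_refs = build(names, keep_references=True)
dropped, dropped_refs = build(names, keep_references=False)

# Count how many objects are still alive after the loop has finished.
alive_kept = sum(ref() is not None for ref in kept_refs)
alive_dropped = sum(ref() is not None for ref in dropped_refs)
print("alive when stored in dict:", alive_kept)
print("alive when not stored:", alive_dropped)
```

If native code still held a pointer to memory owned by such a freed object, a crash at context creation would be plausible, but whether that happens here would need to be checked against the open3SPN2 and OpenMM sources.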

Desktop (please complete the following information):

OS: Manjaro (local); Scientific Linux 7.4 (cluster)

Error message For the original run:

chain A is a DNA chain. it will be removed
chain B is a DNA chain. it will be removed
C 87
D 92
Bond
Angle
Stacking
Dihedral
BasePair
......
pap_1 term ON
No ssweight given, assume all zero
pap2
pap_2 term ON
No ssweight given, assume all zero
-899.1439809511451
TotalEnergy -899.143982 kJ/mol
Bond 327.558667 kJ/mol
Angle 973.859067 kJ/mol
.......
pap1 -0.0 kJ/mol
pap2 -0.0 kJ/mol
#"Step","Time (ps)","Potential Energy (kJ/mole)","Temperature (K)"
100,0.20000000000000015,-4280.276008936478,170.52526668552082
200,0.4000000000000003,-4134.542026299682,203.75283918870957
300,0.6000000000000004,-4036.356982095788,225.23115897642245
400,0.8000000000000006,-3966.457310441064,237.45138976781178
500,1.0000000000000007,-3889.056514398391,247.6502311417848
600,1.2000000000000008,-3818.037579398763,251.60876382973962
700,1.400000000000001,-3796.3866712439412,267.82497316002235
800,1.6000000000000012,-3821.7808458729273,282.49289463831553
900,1.8000000000000014,-3652.7479541963394,284.57682273838384
1000,2.0000000000000013,-3530.1530461379457,282.05862074578164
Bond 63.105515 kJ/mol
Angle 102.496099 kJ/mol
Stacking -446.861355 kJ/mol
Dihedral -481.204538 kJ/mol
BasePair -267.25972 kJ/mol
CrossStacking -54.91899 kJ/mol
Exclusion 1.831451 kJ/mol
Electrostatics 23.869024 kJ/mol
ExclusionProteinDNA -27.788798 kJ/mol
ElectrostaticsProteinDNA -10.197532 kJ/mol
Connectivity 1609.873732 kJ/mol
Chain 1609.873732 kJ/mol
Chi 1609.873732 kJ/mol
Excl 1609.873732 kJ/mol
rama -1587.498034 kJ/mol
rama_pro -1587.498034 kJ/mol
contact -1266.448839 kJ/mol
frag -1004.593585 kJ/mol
beta1 -184.557473 kJ/mol
beta2 -184.557473 kJ/mol
beta3 -184.557473 kJ/mol
pap1 -0.0 kJ/mol
pap2 -0.0 kJ/mol

[1]    171515 segmentation fault (core dumped)  ./Protein_DNA_example_CPU.py

For the commented run:

chain A is a DNA chain. it will be removed
chain B is a DNA chain. it will be removed
C 87
D 92
Bond
Angle
Stacking
Dihedral
BasePair
......
pap_1 term ON
No ssweight given, assume all zero
pap2
pap_2 term ON
No ssweight given, assume all zero
[2]    31762 segmentation fault  ./Protein_DNA_example_CPU.py

For this commented run, the error is thrown at line 210:

  2 integrator = simtk.openmm.LangevinIntegrator(temperature, 1 / simtk.openmm.unit.picosecond, 2 * simtk.openmm.unit.femtoseconds)
  1 platform = simtk.openmm.Platform.getPlatformByName(platform_name)
210 simulation = simtk.openmm.app.Simulation(top,s, integrator, platform)
  1 simulation.context.setPositions(coord)
  2 energy_unit=simtk.openmm.unit.kilojoule_per_mole

I dug in a bit: the Simulation constructor goes on to initialize the context, and that is where the segfault is thrown:

pap_1 term ON
No ssweight given, assume all zero
pap2
pap_2 term ON
No ssweight given, assume all zero
> Protein_DNA_example_CPU.py(149)<module>()
-> print("Setting up the simulation...")
(Pdb) n
Setting up the simulation...
> Protein_DNA_example_CPU.py(150)<module>()
-> integrator = simtk.openmm.LangevinIntegrator(temperature, 1 / simtk.openmm.unit.picosecond, 2 * simtk.openmm.unit.femtoseconds)
(Pdb) n
> Protein_DNA_example_CPU.py(151)<module>()
-> platform = simtk.openmm.Platform.getPlatformByName(platform_name)
(Pdb) n
> Protein_DNA_example_CPU.py(152)<module>()
-> simulation = simtk.openmm.app.Simulation(top,s, integrator, platform)
(Pdb) s
--Call--
> .conda/envs/dinucl/lib/python3.6/site-packages/simtk/openmm/app/simulation.py(59)__init__()
-> def __init__(self, topology, system, integrator, platform=None, platformProperties=None, state=None):
(Pdb) n
> .conda/envs/dinucl/lib/python3.6/site-packages/simtk/openmm/app/simulation.py(82)__init__()
-> self.topology = topology
(Pdb) n
> .conda/envs/dinucl/lib/python3.6/site-packages/simtk/openmm/app/simulation.py(84)__init__()
-> if isinstance(system, string_types):
(Pdb) n
> .conda/envs/dinucl/lib/python3.6/site-packages/simtk/openmm/app/simulation.py(88)__init__()
-> self.system = system
(Pdb) n
> .conda/envs/dinucl/lib/python3.6/site-packages/simtk/openmm/app/simulation.py(90)__init__()
-> if isinstance(integrator, string_types):
(Pdb) n
> .conda/envs/dinucl/lib/python3.6/site-packages/simtk/openmm/app/simulation.py(94)__init__()
-> self.integrator = integrator
(Pdb) n
> .conda/envs/dinucl/lib/python3.6/site-packages/simtk/openmm/app/simulation.py(96)__init__()
-> self.currentStep = 0
(Pdb) n
> .conda/envs/dinucl/lib/python3.6/site-packages/simtk/openmm/app/simulation.py(98)__init__()
-> self.reporters = []
(Pdb) n
> .conda/envs/dinucl/lib/python3.6/site-packages/simtk/openmm/app/simulation.py(99)__init__()
-> if platform is None:
(Pdb) n
> .conda/envs/dinucl/lib/python3.6/site-packages/simtk/openmm/app/simulation.py(102)__init__()
-> elif platformProperties is None:
(Pdb) n
> .conda/envs/dinucl/lib/python3.6/site-packages/simtk/openmm/app/simulation.py(103)__init__()
-> self.context = mm.Context(self.system, self.integrator, platform)
(Pdb) n
[1]    201769 segmentation fault  python -m pdb Protein_DNA_example_CPU.py
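
A lighter alternative to stepping through pdb, assuming a CPython 3 interpreter, is the standard-library faulthandler module, which prints the Python traceback when the process receives a fatal signal such as SIGSEGV. A minimal sketch, meant to go at the top of Protein_DNA_example_CPU.py (the placement is a suggestion, not part of the example notebook):

```python
import faulthandler

# On a fatal signal such as SIGSEGV, dump the Python traceback of all
# threads to stderr before the process dies.
faulthandler.enable()

# ... the rest of the script (system setup, Simulation creation) follows ...
```

The same behaviour is available without editing the script by running python -X faulthandler Protein_DNA_example_CPU.py.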

I'm not sure why or how this bug arises, and I would really appreciate any insights you can share!

yihengwuKP commented 2 years ago

I also did a gdb run, just in case it's helpful for debugging:

gdb python
run Protein_DNA_example_CPU.py

and the final part of the output is:

......
[Thread 0x7fffd72b2700 (LWP 10666) exited]
[Thread 0x7fffd8ab5700 (LWP 10662) exited]
[Thread 0x7fffe4acd700 (LWP 10633) exited]
[Thread 0x7fffdc2bc700 (LWP 10655) exited]
[Thread 0x7fffdeac1700 (LWP 10648) exited]
[Thread 0x7fffe0ac5700 (LWP 10643) exited]
[Thread 0x7fffe42cc700 (LWP 10634) exited]
[Thread 0x7fffe2ac9700 (LWP 10637) exited]
Bond
Angle
Stacking
Dihedral
BasePair
CrossStacking
Exclusion
Electrostatics
ExclusionProteinDNA
ElectrostaticsProteinDNA
Connectivity
Chain
Chi
Excl
0
639
rama
rama_pro
contact
Number of atom:  1171 Number of residue:  179
Contact cutoff  1.0 nm
NonbondedMethod:  1
670096
670096
frag
Loading Fragment files(Gro files)
Saving fragment table as npy file to speed up future calculation.
.conda/envs/dinucl/lib/python3.6/site-packages/numpy/core/_asarray.py:136: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
  return array(a, dtype, copy=False, order=order, subok=True)
All gro files information have been stored in the ./single_frags.npy.
You might want to set the 'UseSavedFragTable'=True to speed up the loading next time.
But be sure to remove the .npy file if you modify the .mem file. otherwise it will keep using the old frag memeory.
beta1
beta_1 term ON
beta2
beta_2 term ON
beta3
beta_3 term ON
pap1
pap_1 term ON
No ssweight given, assume all zero
pap2
pap_2 term ON
No ssweight given, assume all zero
Setting up the simulation...

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff63272c1 in OpenMM::ContextImpl::ContextImpl(OpenMM::Context&, OpenMM::System const&, OpenMM::Integrator&, OpenMM::Platform*, std::map<std::string, std::string, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > > const&, OpenMM::ContextImpl*) () from /home/yihengwu917/.conda/envs/dinucl/lib/python3.6/site-packages/simtk/openmm/../../../../libOpenMM.so
Missing separate debuginfos, use: debuginfo-install glibc-2.17-292.el7.x86_64

cabb99 commented 2 years ago

Thank you for writing the steps so clearly; I could easily reproduce the segmentation fault. I still find it strange that the error only happens at the end of the program, so maybe it occurs when openmm is cleaning up its variables. The error you get after commenting out the line you mentioned seems related, but the forces variable is not used by openmm, so I don't know why it would cause an earlier segmentation fault. In any case, it seems to me that the error is caused by the ExclusionProteinDNA and ElectrostaticsProteinDNA forces. Can you tell me whether you stop getting the segmentation fault when you comment out this section:

#Add DNA-protein interaction forces
#for force_name in open3SPN2.protein_dna_forces:
#    print(force_name)
#    force = open3SPN2.protein_dna_forces[force_name](dna,protein)
#    s.addForce(force)
#    forces.update({force_name: force})
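
To try variants like this one repeatedly (for example, enabling one protein-DNA force at a time), each variant can be run in a child interpreter so that a crash is reported as a return code instead of killing the shell session. The following is a minimal sketch under stated assumptions: it targets POSIX systems, crashed_with_segfault is a hypothetical helper, and the two child programs are stand-ins that would be replaced by the real script variants.

```python
import signal
import subprocess
import sys


def crashed_with_segfault(code):
    """Run `code` in a fresh interpreter and report whether it died from SIGSEGV."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    # On POSIX, a child killed by a signal has returncode == -signal_number.
    return result.returncode == -signal.SIGSEGV


# Stand-in child programs: one exits normally, one kills itself with SIGSEGV.
ok_child = "print('finished')"
bad_child = "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"

results = {
    "ok_child": crashed_with_segfault(ok_child),
    "bad_child": crashed_with_segfault(bad_child),
}
print(results)
```

In practice the child command would be something like [sys.executable, "Protein_DNA_example_CPU.py"] with the force selection passed in by an argument or environment variable of your choosing.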

yihengwuKP commented 2 years ago

Hmm, interesting. You are right: after commenting this block out, the segmentation fault is gone, even at the end of the program.

cabb99 / open3spn2

Segmentation fault after simulation of 1lmb example #12