CI build failed for merged PR

espresso-ci commented 5 years ago

https://gitlab.icp.uni-stuttgart.de/espressomd/espresso/pipelines/7187

mkuron commented 5 years ago

Looks like an actual problem. Broken bonds while running the cellsystem sample.

jngrad commented 5 years ago

I've seen that one before, so it's a recurring problem. I ran it another 2000 times today on my desktop machine without a single failure; however, in both cuda:9.0 and cuda:tutorial I was able to reproduce the failure after ~1000 runs. To reproduce the error on a desktop, add system.integrator.run(1000000) at the end of the sample; it should take a few seconds (or more) of runtime before a FENE bond breaks.

time	kinetic energy	FENE energy	WCA energy	min bond length	min dist	max velocity
420.71	161.64	723.37	85.04	0.907	0.340	3.84
420.72	163.95	724.09	80.97	0.924	0.364	3.81
420.73	162.01	726.13	77.20	0.937	0.394	3.51
420.74	2.72e+9	729.28	755327.28	0.945	0.398	36984.66
420.75	crash	?	?	?	?	?

It looks like the WCA potential increases suddenly, even though the minimal distance between all particle pairs does not decrease before the crash.
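
For sanity-checking logged values like the ones above, here is a minimal sketch of the two pair energies involved. It is plain NumPy, not ESPResSo code. The defaults mirror the sample's parameters (WCA with epsilon=1, sigma=1; FENE with k=10, d_r_max=1.5); the function names and the equilibrium length r_0=0 are assumptions made for illustration.

```python
import numpy as np


def wca_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones pair energy, cut at 2**(1/6)*sigma and shifted to zero there."""
    r = np.asarray(r, dtype=float)
    cutoff = 2.0 ** (1.0 / 6.0) * sigma
    sr6 = (sigma / r) ** 6
    energy = 4.0 * epsilon * (sr6 ** 2 - sr6) + epsilon
    # Purely repulsive: no contribution at or beyond the cutoff.
    return np.where(r < cutoff, energy, 0.0)


def fene_energy(r, k=10.0, d_r_max=1.5, r_0=0.0):
    """FENE bond energy; diverges as |r - r_0| approaches d_r_max."""
    x = (np.asarray(r, dtype=float) - r_0) / d_r_max
    if np.any(np.abs(x) >= 1.0):
        # Assumption: this is the regime reported as a broken bond.
        raise ValueError("bond stretched to or beyond d_r_max")
    return -0.5 * k * d_r_max ** 2 * np.log1p(-x ** 2)
```

Evaluating wca_energy at a logged minimum distance gives the contribution a single pair at that separation would make, which can then be compared against the logged total WCA energy.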

fweik commented 5 years ago

Did you record the seed of the run that crashed?

jngrad commented 5 years ago

No, but I observed the same trend in multiple independent runs.

jngrad commented 5 years ago

Actually, the time at which the FENE bond breaks is not reproducible, even when setting the numpy+system+thermostat+polymer seeds. Could it be due to skin tuning or my use of the visualizer?

fweik commented 5 years ago

The visualizer should not influence the system. Could you please try without the tuning, and if it crashes please post the seeds or a deterministic script.
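
One way to make sure the seeds of a crashing run are always available is to draw them up front and write them to disk before the simulation starts. The sketch below uses only the standard library; the helper names, the JSON file layout and the seed range are hypothetical, and the four seed names simply follow the numpy+system+thermostat+polymer seeds mentioned above.

```python
import json
import random


def draw_and_record_seeds(path, names=("numpy", "system", "thermostat", "polymer")):
    """Draw one seed per named generator and save them before the run starts."""
    source = random.SystemRandom()  # OS entropy, independent of any seeded RNG
    seeds = {name: source.randrange(2 ** 31 - 1) for name in names}
    with open(path, "w") as f:
        json.dump(seeds, f, indent=2)
    return seeds


def load_seeds(path):
    """Read back the seeds of an earlier run to replay it."""
    with open(path) as f:
        return json.load(f)
```

The returned dictionary would then be passed to the respective seed arguments of the script, and the file attached to the bug report if the run crashes.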

jngrad commented 5 years ago

Update: skin tuning is not deterministic, and without skin tuning the FENE bonds don't break. MWE:

from __future__ import print_function
import numpy as np
import espressomd
espressomd.assert_features(["LENNARD_JONES"])
from espressomd import polymer
from espressomd import interactions
from scipy.spatial.distance import pdist
box_l = 100
system = espressomd.System(box_l=3 * [box_l])
system.set_random_state_PRNG()
system.seed = system.cell_system.get_state()['n_nodes'] * [1234]
np.random.seed(41)
cs = system.cell_system
cs.skin = .48 * box_l
system.thermostat.set_langevin(kT=1.0, gamma=1.0, seed=42)
system.time_step = 0.01

# WCA and FENE
system.non_bonded_inter[0, 0].lennard_jones.set_params(epsilon=1, sigma=1,
    cutoff=2**(1. / 6), shift="auto")
fene = interactions.FeneBond(k=10, d_r_max=1.5)
system.bonded_inter.add(fene)
# polymer
positions = polymer.positions(n_polymers=1, beads_per_chain=100, seed=1234,
                              bond_length=0.97, min_distance=0.969)
for i, pos in enumerate(positions[0]):
    system.part.add(id=i, pos=pos)
    if i > 0: system.part[i].add_bond((fene, i - 1))

cs.set_n_square(True)
#system.integrator.run(10000000);exit(0) # uncomment this line to skip tuning
skin = cs.tune_skin(min_skin=0.5, max_skin=50., tol=0.5, int_steps=100)
print('skin =', skin)
system.time = 0
history = 20 * [None]  # rolling window of the last 20 observations
for i in range(10000000):
    min_bond_length = np.min(np.linalg.norm(system.part[1:].pos - system.part[:-1].pos, axis=1))
    min_dist = np.min(pdist(system.part[:].pos))
    max_vel = np.max(np.linalg.norm(system.part[:].v, axis=1))
    energy = system.analysis.energy()  # evaluate the energies once per step
    del history[0]
    history.append((system.time, energy['kinetic'], energy['bonded'],
                    energy['non_bonded'], min_bond_length, min_dist, max_vel))
    if max_vel > 20:
        print(i)
        break
    system.integrator.run(1)

for line in history:
    if line is not None:  # skip unfilled slots if the loop stopped early
        print(*line)
print("Bonds are about to break")
system.integrator.run(1)

It'll take a few minutes to crash due to the numpy operations, but you can make it happen much faster by setting the box size to 40 and max_skin to 20.

fweik commented 5 years ago

The skin should not affect the result, but the tuning integrates for a nondeterministic time.
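
A toy model (not ESPResSo code) of why this matters: if a tuning phase advances the system by a number of steps that is not fixed in advance, the production run starts from a different state each time, even when every seed is identical. The function name and the random-walk dynamics are purely illustrative, and how tune_skin actually decides how long to integrate is not modelled here.

```python
import numpy as np


def production_run(tuning_steps, production_steps=1000, seed=42):
    """Seeded 1D random walk: a 'tuning' phase followed by a 'production' phase."""
    rng = np.random.default_rng(seed)
    # "Tuning": integrates for a number of steps that may vary between runs.
    x = rng.normal(size=tuning_steps).sum()
    # Production: starts from whatever state the tuning phase left behind.
    return x + np.cumsum(rng.normal(size=production_steps))


same = np.array_equal(production_run(100), production_run(100))
differs = not np.allclose(production_run(100), production_run(101))
print("identical tuning length reproduces the trajectory:", same)
print("one extra tuning step changes the trajectory:", differs)
```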

jngrad commented 5 years ago

I ran 10 simulations of 30 min with a random numpy seed and a random skin (no tuning) between 0.25 and 0.48 box_l without any bond breaking. As soon as skin tuning is used the bonds will break, unless tuning is followed by a reset of the particle positions:

cs.set_n_square(True)
skin = cs.tune_skin(min_skin=0.5, max_skin=0.48 * box_l, tol=0.5, int_steps=100)
for i, pos in enumerate(positions[0]):
    system.part[i].pos = pos
system.integrator.run(20000000) # 30 min
exit(0)

fweik commented 5 years ago

I said it should not have an influence :-)

jngrad commented 5 years ago

That's why I tested it :-) I was really surprised that resetting the particle positions would solve it. The tuning.cpp code does not seem to have side effects other than broadcasting FIELD_SKIN, so I'll look into the domain decomposition code to see where it goes.

fweik commented 5 years ago

The test uses n_square iirc

espressomd / espresso

CI build failed for merged PR #2820