Electrodes tutorial is unstable

espressomd / espresso

The ESPResSo package

https://espressomd.org

GNU General Public License v3.0


Electrodes tutorial is unstable #4850

Closed jngrad closed 4 months ago

jngrad commented 6 months ago

The issue reported in #4798 is still present, and causes CI to fail once per month. See for example pipelines 358223 (issue #4844) and 359158 (issue #4849). The error message shows a different particle position, suggesting the tutorial isn't fully deterministic.

jngrad commented 5 months ago

@schlaicha @keerthirk1995 any progress from your side? This is still a problem for our CI: pipeline 359192 (issue #4851).

schlaicha commented 5 months ago

I won't be able to look into this again in the coming weeks. @keerthirk1995 could you have a look at which part is non-deterministic? I assume it is the steepest descent, which can behave differently. So here is what I would try:

1. Do the steepest descent as done now.
2. As this might still end up in a very unfavourable configuration, add some warm-up steps in a loop with the Langevin friction set significantly higher (~1?):
   - start with a small timestep, like 0.001
   - run ~200-1000 steps
   - increase the timestep (0.002, 0.005, 0.01, ...) until you reach the simulation timestep
   - then reduce the Langevin friction again to enhance diffusion/equilibration

The number of integration steps and the timestep increment need to be adjusted a little... Let me know if you have questions!
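
A minimal sketch of the warm-up loop suggested above, not the tutorial's actual code. The function names `warmup_schedule` and `run_warmup`, the default of 500 steps per stage, and the `gamma_production` parameter are illustrative assumptions. The `system` argument is assumed to be an `espressomd.System` on which the steepest descent has already been run and the velocity Verlet integrator re-enabled. A fixed `seed` is passed to the thermostat on the assumption that this helps keep the run reproducible.

```python
def warmup_schedule(target_time_step, candidates=(0.001, 0.002, 0.005, 0.01)):
    """Return increasing time steps that end exactly at the target time step.

    The candidate values follow the 0.001, 0.002, 0.005, 0.01 progression
    from the comment above; they are a starting point to be tuned.
    """
    ramp = [dt for dt in candidates if dt < target_time_step]
    return ramp + [target_time_step]


def run_warmup(system, target_time_step, kT, seed, gamma_production,
               gamma_warmup=1.0, steps_per_stage=500):
    """Integrate with high Langevin friction while ramping up the time step.

    Hypothetical helper: `system` is assumed to expose `time_step`,
    `thermostat.set_langevin()` and `integrator.run()` like an
    espressomd.System object.
    """
    # High friction damps large forces left over after the steepest descent.
    system.thermostat.set_langevin(kT=kT, gamma=gamma_warmup, seed=seed)
    for dt in warmup_schedule(target_time_step):
        system.time_step = dt
        system.integrator.run(steps_per_stage)
    # Lower the friction again to enhance diffusion/equilibration.
    system.thermostat.set_langevin(kT=kT, gamma=gamma_production, seed=seed)
```

For a simulation time step of 0.01, `warmup_schedule(0.01)` gives four stages (0.001, 0.002, 0.005, 0.01). The number of steps per stage and the candidate time steps would still need adjusting for the tutorial's system.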

jngrad commented 5 months ago

Tutorial part 1 is now failing too: pipeline 359826 (issue #4857).

jngrad commented 5 months ago

pipeline 360162 (issue #4862)

jngrad commented 4 months ago

This issue has paralyzed the ESPResSo project for a month. The issue tracker was flooded with notifications about this tutorial failing CI. We failed to notice in time an issue introduced by an Ubuntu update on the CI runners, because its notifications were drowned out by the tutorial notifications, and now all CI runners have an Ubuntu version where the ASAN library is broken. The last merge commit on the python branch at the time of writing is 7ad0534debbbe9dac048f4ff322054001de24587, timestamped a month ago, because we cannot merge a PR if CI is failing, and CI was failing daily due to both ASAN and the tutorial being broken.