espressomd / espresso

The ESPResSo package
https://espressomd.org
GNU General Public License v3.0

suggested changes to the LJ tutorial #4260

Status: Closed (kosovan closed 3 years ago)

kosovan commented 3 years ago
RudolfWeeber commented 3 years ago

These are good points.

I'm not in favour of writing stuff to a file during the simulation, and plotting from a file, though.

I also don't think this is recommended practice for production simulations. Particularly text files.

With the exception of very big data such as trajectories, data should be accumulated in structured data types (lists, numpy arrays, espresso time series) during the simulation and stored in a structured format (yml, pickle, numpy) together with the simulation parameters.
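As a rough illustration of the approach described above, the following sketch accumulates an observable in Python lists and stores it together with the run parameters in a single pickle file once the loop has finished. Everything here is an assumption made for illustration: the parameter names, the `fake_energy` stand-in for a real observable, and the output file name are hypothetical and do not come from the tutorial.

```python
import pickle
import random
import tempfile
from pathlib import Path

# Hypothetical simulation parameters (illustrative values only).
params = {"n_steps": 50, "time_step": 0.01, "temperature": 1.0, "seed": 42}


def fake_energy(rng):
    """Stand-in for reading an observable from a running simulation."""
    return 1.0 + 0.1 * rng.random()


rng = random.Random(params["seed"])
times, energies = [], []
for step in range(params["n_steps"]):
    # In a real script, the integrator would advance the system here.
    times.append(step * params["time_step"])
    energies.append(fake_energy(rng))

# Store results and parameters together, after the run is complete.
out_path = Path(tempfile.gettempdir()) / "lj_run_results.pkl"
with open(out_path, "wb") as f:
    pickle.dump({"params": params, "time": times, "energy": energies}, f)

# Later, possibly in a separate analysis script: load everything back.
with open(out_path, "rb") as f:
    loaded = pickle.load(f)
```

Keeping the parameters in the same file as the time series means the analysis script does not depend on variables that are lost when the simulation script exits.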

I'd not put this in tutorial one though. There's a video on managing simulation data:

https://www.youtube.com/watch?v=64rNmTpoS1c&t=233s

kosovan commented 3 years ago

@RudolfWeeber I will check your video but I think that it is a common practice to store the simulation data in a file and analyze it offline because you often don't know in advance what you will want to analyze in the end. What (structured) data format should be used is a separate question. The main problem I see is that the runtime variables and results are lost when you exit the simulation script.

jngrad commented 3 years ago

Writing to a file can be an issue in CI, because the file system can lag behind during write operations. This synchronization issue can cause the Python test to end up in a state where it is trying to open a file that doesn't exist yet, even though it has already been written to.

kosovan commented 3 years ago

@jngrad OK, then I suggest at least mentioning that in production one would save the data and post-process it later. The LJ tutorial is one of the first tutorials that new users encounter, so I think that such things should be mentioned as good practice.

RudolfWeeber commented 3 years ago

I'm not against storing the time series in a file. It should just be done after the run is complete.

And first saving, then loading back and plotting is just extra confusion compared to plotting the data directly.

kosovan commented 3 years ago

I used to write the data at the end of the simulation, but too many times I lost all data because the simulation ended unexpectedly, e.g. due to exceeding the wall time. Therefore, I recommend that my students store the data after shorter periods of simulation runtime. Nevertheless, this is a minor issue that should not be a part of the tutorial.
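One possible way to store data after shorter periods of runtime is sketched below. It is an assumption-laden illustration, not something taken from the tutorial: the `save_atomically` helper, the checkpoint interval, the file name, and the placeholder observable are all hypothetical. The helper writes to a temporary file first and then renames it, so that an interrupted run leaves the previous checkpoint intact.

```python
import json
import os
import tempfile


def save_atomically(data, path):
    """Write JSON to a temporary file, then rename it over the target."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(data, f)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data to disk
    os.replace(tmp_path, path)  # the rename is atomic on POSIX systems


n_steps = 100              # hypothetical run length
checkpoint_interval = 25   # hypothetical: save every 25 steps
path = os.path.join(tempfile.gettempdir(), "lj_checkpoint.json")

series = {"step": [], "energy": []}
n_saves = 0
for step in range(1, n_steps + 1):
    # In a real script, the integrator would advance the system here.
    series["step"].append(step)
    series["energy"].append(1.0 / step)  # placeholder observable
    if step % checkpoint_interval == 0:
        save_atomically(series, path)
        n_saves += 1
```

If the job is killed at the wall-time limit, at most `checkpoint_interval` steps of data are lost.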

jngrad commented 3 years ago

@kosovan Regarding the last items in your list:

  • Links to References seem to fail (probably because square brackets are special characters that affect the behaviour)

The links work just fine from my side, both in the Jupyter notebook and in the html version that is published automatically.

  • "we don't shift the potential to zero at the cutoff " is not good practice - should be changed
  • Change shift to some reasonable value!!!

The goal of the tutorial is to reproduce the simulation results of a paper that used LJ without a shift. This is now clearly explained in the tutorial, with an extra mention that it is nowadays preferable to use a shift.
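For readers unfamiliar with the shift, here is a minimal plain-Python sketch of the difference between a truncated and a truncated-and-shifted LJ potential. It does not use the ESPResSo API; the function names and the default values of `epsilon` and `sigma` are assumptions chosen for illustration.

```python
def lj(r, epsilon=1.0, sigma=1.0):
    """Plain 12-6 Lennard-Jones potential."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_truncated(r, r_cut, epsilon=1.0, sigma=1.0):
    """LJ cut at r_cut without a shift: jumps to zero at the cutoff."""
    return lj(r, epsilon, sigma) if r < r_cut else 0.0


def lj_shifted(r, r_cut, epsilon=1.0, sigma=1.0):
    """LJ cut at r_cut and shifted so that it reaches zero continuously."""
    if r >= r_cut:
        return 0.0
    return lj(r, epsilon, sigma) - lj(r_cut, epsilon, sigma)


r_cut = 2.5
# Size of the discontinuity of the unshifted potential at the cutoff.
jump = lj(r_cut)
# Value of the shifted potential just inside the cutoff (close to zero).
just_inside = lj_shifted(r_cut - 1e-9, r_cut)
```

With `r_cut = 2.5` in units of `sigma`, the unshifted potential has a small jump of about `-0.0163` in units of `epsilon` at the cutoff, which the shift removes.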

  • Instead of using local variables as accumulators for observables, write the time evolution in data file and read it from this file for plotting - as it would be done in a normal simulation

This is risky. Depending on the latency of the file system used to run the tutorials, we might end up in a situation where we read a file that doesn't exist yet, even though the operating system says it already exists. This caused us some trouble in the past in C++ unit tests.

  • choose smaller accumulator step than "delta_N=steps_per_uncorrelated_sample"

If I understand the tutorial correctly, we use this value for delta_N so we don't have to subsample the time series again.
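To make the trade-off concrete: if the accumulator step were smaller than `steps_per_uncorrelated_sample`, the recorded series would have to be subsampled before computing statistics that assume uncorrelated samples. The sketch below shows that extra step under stated assumptions; the numerical values and the placeholder series are hypothetical, and only the two variable names come from the discussion above.

```python
steps_per_uncorrelated_sample = 100  # hypothetical value
delta_N = 20                         # hypothetical finer accumulator step

# The finer step must divide the decorrelation interval evenly.
assert steps_per_uncorrelated_sample % delta_N == 0
stride = steps_per_uncorrelated_sample // delta_N

# Placeholder series recorded every delta_N integration steps.
fine_series = [float(i) for i in range(50)]

# Keep every stride-th sample so consecutive samples are
# steps_per_uncorrelated_sample integration steps apart.
uncorrelated = fine_series[::stride]
```

Choosing `delta_N = steps_per_uncorrelated_sample` from the start makes `stride` equal to 1, so no subsampling is needed.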

  • Unclear what de-mixing means in "Set this parameter to 2^(1/6)σ to get de-mixing or to 2.5σ to get mixing between the two components."

With r_cut=2.5σ there is virtually no difference between the two particle types and the fluid is perfectly mixed. With r_cut=2^(1/6)σ the interaction between the two fluids is purely repulsive, which leads to phase separation. There is still a small peak at r=σ, which corresponds to particles found at the interface between two phases.
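The reason a cutoff of 2^(1/6)σ gives a purely repulsive interaction is that the LJ potential has its minimum there, so the force changes sign at exactly that distance. A small plain-Python check, with hypothetical function names and default parameters chosen for illustration:

```python
def lj_force(r, epsilon=1.0, sigma=1.0):
    """Radial LJ force, F(r) = -dU/dr; positive values are repulsive."""
    sr6 = (sigma / r) ** 6
    return 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r


r_min = 2.0 ** (1.0 / 6.0)  # position of the LJ minimum for sigma = 1

force_at_min = lj_force(r_min)  # vanishes at the minimum
force_inside = lj_force(1.0)    # r < r_min: repulsive
force_outside = lj_force(2.0)   # r > r_min: attractive
```

Cutting the potential at `r_min` therefore discards the entire attractive tail, while a cutoff of 2.5σ keeps most of it.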

RDF of the mixed LJ fluid. The three RDFs are identical.

RDF of the demixed LJ fluid. The RDF between particles of different types is skewed towards large distance values because the two fluids do not mix.

kosovan commented 3 years ago

@jngrad I am sorry for not responding to your comments. I have not checked the current version of the tutorial yet because I have been on vacation in the meantime. Nevertheless, your explanations of the last items on my list seem plausible.

I think that the term "demixing" should be used rather cautiously because it is a terminus technicus that refers to the thermodynamic stability of coexisting phases in a macroscopic system. The fact that the various RDFs (00, 01, 11) are different is a necessary but not a sufficient condition for demixing. One could observe a similar feature in a system that forms nano-domains but does not demix macroscopically.