suggested changes to the LJ tutorial

kosovan commented 3 years ago

[ ] Links to References seem to fail (probably because square brackets are special characters that affect the behaviour)
[x] Add a list of expected prior knowledge
[x] Avoid phrases like "today's research" or "Espresso is most flexible in the market" - they can easily become outdated
[x] Meaning of epsilon as the depth of the minimum should be explained
[x] Add formula for the LJ potential in reduced units
[x] Add another formula that includes shift and cutoff in the LJ potential
[x] Axis labels and notation in the figure are not consistent with the text, units on the axes (U/epsilon, r/sigma) are not given
[x] Use matplotlib to create the figure on the spot instead of a picture downloaded from elsewhere?
[x] The statement "In practice, the L-J potential is cutoff beyond a specified distance r_cut ..." is not completely correct - it is a common practice in simulations but not in theoretical treatments
[x] Explain why r_cut = 1.2 sigma or r_cut = 2.5 sigma are often used
[x] Mention that LJ has been introduced to describe VdW interactions but nowadays it is often used to represent generic particles in CG simulations.
[x] Add a reference to user guide for explanation of reduced units
[x] Define and explain the key thermodynamic parameters (N, rho, T) in one block at the beginning, where we define what we want to simulate; explain the expected result (liquid/gas/solid/coexistence) and refer to the literature for a detailed discussion?
[x] Explain what happens in espressomd.assert_features(required_features) and related code
[x] Explain the meaning of parameter SKIN
[ ] "we don't shift the potential to zero at the cutoff " is not good practice - should be changed
[ ] Change shift to some reasonable value!!!
[x] Wrong link: the shift='auto' option of espressomd.interactions.LennardJonesInteraction.
[ ] Instead of using local variables as accumulators for observables, write the time evolution to a data file and read it back from this file for plotting, as would be done in a normal simulation
[x] Clarify and explain what gamma does: A damping constant gamma = DAMPING usually is a good choice.
[x] explain f_max=0
[x] explain why total energy should be negative or remove this line of code (it is not needed)
[x] explain Langevin Gamma, refer to UG
[x] explain the seed of the thermostat
[x] set the length of the simulation in LJ time units, calculate the time steps and steps per stride
[x] rename warmup_time to equilibration_time, explain that this value has to be found by trial and error (we chose a suitable value for the given set of input parameters)
[x] rename "instantaneous temperature" to "kinetic temperature" in: "temperature is fixed and does not fluctuate in the NVT ensemble! The instantaneous temperature is calculated via"
[x] explain some theoretical background behind calculating the autocorrelation function and autocorrelation time - write down the formulas that we are implementing
[x] plot the correlation function on a semilog scale, explain its features and errors
[x] "we consider samples to be uncorrelated if the time between them is larger than 3 times the correlation time" and "For statistical analysis, we only want uncorrelated samples." are not correct statements. Instead, explain that we want to correct for correlations in the data.
[x] Set up RDF by specifying the bin width, calculate N_bins and other dependent variables
[ ] choose smaller accumulator step than "delta_N=steps_per_uncorrelated_sample"
[ ] Unclear what de-mixing means in "Set this parameter to 2^(1/6)𝜎 to get de-mixing or to 2.5𝜎 to get mixing between the two components."
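For the checklist items on the meaning of epsilon, the formula in reduced units and the variant with cutoff and shift, one common way to write the potential is sketched below. This is a generic textbook form, not necessarily the tutorial's notation; ESPResSo's LennardJonesInteraction has additional parameters (e.g. an offset) and stores the shift in its own convention.

```latex
% Full LJ potential: sigma is the distance where U = 0, epsilon the depth of the minimum
U_{\mathrm{LJ}}(r) = 4\varepsilon\left[\left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^{6}\right],
\qquad r_{\min} = 2^{1/6}\sigma, \qquad U_{\mathrm{LJ}}(r_{\min}) = -\varepsilon

% Reduced units: r^* = r/\sigma, U^* = U/\varepsilon
U^{*}(r^{*}) = 4\left[(r^{*})^{-12} - (r^{*})^{-6}\right]

% Truncated at r_cut and shifted so that U goes continuously to zero at the cutoff
U_{\mathrm{ts}}(r) =
\begin{cases}
U_{\mathrm{LJ}}(r) - U_{\mathrm{LJ}}(r_{\mathrm{cut}}), & r < r_{\mathrm{cut}} \\
0, & r \ge r_{\mathrm{cut}}
\end{cases}
```

For r_cut = 2.5σ the subtracted constant is U_LJ(r_cut) ≈ -0.0163ε, so the neglected tail is small. For r_cut = 2^(1/6)σ ≈ 1.12σ it is -ε, and only the repulsive part of the potential remains (the WCA potential).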
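The items on setting the simulation length in LJ time units and setting up the RDF from a bin width boil down to a little arithmetic. A minimal sketch follows; all parameter names and values are hypothetical placeholders, not the tutorial's actual choices. The point is only to derive integer step and bin counts from physical inputs, and to refuse inputs that do not divide evenly.

```python
import math

# Hypothetical inputs, all in reduced LJ units (length sigma, energy epsilon, time tau)
TIME_STEP = 0.01             # integrator time step
EQUILIBRATION_TIME = 50.0    # chosen by trial and error for the given (N, rho, T)
SAMPLING_TIME = 500.0        # length of the production run
STRIDE_TIME = 0.5            # time between two consecutive measurements
RDF_R_MIN = 0.0              # RDF range, in units of sigma
RDF_R_MAX = 5.0
RDF_BIN_WIDTH = 0.05


def to_count(total, unit, rel_tol=1e-9):
    """Return total/unit as an int, or raise if it is not (close to) an integer."""
    count = round(total / unit)
    if count <= 0 or not math.isclose(count * unit, total, rel_tol=rel_tol):
        raise ValueError(f"{total} is not an integer multiple of {unit}")
    return count


equilibration_steps = to_count(EQUILIBRATION_TIME, TIME_STEP)
steps_per_stride = to_count(STRIDE_TIME, TIME_STEP)
n_strides = to_count(SAMPLING_TIME, STRIDE_TIME)
rdf_n_bins = to_count(RDF_R_MAX - RDF_R_MIN, RDF_BIN_WIDTH)
# Bin centres, e.g. for plotting g(r)
rdf_bin_centres = [RDF_R_MIN + (i + 0.5) * RDF_BIN_WIDTH for i in range(rdf_n_bins)]

print(equilibration_steps, steps_per_stride, n_strides, rdf_n_bins)
```

Rounding and then checking, instead of calling int(), is a deliberate choice: int() truncates, and in Python int(0.3 / 0.1) is 2, which would silently shorten a stride or drop a bin.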

RudolfWeeber commented 3 years ago

These are good points.

I’m not in favour of writing stuff to a file during the simulation, and plotting from a file, though.

I also don’t think this is recommended practice for production simulations. Particularly text files.

With the exception of very big data such as trajectories, data should be accumulated in structured data types (lists, numpy arrays, espresso time series) during the simulation and stored in a structured format (yml, pickle, numpy) together with the simulation parameters.
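A minimal sketch of this workflow, using only the standard library: observables are accumulated in an in-memory list during the run and written once, together with the simulation parameters, to a single pickle file. The measurement function and the parameter names are hypothetical placeholders for a real observable and real inputs; a NumPy array saved with np.save, or a YAML file for the parameters, would serve equally well.

```python
import os
import pickle
import random
import tempfile


def measure_energy(rng):
    """Hypothetical placeholder for a real measurement (e.g. the total energy)."""
    return -5.0 + rng.gauss(0.0, 0.1)


def run(params):
    rng = random.Random(params["seed"])
    energies = []                      # structured, in-memory accumulator
    for _ in range(params["n_samples"]):
        # ... integrate params["steps_per_sample"] steps here ...
        energies.append(measure_energy(rng))
    return energies


def save_results(path, params, energies):
    # Store the data together with the parameters that produced it
    with open(path, "wb") as f:
        pickle.dump({"params": params, "energies": energies}, f)


def load_results(path):
    with open(path, "rb") as f:
        return pickle.load(f)


params = {"seed": 42, "n_samples": 1000, "steps_per_sample": 50, "temperature": 1.1}
energies = run(params)
path = os.path.join(tempfile.mkdtemp(), "lj_results.pkl")
save_results(path, params, energies)
```

Keeping the parameters in the same file as the data means a result can later be traced back to the inputs that produced it.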

I’d not put this in tutorial one though. There’s a video on managing simulation data

https://www.youtube.com/watch?v=64rNmTpoS1c&t=233s

kosovan commented 3 years ago

@RudolfWeeber I will check your video but I think that it is a common practice to store the simulation data in a file and analyze it offline because you often don't know in advance what you will want to analyze in the end. What (structured) data format should be used is a separate question. The main problem I see is that the runtime variables and results are lost when you exit the simulation script.

jngrad commented 3 years ago

Writing to a file can be an issue in CI, because the file system can lag behind during write operations. This synchronization issue can cause the Python test to end up in a state where it tries to open a file that has already been written to but does not exist yet from the reader's point of view.

kosovan commented 3 years ago

@jngrad OK, then I suggest at least mentioning that in production runs one would save the data and post-process it later. The LJ tutorial is one of the first tutorials that new users encounter, so I think that such things should be mentioned as good practice.

RudolfWeeber commented 3 years ago

I’m not against storing the time series in a file. It should just be done after the run is complete.

And first saving, then loading back and plotting is just extra confusion compared to plotting the data directly.

kosovan commented 3 years ago

I used to write the data at the end of the simulation, but too many times I lost all the data when the simulation ended unexpectedly, e.g. due to exceeding the wall time. Therefore, I recommend that my students store the data after shorter periods of simulation runtime. Nevertheless, this is a minor issue that should not be part of the tutorial.
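Saving after shorter periods can be combined with the in-memory accumulation suggested earlier in the thread: keep the data in a structured object and overwrite one results file after each chunk of the run. The sketch below uses only the standard library, and all names are hypothetical. It writes to a temporary file and then renames it with os.replace, so a job killed mid-write leaves the previous complete file in place rather than a truncated one. It does not address the file-system latency issue mentioned for CI.

```python
import os
import pickle
import tempfile


def save_atomically(path, data):
    """Write data to path via a temporary file in the same directory, then rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(data, f)
            f.flush()
            os.fsync(f.fileno())    # make sure the bytes reach the disk
        os.replace(tmp_path, path)  # atomic on POSIX within one file system
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def run_in_chunks(path, params, n_chunks, samples_per_chunk):
    results = {"params": params, "samples": []}
    for chunk in range(n_chunks):
        for i in range(samples_per_chunk):
            # ... integrate and measure here; a counter stands in for a real observable ...
            results["samples"].append(chunk * samples_per_chunk + i)
        save_atomically(path, results)  # checkpoint after every chunk
    return results


out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, "lj_results.pkl")
results = run_in_chunks(out_path, {"seed": 42}, n_chunks=5, samples_per_chunk=100)
```

The chunk length is a trade-off: shorter chunks lose less work when a job is killed, at the cost of more I/O.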

jngrad commented 3 years ago

@kosovan Regarding the last items in your list:

Links to References seem to fail (probably because square brackets are special characters that affect the behaviour)

The links work just fine from my side, both in the Jupyter notebook and in the html version that is published automatically.

"we don't shift the potential to zero at the cutoff " is not good practice - should be changed

Change shift to some reasonable value!!!

The goal of the tutorial is to reproduce the simulation results of a paper that used LJ without a shift. This is now clearly explained in the tutorial, with an extra mention that it is nowadays preferable to use a shift.
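To illustrate what the shift changes, the sketch below evaluates the LJ potential truncated at r_cut with and without a shift, in plain Python and reduced units (function names are hypothetical). Without the shift, the potential jumps by U_LJ(r_cut) at the cutoff; with it, the potential goes continuously to zero. As far as we understand, this is the effect the shift='auto' option of LennardJonesInteraction mentioned in the checklist is meant to achieve, but ESPResSo stores the constant in its own convention, so please check its documentation.

```python
def lj(r, epsilon=1.0, sigma=1.0):
    """Full Lennard-Jones potential U(r) = 4 eps [(sigma/r)^12 - (sigma/r)^6]."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_truncated(r, r_cut, epsilon=1.0, sigma=1.0, shift=True):
    """LJ cut off at r_cut; if shift is True, subtract U(r_cut) so that U(r_cut) = 0."""
    if r >= r_cut:
        return 0.0
    offset = lj(r_cut, epsilon, sigma) if shift else 0.0
    return lj(r, epsilon, sigma) - offset


R_CUT = 2.5
# Size of the energy jump at the cutoff when no shift is applied
print(f"U_LJ(r_cut) = {lj(R_CUT):.4f} epsilon")
```

The shift does not change forces inside the cutoff, only the energy, which is why it matters mostly for energy conservation and thermodynamic quantities.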

Instead of using local variables as accumulators for observables, write the time evolution in data file and read it from this file for plotting - as it would be done in a normal simulation

This is risky. Depending on the latency of the file system used to run the tutorials, we might end up in a situation where we read a file that doesn't exist yet, even though the operating system says it already exists. This caused us some trouble in the past in C++ unit tests.

choose smaller accumulator step than "delta_N=steps_per_uncorrelated_sample"

If I understand the tutorial correctly, we use this value for delta_N so we don't have to subsample the time series again.
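The correlation time behind steps_per_uncorrelated_sample, and the alternative raised in the checklist of correcting for correlations instead of only keeping uncorrelated samples, can be sketched as follows. This is one standard estimator (normalized autocorrelation function C(k), integrated autocorrelation time tau_int = 1/2 + sum of C(k), error inflated by sqrt(2 tau_int)), not necessarily the one used in the tutorial. The function names and the truncation rule are assumptions for illustration, and the synthetic AR(1) data only stand in for a real time series.

```python
import math
import random


def autocorrelation(x, max_lag):
    """Normalized autocorrelation C(k), k = 0..max_lag (standard biased estimator)."""
    n = len(x)
    mean = sum(x) / n
    dev = [xi - mean for xi in x]
    c0 = sum(d * d for d in dev) / n
    return [sum(dev[i] * dev[i + k] for i in range(n - k)) / (n * c0)
            for k in range(max_lag + 1)]


def integrated_autocorrelation_time(acf):
    """tau_int = 1/2 + sum_{k>=1} C(k), in units of the sampling interval.

    The sum is truncated at the first non-positive C(k), a simple heuristic;
    Sokal's self-consistent window is a more robust alternative.
    """
    tau = 0.5
    for c in acf[1:]:
        if c <= 0.0:
            break
        tau += c
    return tau


def mean_with_corrected_error(x, max_lag=100):
    """Mean, naive standard error, correlation-corrected error and tau_int."""
    n = len(x)
    mean = sum(x) / n
    var = sum((xi - mean) ** 2 for xi in x) / (n - 1)
    naive_error = math.sqrt(var / n)
    tau = integrated_autocorrelation_time(autocorrelation(x, max_lag))
    # Only about n / (2 tau_int) samples are effectively independent
    return mean, naive_error, math.sqrt(2.0 * tau) * naive_error, tau


# Synthetic correlated data: AR(1) process with C(k) = phi**k, tau_int = 1/2 + phi/(1 - phi)
rng = random.Random(42)
phi, x, series = 0.9, 0.0, []
for _ in range(10500):
    x = phi * x + rng.gauss(0.0, 1.0)
    series.append(x)
series = series[500:]  # drop the start-up transient

mean, naive, corrected, tau = mean_with_corrected_error(series)
```

Spacing the measurements by roughly 2 tau_int (what delta_N = steps_per_uncorrelated_sample does) and keeping every sample while inflating the error by sqrt(2 tau_int) are two ways of accounting for the same correlations; the second keeps more data, for example for the RDF.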

Unclear what de-mixing means in "Set this parameter to 2^(1/6)𝜎 to get de-mixing or to 2.5𝜎 to get mixing between the two components."

With r_cut=2.5𝜎 there is virtually no difference between the two particle types and the fluid is perfectly mixed. With r_cut=2^(1/6)𝜎 the interaction between the two fluids is purely repulsive, which leads to phase separation. There is still a small peak at r=𝜎, which corresponds to particles found at the interface between two phases.
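The difference between the two cutoffs can be checked from the sign of the LJ force F(r) = -dU/dr = (24ε/r)[2(σ/r)^12 - (σ/r)^6], which vanishes at r = 2^(1/6)σ. The minimal plain-Python sketch below (reduced units, hypothetical helper names) scans r below each cutoff. With r_cut = 2^(1/6)σ the force is never attractive. With r_cut = 2.5σ the attractive well is included, with the strongest attraction of about -2.40 ε/σ near r ≈ 1.24σ.

```python
def lj_force(r, epsilon=1.0, sigma=1.0):
    """Radial LJ force F(r) = -dU/dr; F > 0 is repulsive, F < 0 attractive."""
    sr6 = (sigma / r) ** 6
    return 24.0 * epsilon / r * (2.0 * sr6 * sr6 - sr6)


def most_attractive_force(r_cut, r_min=0.9, n=100_000):
    """Smallest (most negative) force on a uniform grid in [r_min, r_cut)."""
    dr = (r_cut - r_min) / n
    return min(lj_force(r_min + i * dr) for i in range(n))


R_WCA = 2 ** (1 / 6)  # position of the LJ minimum, about 1.12 sigma
for r_cut in (R_WCA, 2.5):
    print(f"r_cut = {r_cut:.3f}: most attractive force = {most_attractive_force(r_cut):.3f}")
```

Cutting at the minimum therefore removes all attraction between the two particle types, which is what drives them apart in the second RDF below.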

RDF of the mixed LJ fluid. The three RDFs are identical.

RDF of the demixed LJ fluid. The RDF between particles of different types is skewed towards large distance values because the two fluids do not mix.

kosovan commented 3 years ago

@jngrad I am sorry for not responding to your comments. I have not checked the current version of the tutorial yet because I have been on vacation in the meantime. Nevertheless, your explanations of the last items on my list seem plausible.

I think that the term "demixing" should be used rather cautiously because it is a terminus technicus that refers to the thermodynamic stability of coexisting phases in a macroscopic system. The fact that the various RDFs (00, 01, 11) are different is a necessary but not a sufficient condition for demixing. One could observe a similar feature in a system that forms nano-domains but does not demix macroscopically.

espressomd / espresso

suggested changes to the LJ tutorial #4260