LB particle coupling thermostat seed doesn't fully decorrelate independent simulations

TL;DR: the seed argument of system.thermostat.set_lb() doesn't actually set the random number generator (RNG) seed of the LB particle coupling code. Instead, it sets the lag time in the RNG sequence. Thus two independent simulations with a different seed will exhibit correlated dynamics. The LB fluid seed is not affected by this issue.

Background

We use Philox counters to generate reproducible and cross-platform random number sequences. These are used to calculate the friction noise during thermalization. The RNG sequence is fully determined by 4 input arguments:

the RNG salt, which is a unique id for each kind of thermostat (LB particle coupling, Langevin, Brownian, Stokesian, etc.)
the RNG seed, which the user has control over
two keys, which are typically particle ids (for LB particle coupling, the second key is always zero)

Running the same simulation twice with the same RNG seed produces the exact same trajectory, up to machine precision. Running the same simulation twice with a different RNG seed produces a different trajectory. For the latter, the magnitude of the difference between the two seeds is irrelevant; the RNG seed "decorrelates" RNG sequences, just like the RNG salt does. This is achieved by internally concatenating the salt and seed into a single 64bit integer.

The Philox RNG has an internal counter which is incremented at each time step. This counter essentially controls the current position in the RNG sequence. We do not provide a convenient way for the user to initialize this value. This counter value is saved during checkpointing.

Problem statement

For historical reasons, thermostats in ESPResSo were all implemented differently, and didn't share a common base class. Yet they shared a unified Python interface, which could give users the false impression that they were related. This lead to bad user experience in the 4.1 line of ESPResSo where one thermostat was using the Mersenne Twister RNG instead of the Philox RNG.

The LB particle coupling thermostat in particular was implemented with a hardcoded seed=0, regardless of which seed the user gives as argument to system.thermostat.set_lb(). Yet the particle forces are different when giving a different seed, because the value of the seed argument is stored as the internal counter of the Philox RNG. Thus the LB particle coupling RNG sequence is always the same for all simulations, and the seed argument actually controls the initial time lag in that RNG sequence. This issue is specific to the LB particle coupling code, and does not affect the LB fluid code.

As a result, two independent simulations that differ only in the LB particle coupling seed with not truly be independent: the particle forces from simulation A will be correlated to the particle forces from simulation B with a time lag. This correlation is weak, since it's "diluted" by the LB fluid noise and all other interactions in the system.

It would be useful to know how weak this correlation truly is, and whether it can affect some observables more than other, for example Green-Kubo relations. To that end, I adapted the polymers tutorial to investigate how the correlation between independent simulations that differ only by the LB particle coupling seed affects simulation results. I measured the diffusion coefficient of a polymer with 4 beads connected by FENE bonds, repelled by a WCA potential, and coupled to a thermalized fluid in a 8x8x8 grid in units of the WCA sigma. The simulation was repeated 48 times for two scenarios: one where the Philox seed is 0 and the Philox counter initial value is controlled by the user ("current behavior"), and one where the Philox seed is controlled by the user and the Philox counter initial value is always 0 ("proposed behavior").

Figure 1 plots the mean and standard error of the mean of the diffusion coefficient as a function of the simulation time for both scenarios. We can see in the proposed behavior scenario that the error band tapers for longer simulation times, as expected. In the current behavior scenario, the error band is extremely narrow right at the beginning, and tapers more slowly. Figure 2 plots the standard error of the mean as a function of time to help better visualize how the correlation between independent measurements affects the magnitude of the standard error of the mean.

In my opinion, this is a design flaw. Data production is affected in two ways:

error bars in plots are artificially shrunk by a factor 2.5, because the underlying assumption of independent samples in the formula of the standard of the mean is not satisfied
users might decide to simulate for shorter times, because these independent simulations converge to the same value much faster than they should

Point 1 doesn't affect the samples mean, thus the calculated data point (or the shape of the curve for time-dependent observables) converges to the real value. Point 2 can affect the sample mean due to insufficient sampling time.

In the light of the data produced by this experiment, I'm not convinced we can simply simulate for longer to get the LB fluid RNG to introduce enough entropy to fully decorrelate independent simulations. The current behavior scenario was run again with a hardcoded seed of 1 (data not shown) to make sure my observations were not due to a statistical fluke with the seed of 0, and the same trend emerged.

Figure 1: Convergence analysis of the polymer diffusion coefficient.

Plot of the standard error of the mean of the diffusion coefficient as a function of the simulation time. The proposed behavior is initially 3 times larger in magnitude compared to the current behavior, and 2.2 times larger towards large simulation times.

Figure 2: Comparison of the standard error of the mean in both scenarios.

Proposed solution

Make the seed parameter set the Philox seed, instead of the Philox counter. This was implemented by commit 9550aa1f24b215c090717a6b870b293293e9f535 in #4845. This fix could be backported to the 4.2 line of ESPResSo with an extra MPI callback and 3 lines of Cython code.

Supporting information

Simulation script

```python import sys import numpy as np import scipy.optimize import espressomd import espressomd.analyze import espressomd.accumulators import espressomd.observables import espressomd.polymer import espressomd.lb espressomd.assert_features(['WCA', 'WALBERLA']) def build_polymer(system, n_monomers, polymer_params, fene): positions = espressomd.polymer.linear_polymer_positions( beads_per_chain=n_monomers, **polymer_params) p_previous = None for i, pos in enumerate(positions[0]): p = system.part.add(pos=pos) if p_previous is not None: p.add_bond((fene, p_previous)) p_previous = p def correlator_gk(pids_monomers, tau_max): com_vel = espressomd.observables.ComVelocity(ids=pids_monomers) com_vel_cor = espressomd.accumulators.Correlator( obs1=com_vel, tau_lin=16, tau_max=tau_max, delta_N=10, corr_operation="scalar_product", compress1="discard1") return com_vel_cor def gk_integration(N, tau, acf): x = np.arange(2, 28) y = [1 / 3 * np.trapz(acf[:j], tau[:j]) for j in x] diffusion_coef = np.mean(y[10:]) return diffusion_coef, tau[x], y # Setup constants BOX_L = 8.0 TIME_STEP = 0.01 LOOPS = 100 STEPS = 100 KT = 1.0 GAMMA = 5.0 POLYMER_PARAMS = {'n_polymers': 1, 'bond_length': 1, 'seed': 42, 'min_distance': 0.9} # System setup system = espressomd.System(box_l=3 * [BOX_L]) system.cell_system.skin = 0.4 # Lennard-Jones interaction system.non_bonded_inter[0, 0].wca.set_params(epsilon=1.0, sigma=1.0) # Fene interaction fene = espressomd.interactions.FeneBond(k=7, r_0=1, d_r_max=2) system.bonded_inter.add(fene) com_vel_tau_results = [] com_vel_acf_results = [] N = 4 seed = int(sys.argv[-2]) counter = int(sys.argv[-1]) build_polymer(system, N, POLYMER_PARAMS, fene) # energy minimization system.time_step = 0.002 system.integrator.set_steepest_descent( f_max=1.0, gamma=10, max_displacement=0.01) system.integrator.run(2000) # thermalization with Langevin system.integrator.set_vv() system.time_step = TIME_STEP system.thermostat.set_langevin(kT=1.0, gamma=50, seed=42) system.integrator.run(2000) system.thermostat.turn_off() # thermalization with LB lbf = espressomd.lb.LBFluidWalberla(kT=KT, seed=42, agrid=1, density=1, kinematic_viscosity=5, tau=system.time_step, single_precision=True) system.lb = lbf system.thermostat.set_lb(LB_fluid=lbf, gamma=GAMMA, seed=seed, philox_counter=counter) system.integrator.run(1000) # configure Green-Kubo correlator com_vel_cors = [] n_loops = [1, 2, 4, 8, 12, 16, 24, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 256, 384, 512] n_steps = [] for i in n_loops: com_vel_cor = correlator_gk(np.arange(N), LOOPS * STEPS) system.auto_update_accumulators.add(com_vel_cor) com_vel_cors.append(com_vel_cor) def save_data(n_steps, com_vel_tau_results, com_vel_acf_results): com_vel_tau_results = np.array(com_vel_tau_results) com_vel_acf_results = np.reshape(com_vel_acf_results, [len(n_steps), -1]) diffusion_gk = [] diffusion_time = [] for length, tau, acf in zip(n_steps, com_vel_tau_results, com_vel_acf_results): diffusion_coef, xdata, ydata = gk_integration(N, tau, acf) diffusion_gk.append(diffusion_coef) diffusion_time.append(length * TIME_STEP) np.save(f"lb_seed_{seed}_counter_{counter}.npy", [diffusion_gk, diffusion_time]) last_sim_steps = 0 for i, n in enumerate(n_loops): com_vel_cor = com_vel_cors[i] sim_total = n * LOOPS * STEPS system.integrator.run(sim_total - last_sim_steps) system.auto_update_accumulators.remove(com_vel_cor) last_sim_steps = sim_total com_vel_cor.finalize() com_vel_tau_results.append(com_vel_cor.lag_times()) com_vel_acf_results.append(com_vel_cor.result()) n_steps.append(sim_total) save_data(n_steps, com_vel_tau_results, com_vel_acf_results) ```

Launch script

```sh for counter in {1..16}; do ./pypresso mwe.py 0 $counter & done for seed in {1..16}; do ./pypresso mwe.py $seed 0 & done ```

Data analysis

```python import numpy as np import matplotlib.pyplot as plt plt.rcParams.update({'font.size': 16}) def parse(n, pat, burn=0): root = "./" matrix = [] for i in range(n): diffusion_gk, diffusion_time = np.load(f"{root}/{pat.format(i+1)}") matrix.append(diffusion_gk) n = min(len(x) for x in matrix) return diffusion_time[burn:n], np.array([x[burn:n] for x in matrix]) df_old_time, df_old_coef = parse(48, "lb_seed_0_counter_{}.npy", 2) df_new_time, df_new_coef = parse(48, "lb_seed_{}_counter_0.npy", 2) df_old_mean = np.mean(df_old_coef, axis=0) df_old_stderr = np.std(df_old_coef, axis=0) / np.sqrt(df_old_coef.shape[1]) df_new_mean = np.mean(df_new_coef, axis=0) df_new_stderr = np.std(df_new_coef, axis=0) / np.sqrt(df_new_coef.shape[1]) fig, (ax1, ax2) = plt.subplots(1, 2, sharey=True, figsize=(11, 4)) ax2.fill_between([], [], []) ax1.fill_between(df_old_time, df_old_mean - df_old_stderr, df_old_mean + df_old_stderr, alpha=0.5) ax2.fill_between(df_new_time, df_new_mean - df_new_stderr, df_new_mean + df_new_stderr, alpha=0.5) ax1.plot(df_old_time, df_old_mean, label="Current behavior") ax2.plot([], []) ax2.plot(df_new_time, df_new_mean, label="Proposed behavior") ax1.set_xlabel("Number of time steps") ax2.set_xlabel("Number of time steps") ax1.set_ylabel(r"Diffusion coefficient [$\sigma^2/t$]") ax1.legend(loc="lower right") ax2.legend(loc="lower right") plt.tight_layout() plt.show() fig = plt.figure(figsize=(8, 4)) plt.plot(df_old_time, df_old_stderr, label="Current behavior") plt.plot(df_new_time, df_new_stderr, label="Proposed behavior") #plt.yscale('log') #plt.xscale('log') plt.legend() plt.xlabel("Number of time steps") plt.ylabel("Standard error of\n" r"the mean [$\sigma^2/t$]") plt.tight_layout() plt.show() ```

espressomd / espresso