espressomd / espresso

The ESPResSo package
https://espressomd.org
GNU General Public License v3.0
229 stars 185 forks source link

Virtual sites coordinates folding breaks with periodic boundary conditions #4703

Closed christophlohrmann closed 1 year ago

christophlohrmann commented 1 year ago
import espressomd
import espressomd.virtual_sites 
import numpy as np

vs_distance = 2
system = espressomd.System(box_l =3*[100])
system.cell_system.skin = 0.1
system.time_step = 0.01
system.min_global_cut = 1.5 * vs_distance
system.virtual_sites = espressomd.virtual_sites.VirtualSitesRelative()

real_part = system.part.add(pos=[10.,10.,vs_distance/2.],
                            director=[0,0,1])
virt_part = system.part.add(pos=[10.,10.,-vs_distance/2.],
                            virtual=True)
virt_part.vs_auto_relate_to(real_part)
np.testing.assert_allclose(np.linalg.norm(real_part.pos-virt_part.pos), vs_distance)

system.integrator.run(0)
np.testing.assert_allclose(np.linalg.norm(real_part.pos-virt_part.pos), vs_distance)

real_part.director = [0,0,-1]
system.integrator.run(0)
np.testing.assert_allclose(np.linalg.norm(real_part.pos-virt_part.pos), vs_distance)

This has a similar feel to the virtual sites that teleported through the boxes in Simon K's issue #4604 from a while back.

jngrad commented 1 year ago

Not sure I understand the problem statement. With periodic boundary conditions, particle coordinates are folded when negative. The virtual site is initially at position [10, 10, -1], which after one integration with 0 time steps becomes equal to position [10, 10, 99] because box_l=100. Replacing all the np.linalg.norm(real_part.pos-virt_part.pos) by system.distance(real_part, virt_part) fixes the MWE.

What is unclear to me though, is why the image count wildly varies during integration:

import espressomd
import espressomd.virtual_sites 

vs_distance = 2
system = espressomd.System(box_l =3*[100])
system.cell_system.skin = 0.1
system.time_step = 0.01
system.min_global_cut = 1.5 * vs_distance
system.virtual_sites = espressomd.virtual_sites.VirtualSitesRelative()

real_part = system.part.add(pos=[10.,10.,vs_distance/2.], director=[0,0,1])
virt_part = system.part.add(pos=[10.,10.,-vs_distance/2.], virtual=True)
virt_part.vs_auto_relate_to(real_part)

print("pos unfolded:", system.part.all().pos.tolist())
print("pos folded:  ", system.part.all().pos_folded.tolist())
print("image box:   ", system.part.all().image_box.tolist())
print("distance:    ", system.distance(real_part, virt_part))

print("system.integrator.run(0)")
system.integrator.run(0)

print("pos unfolded:", system.part.all().pos.tolist())
print("pos folded:  ", system.part.all().pos_folded.tolist())
print("image box:   ", system.part.all().image_box.tolist())
print("distance:    ", system.distance(real_part, virt_part))

print("real_part.director = [0,0,-1]")
real_part.director = [0,0,-1]
print("system.integrator.run(0)")
system.integrator.run(0)

print("pos unfolded:", system.part.all().pos.tolist())
print("pos folded:  ", system.part.all().pos_folded.tolist())
print("image box:   ", system.part.all().image_box.tolist())
print("distance:    ", system.distance(real_part, virt_part))

Output on the walberla branch and on the 4.2.1 bugfix release candidate:

pos unfolded: [[10.0, 10.0, 1.0], [10.0, 10.0, -1.0]]
pos folded:   [[10.0, 10.0, 1.0], [10.0, 10.0, 99.0]]
image box:    [[0, 0, 0], [0, 0, -1]]
distance:     2.0
system.integrator.run(0)
pos unfolded: [[10.0, 10.0, 1.0], [10.0, 10.0, -301.0]]
pos folded:   [[10.0, 10.0, 1.0], [10.0, 10.0, 99.0]]
image box:    [[0, 0, 0], [0, 0, -3]]
distance:     2.0
real_part.director = [0,0,-1]
system.integrator.run(0)
pos unfolded: [[10.0, 10.0, 1.0], [10.0, 10.0, -397.0]]
pos folded:   [[10.0, 10.0, 1.0], [10.0, 10.0, 3.0]]
image box:    [[0, 0, 0], [0, 0, -4]]
distance:     2.0

This looks suspicious to me.

Output on the python branch:

pos unfolded: [[10.0, 10.0, 1.0], [10.0, 10.0, -1.0]]
pos folded:   [[10.0, 10.0, 1.0], [10.0, 10.0, 99.0]]
image box:    [[0, 0, 0], [0, 0, -1]]
distance:     2.0
system.integrator.run(0)
pos unfolded: [[10.0, 10.0, 1.0], [10.0, 10.0, 99.0]]
pos folded:   [[10.0, 10.0, 1.0], [10.0, 10.0, 99.0]]
image box:    [[0, 0, 0], [0, 0, 0]]
distance:     2.0
real_part.director = [0,0,-1]
system.integrator.run(0)
pos unfolded: [[10.0, 10.0, 1.0], [10.0, 10.0, 3.0]]
pos folded:   [[10.0, 10.0, 1.0], [10.0, 10.0, 3.0]]
image box:    [[0, 0, 0], [0, 0, 0]]
distance:     2.0

This looks much better.

@RudolfWeeber @pkreissl do you have an opinion on this?

christophlohrmann commented 1 year ago

regarding the -1 to 99 change: I know that the folded position comes out correctly, but I would like the unfolded position to remain -1. Are positions folded into the original box when I add particles to the system? It seems like the the connection to the real particle was done correctly because the particle moves to the correct (folded) position when flipped. That means that simulations probably produce correct physics as long as one does not need the actual, unfolded trajectory of the virtual site.

jngrad commented 1 year ago

The regression was introduced in 4.2.1, python and walberla by #4564. The VirtualSitesRelative::update() method no longer updates the image_box, which monotonically increases or decreases with the number of integration steps, and the unfolded coordinates grow rapidly. We need to figure out what should happen when Lees-Edwards boundary conditions are used. When updating the image box counter on one direction, the position in another direction may have to change by a time-dependent offset, in which case the image box of that direction must be updated too.

jngrad commented 1 year ago

I wrote a unit test in python to check corner cases in aec1756c5f8599c21232570a52e59d346f58ab67, but I've barely scratched the surface. It doesn't consider e.g. the case where the LE shear offset makes either one of the particles, or both particles, go through an image box in the shear direction after application of the offset (we would need to write 8 cases), or the case where particles are initially at [0, 0, 0] instead of [1e-7, 0, 0], i.e. to cover the case of the LE offset being automatically applied at particle creation (due to inconsistent folding rules with the regular particle folding mechanism). Fully testing all corner cases would require writing at least 32 cases per direction, with 20 lines per case. To check the folding code after the real particle director gets flipped, one would need to multiply that number by 4.

I don't think it is feasible. The current unit test only has 8 cases for LEbc, but it is already quite hard to read. Also, when running the unit test with multiple MPI ranks, the image box values are different, and I'm not sure why.

In addition to that, the behavior requested in the OP would require not storing the minimal distance between the real particle and virtual site, but rather storing the distance in unfolded coordinates. This leads to new issues:

Yet, the current code cannot stay, because the image box counter keeps increasing at every time step.

christophlohrmann commented 1 year ago

the way I read this is that LEbc and virtual sites do not play nice together. Would it be too bad if the two features were mutually exclusive?

jngrad commented 1 year ago

@bindgens1 @RudolfWeeber do you have any plans for virtual sites + LEbc?

jngrad commented 1 year ago

@christophlohrmann resp. @bindgens1 could you please confirm the bugfix in branches jngrad/vs_rel_folding_421 and jngrad/vs_rel_folding_walberla fixed the virtual sites relative folding bug in 4.2.1 and walberla, and doesn't break bacteria resp. Lees-Edwards simulations? We need your feedback quickly, so that we can move forward with the release.

christophlohrmann commented 1 year ago

@jngrad your commit fixes the flip bug and passes all my bacteria-tests EDIT: on one core, with multiple mpi-ranks it still fails

jngrad commented 1 year ago

EDIT: on one core, with multiple mpi-ranks it still fails

The ghost communication code seems to be broken. The image_box is not carried over MPI domains, hence we silently end up with a default-constructed image_box = [0, 0, 0] in a ghost particle when the corresponding real particle goes through a boundary, and then the virtual sites uses the default-constructed value instead of the correct value.

I've updated both branches with a bugfix.