Closed swhite2401 closed 4 months ago
Just for future reference, this bug is a bit strange. While debugging some mpi scripts sent in by some new users, we found a specific configuration of parameters that returned segmentation faults.
For example (I omit many lines of code here), if the following code:
comm = MPI.COMM_WORLD
size = comm.Get_size()
rank = comm.Get_rank()
Nbunches = 1332
nparts_per_bunch = 666
Npart = int(Nbunches * nparts_per_bunch / size) #Computes the good number for MPI
add_beamloading(fring, qfactor, rshunt, Nslice=50, blmode=BLMode.PHASOR, cavitymode=CavityMode.ACTIVE, VoltGain=0.0001, PhaseGain=0.0001)
is launched with
mpirun -n 64 python file.py
We get a segmentation fault. Increasing the nparts_per_bunch solves the problem. Also changing Nslice from 50 to 51 solves the problem.
I have verified that the bug fixed proposed here solves the problem...but I'm not sure I fully understand the details.
In line 176 of atimplib.c there is the line
weight[i] *= bunch_currents[ib]/np_bunch[ib];
which should be replaced by
if (np_bunch[ib] == 0.0) {
weight[i] = 0.0;
}
else {
weight[i] *= bunch_currents[ib]/np_bunch[ib];
}
Just for future reference, this bug is a bit strange. While debugging some mpi scripts sent in by some new users, we found a specific configuration of parameters that returned segmentation faults.
For example (I omit many lines of code here), if the following code:
comm = MPI.COMM_WORLD size = comm.Get_size() rank = comm.Get_rank() Nbunches = 1332 nparts_per_bunch = 666 Npart = int(Nbunches * nparts_per_bunch / size) #Computes the good number for MPI add_beamloading(fring, qfactor, rshunt, Nslice=50, blmode=BLMode.PHASOR, cavitymode=CavityMode.ACTIVE, VoltGain=0.0001, PhaseGain=0.0001)
is launched with
mpirun -n 64 python file.py
We get a segmentation fault. Increasing the nparts_per_bunch solves the problem. Also changing Nslice from 50 to 51 solves the problem.
I have verified that the bug fixed proposed here solves the problem...but I'm not sure I fully understand the details.
You may take a look at this: https://en.wikipedia.org/wiki/Undefined_behavior
I add @simoneliuzzo as reviewer because neither me or @lcarver can approve this PR
This PR fixes a bug in the collective effects implementation. Circular buffers are used to store the slice information and some parameters history. These buffers where shifted even when their size was equal to 1. In the new implementation, when the size of these buffers is equal to 1 they are simply overwritten.