I have restructured the particles class. Changes include:
AoS is now the canonical way to represent particles.
Particle arrays are always allocated in 64-byte-aligned
memory so that every particle (8 doubles, i.e. 64 bytes)
occupies exactly one 64-byte cache line.
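The alignment requirement can be illustrated with a minimal sketch.
The struct name, field names, field order, and helper functions below
are assumptions made for illustration, not the actual class; only the
"8 doubles, 64 bytes, 64-byte aligned" layout comes from the text above.
The sketch assumes a C++17 toolchain that provides std::aligned_alloc.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical particle layout: 8 doubles = 64 bytes.
struct alignas(64) Particle {
    double u, v, w;  // velocity components
    double q;        // charge
    double x, y, z;  // position components
    double id;       // particle ID stored as a double
};
static_assert(sizeof(Particle) == 64, "a particle must fill one cache line");
static_assert(alignof(Particle) == 64, "a particle must be cache-line aligned");

// Allocate storage for n particles on a 64-byte boundary.
// std::aligned_alloc needs a size that is a multiple of the alignment,
// which holds here because sizeof(Particle) == 64.
inline Particle* alloc_particles(std::size_t n) {
    assert(n > 0);
    return static_cast<Particle*>(std::aligned_alloc(64, n * sizeof(Particle)));
}

inline void free_particles(Particle* p) { std::free(p); }

// True if the address lies on a 64-byte boundary.
inline bool is_cache_line_aligned(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 64 == 0;
}
```

Because the element size equals the alignment, aligning the start of
the array is enough to align every particle in it.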
There is no longer any imposed limit on the number of
particles that can be communicated. Particles are
stored in Larray buffers (i.e. std::vector) and
are added with calls like pcl_buffer.push_back(pcl),
which automatically reallocates the buffer as necessary.
I use processor-independent application of boundary conditions
to ensure that particle communication completes
within 2*(XLEN+YLEN+ZLEN) iterations of communication.
I enforce boundary conditions by calling virtual
methods such as apply_Xrght_BC(), which takes a list of
the particles to which the boundary condition for the right
edge of the domain must be applied. The intent is that
the user will eventually inherit from the particle solver and
override these methods where appropriate in order to implement
user-defined boundary conditions.
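The override mechanism might look roughly like the following sketch.
Only the method name apply_Xrght_BC() comes from the text above; the
class names, the reduced particle type, the signature, and the choice
of reflecting and absorbing conditions are assumptions for illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical reduced particle: position and velocity along x only.
struct Pcl { double x, u; };

// Stand-in for the particle solver; the real class is much larger.
class ParticleSolver {
public:
    explicit ParticleSolver(double Lx) : Lx_(Lx) {}
    virtual ~ParticleSolver() = default;

    // Default in this sketch: specular reflection at the right edge x = Lx.
    virtual void apply_Xrght_BC(std::vector<Pcl>& pcls) {
        for (Pcl& p : pcls) {
            p.x = 2.0 * Lx_ - p.x;  // mirror the position about the wall
            p.u = -p.u;             // reverse the normal velocity
        }
    }

protected:
    double Lx_;  // domain length in x
};

// A user-defined solver that absorbs particles at the right edge instead.
class AbsorbingSolver : public ParticleSolver {
public:
    using ParticleSolver::ParticleSolver;
    void apply_Xrght_BC(std::vector<Pcl>& pcls) override { pcls.clear(); }
};
```

Calling the method through a reference to the base class dispatches to
the user's override, so the rest of the solver needs no changes.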
I implemented BCs via MPI self-communication.
This is a coding shortcut that could be eliminated
if this turns out to be a problem. Note that
for the GEM problem, to avoid lots of communication
in the periodic z direction and accelerate convergence
of the field solver, you should make Lz large (e.g. the same as Lx and Ly).
In the process, I also did the following:
I created a pclIDgenerator class for particle IDs.
I used double precision rather than long long to
represent particle IDs.
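A double can hold every integer up to 2^53 exactly, so it can serve as
an ID without rounding as long as IDs stay below that bound. The sketch
below shows one way a generator could hand out such IDs. The class
name, the per-rank block scheme, and the method name are assumptions
for illustration and are not taken from the actual pclIDgenerator class.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical ID generator: each rank draws IDs from its own block.
class IdGeneratorSketch {
public:
    // Largest power of two below which every integer is exact in a double.
    static constexpr std::uint64_t kMaxExact = 1ULL << 53;

    // rank: index of this process; ids_per_rank: block size per process.
    IdGeneratorSketch(std::uint64_t rank, std::uint64_t ids_per_rank)
        : next_(rank * ids_per_rank), end_((rank + 1) * ids_per_rank) {
        assert(end_ <= kMaxExact);  // all IDs must be exactly representable
    }

    // Return the next unused ID as a double.
    double generateID() {
        assert(next_ < end_);  // this rank's block must not be exhausted
        return static_cast<double>(next_++);
    }

private:
    std::uint64_t next_;  // next ID to hand out
    std::uint64_t end_;   // one past the last ID of this rank's block
};
```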
I implemented support for non-integer nxc/XLEN.
(This has not yet been tested.)
I implemented a fast 8x8 transpose for the MIC
and used it to convert between AoS and SoA pcls.
This could be extended to Xeon by implementing
the same method with AVX 256-bit intrinsics instead of MIC
512-bit intrinsics.
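The idea behind the conversion can be shown with a portable scalar
sketch. This is not the intrinsics implementation described above; it
only illustrates why an 8x8 transpose converts a block of 8 particles
(8 doubles each) from AoS to SoA and back. The function name is an
assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// In-place transpose of an 8x8 block of doubles.
// Before: row p holds the 8 fields of particle p (AoS).
// After:  row f holds field f of all 8 particles (SoA).
inline void transpose_8x8(double (&m)[8][8]) {
    for (std::size_t i = 0; i < 8; ++i)
        for (std::size_t j = i + 1; j < 8; ++j)
            std::swap(m[i][j], m[j][i]);
}
```

Applying the transpose twice restores the original layout, so the same
routine serves for both directions of the conversion.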
Internal changes to the code include:
I eliminated the distinction between processor topologies
of fields and particles. The code never made this distinction properly.
If we reintroduce it, we should first separate the particle and field solvers.
I consolidated random sampling code so that there is
a single point in the code (ipicmath.h) that samples
from a Maxwellian distribution or unit interval.
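As an illustration of what such a single sampling point could contain,
here is a minimal sketch that uses the Box-Muller transform. The
function names, signatures, and the use of std::mt19937_64 are
assumptions; the actual routines in ipicmath.h may differ.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Sample uniformly from the unit interval [0, 1).
inline double sample_unit_interval(std::mt19937_64& rng) {
    return std::uniform_real_distribution<double>(0.0, 1.0)(rng);
}

// Box-Muller: draw two independent velocity components, each normally
// distributed with zero mean and standard deviation vth.
inline void sample_maxwellian(std::mt19937_64& rng, double vth,
                              double& u, double& v) {
    assert(vth >= 0.0);
    const double two_pi = 6.283185307179586;
    // 1 - r lies in (0, 1], which avoids taking log(0).
    const double r =
        vth * std::sqrt(-2.0 * std::log(1.0 - sample_unit_interval(rng)));
    const double theta = two_pi * sample_unit_interval(rng);
    u = r * std::cos(theta);
    v = r * std::sin(theta);
}
```

Routing every draw through one pair of functions means the random
number source can be changed in a single place.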
Particles3D::particle_repopulator() is now much more
efficient. Instead of traversing the list
of particles six times, deleting and repopulating particles with
each pass, it now traverses the list once to delete particles,
then creates the repopulated particles and appends them to the end
of the list.
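The single-pass approach can be sketched as follows. The reduced
particle type, the one-dimensional deletion test, and the function
names are assumptions for illustration; the real method works on all
six faces of the domain.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reduced particle: position along x only.
struct Pcl { double x; };

// One pass over the list: delete every particle outside [lo, hi] by
// overwriting it with the last particle and shrinking the list.
inline void delete_outside(std::vector<Pcl>& pcls, double lo, double hi) {
    std::size_t i = 0;
    while (i < pcls.size()) {
        if (pcls[i].x < lo || pcls[i].x > hi) {
            pcls[i] = pcls.back();  // fill the hole with the last particle
            pcls.pop_back();        // do not advance: recheck slot i
        } else {
            ++i;
        }
    }
}

// Delete in one pass, then append the newly created particles at the end.
inline void repopulate(std::vector<Pcl>& pcls, double lo, double hi,
                       const std::vector<Pcl>& fresh) {
    delete_outside(pcls, lo, hi);
    pcls.insert(pcls.end(), fresh.begin(), fresh.end());
}
```

Deleting by swapping with the last element keeps each removal O(1) at
the cost of not preserving particle order.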