jeffhammond closed 4 years ago
jrhammon@jrhammon-nuc:~/PRK/Cxx11$ ./pic 10 1000 1000000 1 2 GEOMETRIC 0.99
Parallel Research Kernels version 2020
C++11 Particle-in-Cell execution on 2D grid
Grid size = 1000
Number of particles requested = 1000000
Number of time steps = 10
Initialization mode = GEOMETRIC
Attenuation factor = 0.99
Particle charge semi-increment = 1
Vertical velocity = 2
Number of particles placed = 997651
Solution validates
Rate (Mparticles_moved/s): 21.8504
jrhammon@jrhammon-nuc:~/PRK/Cxx11$ ./pic 10 1000 1000000 0 1 SINUSOIDAL
Parallel Research Kernels version 2020
C++11 Particle-in-Cell execution on 2D grid
Grid size = 1000
Number of particles requested = 1000000
Number of time steps = 10
Initialization mode = SINUSOIDAL
Particle charge semi-increment = 0
Vertical velocity = 1
Number of particles placed = 997481
Solution validates
Rate (Mparticles_moved/s): 21.9011
jrhammon@jrhammon-nuc:~/PRK/Cxx11$ ./pic 10 1000 1000000 1 0 LINEAR 1.0 3.0
Parallel Research Kernels version 2020
C++11 Particle-in-Cell execution on 2D grid
Grid size = 1000
Number of particles requested = 1000000
Number of time steps = 10
Initialization mode = LINEAR
Negative slope = 1
Offset = 3
Particle charge semi-increment = 1
Vertical velocity = 0
Number of particles placed = 956333
Solution validates
Rate (Mparticles_moved/s): 21.3273
jrhammon@jrhammon-nuc:~/PRK/Cxx11$ ./pic 10 1000 1000000 1 0 PATCH 0 200 100 200
Parallel Research Kernels version 2020
C++11 Particle-in-Cell execution on 2D grid
Grid size = 1000
Number of particles requested = 1000000
Number of time steps = 10
Initialization mode = PATCH
Bounding box = 0, 200, 100, 200
Particle charge semi-increment = 1
Vertical velocity = 0
Number of particles placed = 998675
Solution validates
Rate (Mparticles_moved/s): 21.5123
PRK_DEVICE=GPU
jrhammon@jrhammon-nuc:~/PRK/Cxx11$ ./pic-sycl 10 1000 1000000 1 2 GEOMETRIC 0.99
Parallel Research Kernels version 2020
C++11/DPC++ Particle-in-Cell execution on 2D grid
Grid size = 1000
Number of particles requested = 1000000
Number of time steps = 10
Initialization mode = GEOMETRIC
Attenuation factor = 0.99
Particle charge semi-increment = 1
Vertical velocity = 2
Number of particles placed = 997651
Solution validates
Rate (Mparticles_moved/s): 86.0933
jrhammon@jrhammon-nuc:~/PRK/Cxx11$ ./pic-sycl 10 1000 1000000 0 1 SINUSOIDAL
Parallel Research Kernels version 2020
C++11/DPC++ Particle-in-Cell execution on 2D grid
Grid size = 1000
Number of particles requested = 1000000
Number of time steps = 10
Initialization mode = SINUSOIDAL
Particle charge semi-increment = 0
Vertical velocity = 1
Number of particles placed = 997481
Solution validates
Rate (Mparticles_moved/s): 84.5943
jrhammon@jrhammon-nuc:~/PRK/Cxx11$ ./pic-sycl 10 1000 1000000 1 0 LINEAR 1.0 3.0
Parallel Research Kernels version 2020
C++11/DPC++ Particle-in-Cell execution on 2D grid
Grid size = 1000
Number of particles requested = 1000000
Number of time steps = 10
Initialization mode = LINEAR
Negative slope = 1
Offset = 3
Particle charge semi-increment = 1
Vertical velocity = 0
Number of particles placed = 956333
Solution validates
Rate (Mparticles_moved/s): 83.6075
jrhammon@jrhammon-nuc:~/PRK/Cxx11$ ./pic-sycl 10 1000 1000000 1 0 PATCH 0 200 100 200
Parallel Research Kernels version 2020
C++11/DPC++ Particle-in-Cell execution on 2D grid
Grid size = 1000
Number of particles requested = 1000000
Number of time steps = 10
Initialization mode = PATCH
Bounding box = 0, 200, 100, 200
Particle charge semi-increment = 1
Vertical velocity = 0
Number of particles placed = 998675
Solution validates
Rate (Mparticles_moved/s): 84.4211
This is not perfect since I did not remove all of the C-isms, but at least the main function is mostly idiomatic C++.
The initial implementation was created by @zjin-lcf. I applied a large number of cosmetic changes and fixed a bug that broke GPU execution.
Note that while this is called pic-dpcpp.cc, it should be standard SYCL. I will rename it later when I make the source more consistent with the other SYCL implementations.
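The performance of this code on my NUC is approximately 112 Mp/s (Mparticles moved per second), which is ~5x faster than the 22 Mp/s I see on the same 4-core processor with the SERIAL implementation. The runs logged above are slower, at 84-86 Mp/s, which is about 4x the serial rate.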
New PRK implementation checklist
Which kernels are implemented?
Is Travis CI supported?
If no, why not?
Testing SYCL in Travis CI isn't very useful.
Documentation and build examples
See PIC.md.