@kks32 It would be nice to see a performance comparison of using serialize and deserialize vs the normal HDF5 approach, just to make sure there is no performance regression. Also, can we check it for different numbers of MPI ranks?
We won't have a big difference in the amount of information being sent/received. Furthermore, it would be hard to measure any significant speed difference in the MPI transfer unless we run hundreds of nodes with millions of particles, and even then I don't expect a big difference, since the change in data size is very small. However, as previously mentioned in the RFC, the time to serialize/deserialize particles as PODs or as a vector of uint8_t was benchmarked, and the results show that serialization with uint8_t is faster than POD + MPI_Type_Create_Struct. Serialization/deserialization of a POD by itself is faster; however, registering and deregistering the MPI datatypes takes more time than serializing/deserializing into a vector of unsigned bytes. The benchmark section used for this comparison is:
SECTION("Performance benchmarks") {
// Number of iterations
unsigned niterations = 1000;
// Serialization benchmarks
auto serialize_start = std::chrono::steady_clock::now();
for (unsigned i = 0; i < niterations; ++i) {
// Serialize particle
auto buffer = particle->serialize();
// Deserialize particle
std::shared_ptr<mpm::ParticleBase<Dim>> rparticle =
std::make_shared<mpm::Particle<Dim>>(id, pcoords);
REQUIRE_NOTHROW(rparticle->deserialize(buffer, materials));
}
auto serialize_end = std::chrono::steady_clock::now();
// HDF5 serialization
auto hdf5_start = std::chrono::steady_clock::now();
for (unsigned i = 0; i < niterations; ++i) {
// Serialize particle as POD
auto hdf5 = particle->hdf5();
// Deserialize particle with POD
std::shared_ptr<mpm::ParticleBase<Dim>> rparticle =
std::make_shared<mpm::Particle<Dim>>(id, pcoords);
// Initialize MPI datatypes
MPI_Datatype particle_type = mpm::register_mpi_particle_type(hdf5);
REQUIRE_NOTHROW(rparticle->initialise_particle(hdf5, material));
mpm::deregister_mpi_particle_type(particle_type);
}
auto hdf5_end = std::chrono::steady_clock::now();
}
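The section above only records the steady_clock time points; to report the elapsed time per scheme, the durations can be reduced to milliseconds at the end of the section, for example (a minimal sketch using only std::chrono and iostream, not part of the PR):

auto serialize_ms = std::chrono::duration_cast<std::chrono::milliseconds>(
                        serialize_end - serialize_start).count();
auto hdf5_ms = std::chrono::duration_cast<std::chrono::milliseconds>(
                   hdf5_end - hdf5_start).count();
std::cout << "Pack/Unpack serialize + deserialize: " << serialize_ms << " ms\n"
          << "POD + MPI type registration: " << hdf5_ms << " ms\n";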
Merging #689 into develop will decrease coverage by 0.11%. The diff coverage is 67.70%.
@@ Coverage Diff @@
## develop #689 +/- ##
===========================================
- Coverage 96.81% 96.69% -0.11%
===========================================
Files 131 130 -1
Lines 25811 25822 +11
===========================================
- Hits 24987 24968 -19
- Misses 824 854 +30
Impacted Files | Coverage Δ |
---|---|
include/mesh.h | 100.00% <ø> (ø) |
include/mesh.tcc | 82.65% <0.00%> (-1.48%) :arrow_down: |
include/particles/particle_base.h | 100.00% <ø> (ø) |
tests/graph_test.cc | 100.00% <ø> (ø) |
include/particles/particle.tcc | 91.92% <82.69%> (-2.01%) :arrow_down: |
include/particles/particle.h | 100.00% <100.00%> (ø) |
include/solvers/mpm_explicit.tcc | 95.16% <100.00%> (+0.08%) :arrow_up: |
tests/particle_serialize_deserialize_test.cc | 100.00% <100.00%> (ø) |
Continue to review the full report at Codecov.
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data.
Last update 1eaa7a4...6c1aef8.
The pack/unpack serialization in this PR is faster than the POD struct implementation for a 2D sliding block with 4 MPI ranks. The results are an average of 5 different runs.
Schemes | Avg Times (ms) | SD (ms) |
---|---|---|
Pack/Unpack | 13201 | 326 |
POD/Struct | 13815 | 540 |
@bodhinandach or @tianchiTJ or @jgiven100 would you be able to test the MPI scheme with a material model that has state variables (NorSand or MC)? Check with load balancing or any problem that involves migration of particles.
I tested it with the MC model, and I think the results are good.
@kks32 NorSand test looks good
Thanks @jgiven100 and @tianchiTJ for testing with state-variable materials.
@kks32, I would like to understand the data being presented.

> The pack/unpack serialization in this PR is faster than the POD struct implementation for a 2D sliding block with 4 MPI ranks. The results are an average of 5 different runs.
> Schemes | Avg Times (ms) | SD (ms)
> ---|---|---
> Pack/Unpack | 13201 | 326
> POD/Struct | 13815 | 540

What is SD here? Previously you showed POD has a 0.4 to 0.7 speedup compared to Pack/Unpack, so why does this result show that POD takes longer? (I think I am missing something here, sorry.)
POD alone is insufficient, as you need to register the data with MPI_Type_Create_Struct, which adds additional run time. Compared to our current implementation on develop, the Pack/Unpack approach is slightly faster, and it is the best way to handle different particle types.
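For reference, this is roughly what that registration overhead looks like; the sketch below uses an illustrative POD (not the actual HDF5Particle handled by mpm::register_mpi_particle_type) and standard MPI calls:

#include <mpi.h>
#include <cstddef>
#include <cstdint>

// Illustrative POD; the real HDF5Particle has many more fields, so the
// registration below only sketches the extra work done per transfer.
struct ExamplePOD {
  std::uint64_t id;
  double mass;
  double coordinates[3];
};

MPI_Datatype register_example_pod_type() {
  const int nblocks = 3;
  int lengths[3] = {1, 1, 3};
  MPI_Aint displacements[3] = {offsetof(ExamplePOD, id),
                               offsetof(ExamplePOD, mass),
                               offsetof(ExamplePOD, coordinates)};
  MPI_Datatype types[3] = {MPI_UNSIGNED_LONG_LONG, MPI_DOUBLE, MPI_DOUBLE};

  MPI_Datatype pod_type;
  MPI_Type_create_struct(nblocks, lengths, displacements, types, &pod_type);
  MPI_Type_commit(&pod_type);  // registration: required before any send/recv
  return pod_type;
}

// ... MPI_Send / MPI_Recv using pod_type ...
// MPI_Type_free(&pod_type);  // deregistration: also not free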
MPM Particle serialization
Summary
Motivation
Design Detail
The Particle class will have a serialize and a deserialize function, both using a vector<uint8_t> as the buffer. In addition, we need to compute the pack size to initialize the buffer with the correct size; this is saved as a private variable. The deserialization function will read from the buffer.
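A minimal sketch of what this interface could look like, assuming MPI_Pack/MPI_Unpack fill the vector<uint8_t> buffer; the reduced set of fields and the helper names are illustrative, not the actual implementation:

#include <mpi.h>
#include <cstdint>
#include <vector>

// Reduced illustrative particle data; the real Particle serializes many
// more members (velocities, stresses, strains, state variables, ...).
struct ParticleSketch {
  std::uint64_t id;
  double mass;
  double coordinates[3];
};

// Pack size computed once and cached (the RFC stores it as a private member)
int compute_pack_size() {
  int total = 0, partial = 0;
  MPI_Pack_size(1, MPI_UNSIGNED_LONG_LONG, MPI_COMM_WORLD, &partial);
  total += partial;
  MPI_Pack_size(4, MPI_DOUBLE, MPI_COMM_WORLD, &partial);  // mass + coords
  total += partial;
  return total;
}

std::vector<std::uint8_t> serialize(const ParticleSketch& p) {
  std::vector<std::uint8_t> buffer(compute_pack_size());
  int position = 0;
  MPI_Pack(&p.id, 1, MPI_UNSIGNED_LONG_LONG, buffer.data(), buffer.size(),
           &position, MPI_COMM_WORLD);
  MPI_Pack(&p.mass, 1, MPI_DOUBLE, buffer.data(), buffer.size(), &position,
           MPI_COMM_WORLD);
  MPI_Pack(p.coordinates, 3, MPI_DOUBLE, buffer.data(), buffer.size(),
           &position, MPI_COMM_WORLD);
  return buffer;
}

void deserialize(const std::vector<std::uint8_t>& buffer, ParticleSketch& p) {
  int position = 0;
  MPI_Unpack(buffer.data(), buffer.size(), &position, &p.id, 1,
             MPI_UNSIGNED_LONG_LONG, MPI_COMM_WORLD);
  MPI_Unpack(buffer.data(), buffer.size(), &position, &p.mass, 1, MPI_DOUBLE,
             MPI_COMM_WORLD);
  MPI_Unpack(buffer.data(), buffer.size(), &position, p.coordinates, 3,
             MPI_DOUBLE, MPI_COMM_WORLD);
}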
Important consideration: we expect all future derived particle types to have the first few bytes be the Type of the particle, followed by the material information, which the mesh class retrieves to initialize the particle before the subsequent deserialization. In addition, the particle type is added to the Particle class; this is used to identify the type of particle and to create them when they are transferred across MPI tasks. Moreover, we have added ParticleType and ParticleTypeString as global maps that associate an index value (int) with a string such as "P2D". The reason for this is that if we serialized the string itself, we would not know its length, which complicates things. Instead, since we are only going to have a few different particle types, it is easier to set up a map for a quick lookup.
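A sketch of the two lookup maps, assuming "P2D"/"P3D" as the registered type strings (the actual map contents in the PR may differ):

#include <map>
#include <string>

namespace mpm {
// Particle type string -> integer index written into the serialized buffer
const std::map<std::string, int> ParticleType = {{"P2D", 0}, {"P3D", 1}};
// Integer index -> particle type string, used on the receiving rank to
// create the correct particle before deserializing the rest of the buffer
const std::map<int, std::string> ParticleTypeString = {{0, "P2D"},
                                                       {1, "P3D"}};
}  // namespace mpm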
The MPI transfer_halo_particles function will be altered to send one particle at a time rather than a bulk of particles. This allows sending the different particle types in a cell all at once (sequentially), instead of iterating through each particle type separately. These changes remove the need for registering MPI particle types and also get us one step closer to removing the limit of 20 on the state_vars.
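A minimal sketch of sending one serialized particle at a time (tags, ranks, and the probe-based sizing are illustrative; the actual transfer_halo_particles implementation may differ):

#include <mpi.h>
#include <cstdint>
#include <vector>

// Sender: ship the raw serialized bytes of a single particle
void send_particle(const std::vector<std::uint8_t>& buffer, int dest_rank) {
  MPI_Send(buffer.data(), buffer.size(), MPI_UINT8_T, dest_rank,
           /*tag=*/0, MPI_COMM_WORLD);
}

// Receiver: probe for the message size, receive the buffer, then read the
// type id from the first bytes to create the matching particle type
std::vector<std::uint8_t> receive_particle(int source_rank) {
  MPI_Status status;
  MPI_Probe(source_rank, /*tag=*/0, MPI_COMM_WORLD, &status);
  int nbytes = 0;
  MPI_Get_count(&status, MPI_UINT8_T, &nbytes);
  std::vector<std::uint8_t> buffer(nbytes);
  MPI_Recv(buffer.data(), nbytes, MPI_UINT8_T, source_rank, /*tag=*/0,
           MPI_COMM_WORLD, MPI_STATUS_IGNORE);
  return buffer;  // deserialize(buffer, materials) recreates the particle
}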
Drawbacks
No potential drawback has been identified.
Rationale and Alternatives
The relative speed of serialization vs MPI_Type_Create_Struct is unknown; we may have to do a performance benchmark to see the difference. Using struct datatypes means we have to register each particle type and are limited to a fixed number of state variables.
Different serialization libraries were considered: [Boost Serialization](https://www.boost.org/doc/libs/1_56_0/libs/serialization/doc/tutorial.html), [Cereal](http://uscilab.github.io/cereal/), and [bitsery](https://github.com/fraillt/bitsery). The fastest, bitsery, doesn't have serialization support for Eigen; we could implement a custom serializer, but it would take some time. MPI Pack/Unpack appears to be one of the fastest options.
Size and time comparisons of the candidate serialization libraries (benchmark charts not reproduced here).
If not done, this will result in a clunkier interface for handling the MPI transfer of different particle types.
Prior Art
Prior art, both the good and the bad, considered in relation to this proposal:
https://github.com/STEllAR-GROUP/cpp-serializers
https://github.com/fraillt/bitsery#why-use-bitsery
Unresolved questions
The MPI transfer halo particles function is yet to be implemented. We don't foresee an issue, but it is still TBD.
https://github.com/cb-geo/mpm/pull/680 https://github.com/cb-geo/mpm/pull/681
Changelog