Closed RudolfWeeber closed 2 months ago
Further suspect
[ ] 0616a11a335385b8ca38a78ec425b029d6d426af This is a change in the pair force kernel. I have ~0.5ms with and ~0.43ms without this commit for the lj benchmakr on 8 cores on Gerenuk.
Probably the effect is two-fold:
The simulations tune to different significantly larger skins with this commit reverted.
There is probably a 3rd performance regression between 4.2 and the current dev version.
third suspect: [ ] 15337d214a396c34b3da433dfcaa3d3ee65f2cc3 This splits the short rnage force kernels per signature. Probably same mechanism as the 2nd one: slower hotloop -> tuning to shorter skins -> more frequent resorts. It is not totally obvious, why this commit should hurt performnce so much. Maybe a few more ParticleForce objects are initialized and the compiler is not smart enough to understand that they are temporary.
This degrades performance from ~0.5ms to ~0.6ms (lj, 8 cores on Gerenuk)
With this change I got 0.09 ms back from https://github.com/espressomd/espresso/commit/0616a11a335385b8ca38a78ec425b029d6d426af
diff --git a/src/core/Particle.hpp b/src/core/Particle.hpp
index 8bf06fe32..cc336a6f9 100644
--- a/src/core/Particle.hpp
+++ b/src/core/Particle.hpp
@@ -270,17 +270,18 @@ struct ParticleForce {
: f(f), torque(torque) {}
#endif
- friend ParticleForce operator+(ParticleForce const &lhs,
- ParticleForce const &rhs) {
-#ifdef ROTATION
- return {lhs.f + rhs.f, lhs.torque + rhs.torque};
-#else
- return lhs.f + rhs.f;
-#endif
+ friend ParticleForce operator+(ParticleForce const &lhs, ParticleForce const &rhs) {
+ ParticleForce result = lhs;
+ result += rhs;
+ return result;
}
ParticleForce &operator+=(ParticleForce const &rhs) {
- return *this = *this + rhs;
+ f += rhs.f;
+#ifdef ROTATION
+ torque += rhs.torque;
+#endif
+ return *this;
}
As I understand it, the original code should be equally efficient when all optimizations occur. However, this does not seem to be the case, even though I have not verified it directly.
I measured a slightly stronger improvement (0.12 ms) with a very recent commit (fe94b6acc).
For 8 cores it's even a bit more.
Edit: With this change I cannot measure a significant performance difference for the other mentioned commits anymore.
Thank you, Riccarod. Can you please open this as a pr?
I guess, the most important one has been fixed.
h
There seem to be a number of performance regression. Between 4.0 and the current python branch, the lj test deteriorates from 0.41ms to 0.6ms on gerenuk. I tried to keep myconfig etc constant.
The performance degradation occurs in (at least 2 steps). I tried to git-bisect (with mixed results) because the timings are somewhat noisy.
Suspects: