kks32 closed this pull request 4 years ago.
Merging #654 into develop will increase coverage by 0.00%. The diff coverage is 97.45%.
```
@@           Coverage Diff            @@
##           develop     #654   +/-  ##
=========================================
  Coverage    96.66%   96.66%
=========================================
  Files          123      124      +1
  Lines        25375    25641    +266
=========================================
+ Hits         24527    24785    +258
- Misses         848      856      +8
```
| Impacted Files | Coverage Δ | |
|---|---|---|
| include/node_base.h | 100.00% <ø> (ø) | |
| include/particles/particle_base.h | 100.00% <ø> (ø) | |
| include/solvers/mpm_base.tcc | 73.21% <50.00%> (-0.48%) | :arrow_down: |
| include/particles/particle.tcc | 93.24% <95.56%> (-0.89%) | :arrow_down: |
| include/node.tcc | 95.12% <96.18%> (-0.86%) | :arrow_down: |
| include/mesh.tcc | 83.94% <100.00%> (+0.04%) | :arrow_up: |
| include/node.h | 100.00% <100.00%> (ø) | |
| include/particles/particle.h | 100.00% <100.00%> (ø) | |
| include/particles/particle_base.tcc | 100.00% <100.00%> (ø) | |
| include/particles/particle_functions.tcc | 100.00% <100.00%> (ø) | |
| ... and 8 more | | |
Continue to review the full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Powered by Codecov. Last update 3b20da5...a0bbd15. Read the comment docs.
Dear all, I think this refactoring branch is ready for review. Could you all please take some time to look at it? We can discuss it for sure in the following developer meeting. Here are some notes to ease the checking process:

- The `particle` and `node` classes are refactored with the addition of `mpm_properties` and `particle_functions`.
- Specialized `particle` functions are moved out to `particle_functions`.

As this change is breaking for all ongoing development, could you please test it by running some of your simulations and let us know if you encounter any issues. Many thanks in advance for the help. Any comments and questions are appreciated.
Why is `initialise_particle(const HDF5Particle& particle, const std::shared_ptr<Material<Tdim>>& material)` defined as a virtual function in `particle.h`?
> Why is `initialise_particle(const HDF5Particle& particle, const std::shared_ptr<Material<Tdim>>& material)` defined as a virtual function in `particle.h`?

Good point. I will remove it; we should use either `virtual` or `override`, but not both in one go.
> Why is `initialise_particle(const HDF5Particle& particle, const std::shared_ptr<Material<Tdim>>& material)` defined as a virtual function in `particle.h`?
>
> Good point. I will remove it; we should use either `virtual` or `override`, but not both in one go.
This one also exists in the `develop` branch.
@tianchiTJ the most recent commit should fix this, thanks.
I'm getting matching outputs running a PEA simulation on `develop` and `refactor/particles` 👍
Anyway, how is the performance of using an ordered map for the different scalar and vector properties? Has any comparison been done yet?
The 3D hydrostatic-column benchmark shows this branch is about 1.5-2x slower:

| Machine | develop (ms) | refactor-particles (ms) |
|---|---|---|
| Local | 615931 | 980017 |
| TACC | 742655 | 1280757 |
> Anyway, how is the performance of using an ordered map for the different scalar and vector properties? Has any comparison been done yet?

We have to look at that separately now to see if that gives us any benefit.
@kks32 I created a partial-revert branch `refactor/particles_2` to move the functions in `particle_functions.tcc` back into `particle`. I just want to see whether the slowdown is caused by restructuring the properties or by moving the functions out to a separate file and passing a pointer argument. I am re-running the same 3D hydrostatic benchmark locally.
Looks like the map function is the reason:

| Branch | Local runtime (s) | TACC runtime (s) |
|---|---|---|
| develop | 615.9 | 742.6 |
| particle (prop map + free fn) | 980 | 1280 |
| particle2 (only prop map) | 872.9 | 1250 |
The map lookup is definitely the reason for the slowdown.
@kks32 Thanks for checking! I was worried about this before. Any thoughts on changing the data structure?
I'm trying out a few different data structures, I'll let you know once I have the results.
I tried a few different structures:

| Branch | TACC runtime (s) |
|---|---|
| develop | 742.6 |
| prop map + free fn | 1280 |
| only prop map | 1250 |
| only robin map | 1012 |
| only vector | 850 |

The vector implementation seems close to develop. Let me try the vector implementation with free functions.
Vector with free functions is 865s.
Thanks @kks32, that makes it 1.16x slower than the original. Are you checking the OpenMP/MPI performance as well? As the current refactor was created specifically for the fluid and two-phase implementation, do you want me to check the vector version against the navier-stokes branch too?
Vector implementation profiling

The problem with the vector is, like you said, that we may have to fill every slot; otherwise we have an access issue.

| Branch | TACC runtime (s) |
|---|---|
| develop | 742.6 |
| prop map + free fn | 1280 |
| vector + free fn | 865 |
| AssocVec + free fn | 972 |
Profiling of develop
Profiling of robinhashmap
Profiling of vector
Profiling of AssocVector
After considering the speed-ups and the different approaches, the use of scalar and vector properties is still 30% slower than using private variables. The advantage of scalar/vector properties is that we'll have fewer functions and less repetition of code. On the other hand, defining private variables and deriving particle classes means more function definitions and repeated code, but is faster. I'd like to propose a vote: use emoji reactions to vote between the different options, and please use the comments below if you'd like to elaborate on your thoughts. @bodhinandach @tianchiTJ @thiagordonho @cw646 @ezrayst @jgiven100 @WeijianLIANG
Thanks Krishna, I would prefer the use of scalar and vector properties, which has a clearer structure.
@WeijianLIANG is that option 2 or 3? Could you please vote with a reaction emoji :smile: for option 2 and :heart: for option 3. Thanks!
The voting options are:
- Use derived particle types with private variables (🚀)
- Solid particles will remain as is, while we add scalar and vector properties to derived particle types (😄)
- Cleaner but slower design with free functions and scalar/vector properties for all particle types (❤️)
- Go back to the drawing board (👀)
For 2, are we just moving the scalar and vector properties to the derived particle types? If we have other types of quantities, do we still use private variables?
@yliang-sn everything will be held within scalar/vector/tensor/boolean
Can I vote, too? If the private variables have the best efficiency, I prefer 1. The structure is also clear enough to read and extend. However, 2 is also fine for me.
@yliang-sn yes, I couldn't find your handle, please vote.
@thiagordonho @tianchiTJ @cw646 @WeijianLIANG could you please use emoji reactions to vote on this comment: https://github.com/cb-geo/mpm/pull/654#issuecomment-662429917
Both @kks32 and @bodhinandach have been working hard on this issue, so I appreciate your work. I think it has been about 2 months now.

However, I feel that we need a unanimous decision on this refactor, or at least a 2/3 decision. The reason is that I don't want us to be in this position again within 6 months, thinking about refactoring once more. The timeline on this issue has shown that this is a lot of work and everyone's progress is halted. Personally, if we are not even sure about (2) or (3), or anyone still has reservations about them, wouldn't it be better to keep it as is (1)? Yes, `particle_base` will be messy because of the many functions from `two_phase` particles, but at least we get the speedup. Yes, we might need to refactor in the future, but isn't that always the assumption? Lastly, if you are sure about (2), you might as well go for (3) so we don't come back to it again later. So my vote will be (3) now!
@kks32 Thanks, I also vote for (2)
After considering @ezrayst's comment, I changed my vote to option (3). I know that the code will be slower, but it is structurally much better and tidier. Also, if moving to (3) currently makes us 30% slower than develop, I think we should look for other ways to improve the speed. For instance, PR #677 has already improved the speed by 4% (thanks to @kks32). I think we can regain the current performance in a short time by improving efficiency in other parts.
After talking with Ezra and Nanda, I think I will change my vote to (3) if we can improve the efficiency in other aspects.
Other containers (Abseil, folly, vector of vectors, static_vector, and small_vector implementations) all gave slower or almost identical performance to the flat_map. Due to the 30% reduction in performance, we are closing this PR. https://trello.com/c/fGF3MrKg/110-plan-for-next-week-notes
Describe the PR
This is an alternative implementation of the particle refactor.
After some more consideration, I am wondering about the possibility of deriving particle types but keeping only the common particle functions within the class, moving all independent/specialized functions outside the class. This will help to initialize the classes with the right types, and also helps in MPI transfer, as the type is known. We derive the data of individual particle types but keep the specialized particle functions separate. The particle functions may have to stay in the same namespace.
Related Issues/PRs
#637 and #650
Additional context
Specialized functions will still be separate.
Following are the run-times for the 3D hydrostatic column (10k steps) on Stampede2: