Open JohanMabille opened 4 years ago
Merging #1232 (b04f9c3) into main (7f4f32b) will increase coverage by 5.18%. The diff coverage is n/a.
:exclamation: Current head b04f9c3 differs from pull request most recent head 6dbd9e2. Consider uploading reports for the commit 6dbd9e2 to get more accurate results.
@@            Coverage Diff             @@
##             main    #1232       +/-   ##
==========================================
+ Coverage   47.56%   52.74%    +5.18%
==========================================
  Files          90      531      +441
  Lines       71776   109533    +37757
==========================================
+ Hits        34140    57777    +23637
- Misses      37636    51756    +14120
Impacted Files | Coverage Δ | |
---|---|---|
proteus/NumericalSolution.py | 70.73% <0.00%> (-7.41%) | :arrow_down: |
proteus/mprans/RDLS.py | 66.98% <0.00%> (-7.40%) | :arrow_down: |
proteus/Archiver.py | 31.64% <0.00%> (-4.55%) | :arrow_down: |
proteus/TwoPhaseFlow/TwoPhaseFlowProblem.py | 92.96% <0.00%> (-2.84%) | :arrow_down: |
proteus/Gauges.py | 93.58% <0.00%> (-1.19%) | :arrow_down: |
proteus/mprans/BodyDynamics.py | 85.73% <0.00%> (-0.74%) | :arrow_down: |
proteus/iproteus.py | 24.53% <0.00%> (-0.63%) | :arrow_down: |
proteus/default_so.py | 90.90% <0.00%> (-0.40%) | :arrow_down: |
proteus/LinearSolvers.py | 57.83% <0.00%> (-0.30%) | :arrow_down: |
proteus/Profiling.py | 47.15% <0.00%> (-0.28%) | :arrow_down: |
... and 462 more | | |
Continue to review the full report at Codecov.
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bf2bf66...6dbd9e2. Read the comment docs.
@cekees @zhang-alvin @tridelat This is ready for a review. I haven't replaced the data()[...] indexing in the calls to CompKernel methods, since I will add a new class that accepts xtensor objects (but I will do that in a dedicated PR). Can you confirm that this does not hurt performance?
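For context, a minimal sketch of the pattern under discussion. The helpers sum_raw/sum_xt are illustrative stand-ins, not the actual CompKernel API, and xt::xarray stands in for the xt::pyarray containers proteus actually uses: today the call sites pass container.data() to pointer-based kernel signatures, and the planned follow-up would add methods (or a new kernel class) that accept the xtensor objects directly.

```cpp
// Illustrative sketch only: sum_raw/sum_xt stand in for CompKernel methods,
// and xt::xarray stands in for the xt::pyarray containers used in proteus.
#include <cstddef>
#include <iostream>
#include <xtensor/xarray.hpp>

// Current pattern: kernel helpers take raw pointers, so call sites pass
// container.data() and the helper does pointer indexing internally.
double sum_raw(const double* u, std::size_t n)
{
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i)
    {
        s += u[i];  // the data()[...] style indexing mentioned above
    }
    return s;
}

// Planned pattern: a method (or a new kernel class) that accepts the
// xtensor object itself, so no .data() is needed at the call site.
template <class E>
double sum_xt(const E& u)
{
    double s = 0.0;
    for (double v : u)
    {
        s += v;
    }
    return s;
}

int main()
{
    xt::xarray<double> u = {1.0, 2.0, 3.0};
    std::cout << sum_raw(u.data(), u.size()) << "\n";  // current call style
    std::cout << sum_xt(u) << "\n";                    // future call style
}
```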
I ran a cutfem-based 2D case using this branch and the master branch. There was no major difference in the running times.
Nice work! @jhcollins you might have some parallel jobs set up where you could do a timing comparison as well. My allocations on HPC are not ready yet, but I'll test some compute-intensive jobs on macOS and Linux.
@JohanMabille did you make this conversion by hand or did you write a python script? If via script, it would be nice if you could add that to the scripts directory for future use.
I did this one by hand because I wanted to see if I could add other simplifications (like replacing initialization loops). I can work on a Python script for the other files.
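To make the "replacing initialization loops" remark concrete, here is a small hedged example of the kind of simplification that becomes possible; the array name and size are made up rather than taken from the converted proteus kernels.

```cpp
// Illustrative sketch only: elementResidual and nDOF are made-up names,
// not code copied from the proteus sources.
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <xtensor/xarray.hpp>
#include <xtensor/xbuilder.hpp>

int main()
{
    constexpr std::size_t nDOF = 4;

    // Before: an explicit zero-initialization loop over a raw buffer.
    double elementResidual[nDOF];
    for (std::size_t i = 0; i < nDOF; ++i)
    {
        elementResidual[i] = 0.0;
    }

    // After (option 1): the same intent expressed with std::fill.
    std::fill(std::begin(elementResidual), std::end(elementResidual), 0.0);

    // After (option 2): an xtensor container created already zero-filled.
    xt::xarray<double> elementResidual_xt = xt::zeros<double>({nDOF});
    (void)elementResidual_xt;
}
```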
@cekees do you want the parallel timing comparison using a mesh conforming setup or cutfem like alvin?
Sorry, just saw this. I think we need to verify the performance on something we've run a lot and load up the cores with mesh nodes. Maybe a dambreak or one of your wave flume simulations, run at 2 or 3 core counts so you get roughly 1000, 2000, and 4000 vertices per core. In 2D you can likely go higher, more like 20,000 vertices per core. If you run it with --profiling you should get a list of the top 20 functions. Typically the residual and Jacobian for RANS2P make it onto the list: the PETSc solve and preconditioner setup are the top costs, in the 80-90% range, and below that we should see the calculateResidual and calculateJacobian functions. If you have a go-to FSI simulation, like a floating caisson with ALE, that would be handy because it exercises more of the functionality.
My timings are looking great @JohanMabille. I'll merge this tomorrow once a few large jobs have run on HPC platforms from Cray and SGI and I've confirmed that the results are identical and the timings equivalent. So far I see some cases where the new implementation appears faster, but that may just be load fluctuation (even though these tests are run on dedicated nodes).
@JohanMabille and @jhcollins I verified that the numerical results are essentially identical on a 2D dambreak (two-phase) and 2D FSI (two-phase with mesh deformation/ALE). There are some differences on the order of 1e-22, which I suspect have to do with the compiler taking different paths at the aggressive -O3 optimization level. For both a "standard load" of 2000 vertices per core and a heavier load of about 10,000 vertices per core, the new indexing is actually slightly faster. @jhcollins let me know if you are able to identify the issue where you found non-trivial differences in the computed solutions. I tested on a Cray XC40 with gnu 7.3.0 compilers.