Open AlexWKinley opened 2 weeks ago
Hey!
I am out of office. I can give feedback by the end of the month
On Wed, 6 Nov 2024, 21:18 AlexWKinley, @.***> wrote:
This PR contains changes to a few time integrators and to lines, with the goal of not changing the simulation in any meaningful way while providing performance improvements.
Performance Results
Overall, for models of lines, there is roughly a 2x performance improvement. Currently, the integrator performance improvements are mostly specific to RK2 and RK4.
RK2.png: https://github.com/user-attachments/assets/108a05f6-7953-4a8e-944e-a1224a32f096
RK4.png: https://github.com/user-attachments/assets/9a9b8380-648b-462b-9fe6-27a738f42479
line_performance_plots.png: https://github.com/user-attachments/assets/eb3841dd-ce01-42f1-a011-5ce9780941b3
Integrator / State Changes
The primary changes to the integrator and state code are to avoid memory allocations.
The state code that allows writing
r[1] = r[0] + rd[0] * (0.5 * dt);
is very convenient, but its current implementation means that every operation on a state allocates memory to create an entire copy of that state. That basic $y_{n+1} = y_n + dy \Delta t$ line allocates memory to store the result of $dy \Delta t$, and then the result of the addition, and even an additional allocation for the assignment.
My current solution to this is the new butcher_row function, which can do these sorts of computations in place, without allocating any additional memory. It is named as such because it does the math for a single row in a Butcher tableau (https://en.wikipedia.org/wiki/Runge%E2%80%93Kutta_methods#Explicit_Runge.E2.80.93Kutta_methods). With this function we get
// r[1] = r[0] + rd[0] * (0.5 * dt);
butcher_row<1>(r[1], r[0], { 0.5 * dt }, { &rd[0] });
Definitely not as clear and concise as the more mathematical notation, but with significant performance improvements. Because of the additional complexity, I've only switched the RK2 and RK4 integrators to use this function, but I can certainly expand its usage if desired.
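To illustrate the idea of evaluating one Butcher tableau row in place, here is a minimal sketch. It is not the implementation from this PR: the name butcher_row_sketch, the State alias (a flat vector of doubles standing in for a MoorDyn state), and the use of std::array for the arguments are all assumptions made for the example. It also assumes that every vector passed in has the same length.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a state: a flat vector of doubles.
using State = std::vector<double>;

// Computes out[j] = start[j] + sum_i scales[i] * (*derivs[i])[j]
// element by element, writing into storage that `out` already owns.
// No temporary State is created, so nothing is allocated here.
// Assumes out, start and every *derivs[i] have the same size.
template<std::size_t N>
void butcher_row_sketch(State& out,
                        const State& start,
                        const std::array<double, N>& scales,
                        const std::array<const State*, N>& derivs)
{
    for (std::size_t j = 0; j < out.size(); ++j) {
        double acc = start[j];
        for (std::size_t i = 0; i < N; ++i)
            acc += scales[i] * (*derivs[i])[j];
        out[j] = acc;
    }
}
```

With states r0, rd0 and a preallocated r1, a half step would then read butcher_row_sketch<1>(r1, r0, { 0.5 * dt }, { &rd0 }). Because each element of start is read before the matching element of out is written, out may refer to the same object as start.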
The other change to states I've made is to remove the empty constructor and destructor from StateVar and StateVarDeriv. Because of the rule of three/five (https://en.cppreference.com/w/cpp/language/rule_of_three), defining a destructor (even an empty one) suppresses the implicitly generated move operations, so an expression like r[1] = r[0] + rd[0] allocates twice: once for the result of r[0] + rd[0], and once more when that temporary is copied into r[1], even though the result could be moved directly into r[1]. Removing the empty destructor allows the compiler to avoid that second allocation.
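The effect of an empty destructor can be seen in a small standalone sketch. None of these types come from MoorDyn: Counter, WithDtor, NoDtor and assign_from_temporary are hypothetical names used only for illustration. Instead of measuring allocations, the sketch counts whether an assignment from a temporary ends up as a copy or as a move; for a member like std::vector, the copy is the case that would allocate.

```cpp
#include <utility>

// Hypothetical member that records how it was assigned to.
struct Counter {
    int copy_assigns = 0;
    int move_assigns = 0;
    Counter() = default;
    Counter(const Counter&) {}
    Counter(Counter&&) noexcept {}
    Counter& operator=(const Counter&) { ++copy_assigns; return *this; }
    Counter& operator=(Counter&&) noexcept { ++move_assigns; return *this; }
};

// User-declared (empty) destructor: the implicit move operations are
// not generated, so assignment from an rvalue falls back to a copy.
struct WithDtor {
    Counter payload;
    ~WithDtor() {}
};

// No user-declared special members: implicit move operations exist.
struct NoDtor {
    Counter payload;
};

// Assigns a temporary to target, similar in shape to r[1] = r[0] + rd[0].
template<typename T>
void assign_from_temporary(T& target)
{
    target = T{};
}
```

Calling assign_from_temporary on a WithDtor object increments copy_assigns on its payload, while the same call on a NoDtor object increments move_assigns instead.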
This should help other integrators even if they're not using butcher_row, but not as significantly as using butcher_row.
Line Changes
The first type of change to lines is again to avoid memory allocations.
Changing line->getStateDeriv() to take references to the node velocities and accelerations avoids having to allocate memory for those results every time. Also, changing Line::setState(const std::vector<vec>& pos, const std::vector<vec>& vel) to take vector references avoids allocating copies of the position and velocity vectors.
The other kind of change to lines is my attempt at simplifying the code while improving performance. The biggest thing is getting rid of many of the
if (i == 0) ....
else if (i == N) ....
else ....
by calculating the values that depend on whether it's an internal or external node once, allowing for them to be reused, and making the code nicer to read.
Misc
To improve consistency of the benchmarks, and avoid filling up the console with simulation times, I added MoorDyn::SetDisableOutput to disable some of the console and file output.
Logistical Notes
Feel free to share any thoughts/questions you have. I know that those at NREL are doing some work from their side that could potentially create some conflicts with some of these changes. I'm happy to delay merging this and to rebase it on top of whatever those changes may be myself if that would be easiest.
You can view, comment on, or merge this pull request online at:
https://github.com/FloatingArrayDesign/MoorDyn/pull/269
Commit Summary
- 3252969 https://github.com/FloatingArrayDesign/MoorDyn/pull/269/commits/3252969b8b65223c6b3140a10952dc1e7045c0c4 setup benchmark
- 7fdc3c7 https://github.com/FloatingArrayDesign/MoorDyn/pull/269/commits/7fdc3c7da313d7b2988765341ba640ffd855db9e butcher_row performance optimization
- aaadbc8 https://github.com/FloatingArrayDesign/MoorDyn/pull/269/commits/aaadbc8c95d1ae8caa0108757210bfdf8d539958 Additional line performance improvements
File Changes
(12 files https://github.com/FloatingArrayDesign/MoorDyn/pull/269/files)
- M bench/CMakeLists.txt https://github.com/FloatingArrayDesign/MoorDyn/pull/269/files#diff-6d2f52fe94e40d1afde76a6c2285b4672c966380ebf25271ea83896eb2048c36 (1)
- A bench/LinesBench.cpp https://github.com/FloatingArrayDesign/MoorDyn/pull/269/files#diff-989eb8ce588a04a4607379cab01396fefa1a6be29fcc1c94a767f23062944eb1 (67)
- A bench/LinesBench.hpp https://github.com/FloatingArrayDesign/MoorDyn/pull/269/files#diff-fd6ef2ed65a4e067b86c422cbcd49db66815b4854224abcaa9509114a65bfb9f (11)
- M bench/MDBench.cpp https://github.com/FloatingArrayDesign/MoorDyn/pull/269/files#diff-566c725804a08449bb7dc481b5f7ba8cf876644129e9d53ddc0ea6eb341bcc38 (4)
- A bench/Mooring/cases/.gitignore https://github.com/FloatingArrayDesign/MoorDyn/pull/269/files#diff-a2080354feaa91ddc711826257d21d4b6f03d25794e763da5993fd22d3255933 (1)
- A bench/Mooring/cases/generate_cases.py https://github.com/FloatingArrayDesign/MoorDyn/pull/269/files#diff-e68301d9e67caa59968fdb216171252117a3fc35bac705564b3a9a22a672647e (92)
- M source/Line.cpp https://github.com/FloatingArrayDesign/MoorDyn/pull/269/files#diff-29898bc00510b247ddb019e5cb3b3c646b35ae51e549647eebcf010eb1899fa6 (207)
- M source/Line.hpp https://github.com/FloatingArrayDesign/MoorDyn/pull/269/files#diff-c2fdb05a0e46c53fafbecc1f181e856b352e1df28a102ad0918796ff13114056 (8)
- M source/MoorDyn2.cpp https://github.com/FloatingArrayDesign/MoorDyn/pull/269/files#diff-80a1bfa1212d3b21df426b512779592bdec270a00483d31731941ec8eab55ade (14)
- M source/MoorDyn2.hpp https://github.com/FloatingArrayDesign/MoorDyn/pull/269/files#diff-01e39b64058932ab8f2c6cb3e898f1cfb304f5e65cad9195064b76f9df0aa248 (10)
- M source/State.hpp https://github.com/FloatingArrayDesign/MoorDyn/pull/269/files#diff-f19aeeae84b6c8854671709e13deac266863437e5c513bfaf930a699926cd947 (98)
- M source/Time.cpp https://github.com/FloatingArrayDesign/MoorDyn/pull/269/files#diff-aef669bd79784653c94dc8de36cbdd5725acb1c583c2133e81ca403cbf7ac511 (31)
Patch Links:
- https://github.com/FloatingArrayDesign/MoorDyn/pull/269.patch
- https://github.com/FloatingArrayDesign/MoorDyn/pull/269.diff