MYSTRANsolver / MYSTRAN

MYSTRAN is a general purpose finite element analysis solver
https://www.mystran.com/
MIT License

MYSTRAN 15.0 Update #9

Closed Bruno02468 closed 10 months ago

Bruno02468 commented 10 months ago

This update contains a lot of commits, and I'll do my best to describe what has been done.

Bug fixes

Most of this update is bug fixes, and most of those are memory-related: MYSTRAN was performing illegal memory operations that caused crashes and/or undefined behaviour. Thorough debugging with valgrind was key! Here's every relevant commit and the rationale behind the code changes:

Almost all of these were detected because they manifested as (usually) silent illegal memory operations. Fixing them helps ensure MYSTRAN's behaviour is deterministic and prevents crashes or garbled output.


New features

This release was focused on fixing bugs. The only addition is a NOCOUNTS parameter (968909cc5c0a9b35eada5d19a8e65ffdc4fbb646 and 88c294aca1b94ee802429ffb903215d9e5f40a8c) that disables the progress "counter" writes to standard output. Why? Because not all terminals handle them well (see: VSCode's debugger terminal), and they don't belong in output redirected to a file either. NOCOUNTS is off by default, so counters are still printed unless you set it. Counters also make some runs longer, since every operation is punctuated by a write syscall. Being able to disable them is good if you're debugging, writing standard output to a log file, or running a large model where a 10% time saving means hours.

If you see any counter getting past NOCOUNTS, sorry, they're really hard to find in the code! Tell us via Discord, a GitHub issue, a forum post, whatever you prefer.


Build and documentation changes

I updated the build instructions (20afd36db5ee84d326599fb1d73e4d5f5ccc56ec and c53e5914c115f459cef09163edd4465685e2d478) to make them more consistent with the way our build script handles libraries.

Also, we bumped our SuperLU version to commit xiaoyeli/superlu@76b2c9a6aea2fd7043a4ddbc43748db7d9145035 in order to integrate a fix for an invalid read while trying to factor a singular matrix.

Oh, and I added the -fcheck=all compiler flag to enable runtime checks. This way, memory bugs are less silent, as opposed to lurking around until someone decides to run under valgrind. There's a small performance hit, comparable to what enabling NOCOUNTS saves, but the benefits far outweigh the cost. Besides, there are more pressing bottlenecks, and anyone who really wants to can disable the flag by editing CMakeLists.txt. But I do not recommend that. At all.
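If you want to confirm the runtime checks are active in your checkout, here's a rough sketch: the helper below simply searches a CMake file for the flag. The name has_fcheck is made up for illustration, and it assumes the flag appears literally in the file you point it at, which may not match how your copy of CMakeLists.txt is laid out.

```shell
# Hypothetical helper (not part of MYSTRAN): succeeds if the given CMake file
# mentions the runtime-check flag. The compiler option is spelled with a
# single leading dash, so a fixed-string search for that is enough.
has_fcheck() {
    grep -qF -e '-fcheck=all' "$1"
}
```

Something like `has_fcheck CMakeLists.txt && echo "runtime checks are on"` from the source root would then report whether the flag is still there.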

And the manual needs updating, of course. So do some other documents. That's in the works.


Other changes

BANDIT is now disabled by default (8e1050de6b2c425f2a712ebf966e67c97c4b851a), regardless of what solver you choose to use. Why? Because it's broken. It's written in Fortran 77 with tons of nonstandard stuff, and getting it to work would be moot: if you're running a model large enough for the banded solver to need BANDIT, you shouldn't be using the banded solver. Use SuperLU: PARAM,SOLLIB,SPARSE.

Extricating BANDIT from the code is low-priority due to its very high difficulty-impact ratio. That means you can still enable it, but it won't work. Don't do it.
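To check whether a deck already selects the sparse solver mentioned above, something along these lines should do. This is only a sketch: the function name uses_sparse_solver is invented here, and it assumes the PARAM entry sits on a single line, separated by commas or blanks.

```shell
# Hypothetical helper (not part of MYSTRAN): succeeds if the deck contains a
# PARAM entry setting SOLLIB to SPARSE. Accepts comma- or blank-separated
# fields in either case; lines starting with "$" (comments) do not match.
uses_sparse_solver() {
    grep -Eiq '^[[:space:]]*PARAM[,[:space:]]+SOLLIB[,[:space:]]+SPARSE' "$1"
}
```

For example, `uses_sparse_solver model.bdf || echo "consider adding PARAM,SOLLIB,SPARSE"`, where model.bdf stands in for your own input file.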

Finally, this update bumps MYSTRAN to version 15.0, so the version number was also updated in the code (d31a1a6cb35aaa46848e6da4a8af623d3638d309).


A quick warning

The bump in the SuperLU version might require you to do a clean build. If you get random linker errors, run make clean, then delete superlu/, Binaries/, CMakeCache.txt, and CMakeFiles/. After that, re-run cmake with the appropriate arguments. Sorry about that, it's got to do with how CMake handles Git submodules.
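The clean-up steps above can be sketched as a small shell function. The name clean_mystran_tree is made up for illustration, and it assumes you built in-tree from the MYSTRAN source root, so the listed paths sit directly under it. It deliberately stops short of re-running cmake, since the right arguments depend on your setup.

```shell
# Hypothetical helper (not part of MYSTRAN): wipes the build leftovers named
# above. Pass the source root as the first argument (defaults to ".").
clean_mystran_tree() {
    root="${1:-.}"
    # "make clean" only applies if a Makefile has already been generated;
    # a failure here is not fatal, since everything is removed below anyway.
    if [ -f "$root/Makefile" ]; then
        make -C "$root" clean || true
    fi
    # Remove the stale SuperLU checkout, the build output, and CMake's cache.
    rm -rf "$root/superlu" "$root/Binaries" "$root/CMakeFiles"
    rm -f "$root/CMakeCache.txt"
}
```

After calling `clean_mystran_tree .` from the source root, re-run cmake with your usual arguments and build again.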


Results and final remarks

Phew, that was a lot of commits. Let me summarize what this update means.

All models in our current benchmark set now run without any illegal memory operations. So do other models that used to cause trouble, like cshear.bdf (part of the build verification suite) and large_shelled_beam.bdf (user-reported, I think).

That doesn't mean results are necessarily correct. Not all bugs are memory bugs! But this update means that many models that used to trigger nondeterministic behaviour and/or crashes now run to completion. This way, we can actually get the results to verify they're correct, and also work on new features unencumbered by crashes. Not bad for a month's work, eh?

Feedback is very much welcome, and there's more on the way!