This update contains a lot of commits, and I'll do my best to describe what has been done.
Bug fixes
Most of this update is bug fixes, and most of the bug fixes have been memory-related. As in, MYSTRAN was doing illegal memory operations that caused crashes and/or undefined behaviour. Thorough debugging with valgrind was key! Here's every relevant commit and the rationale behind the code changes:
ae8404ec1d9fa13eb2cc2f8713185d9ef7ef32ef: ELMOFF was comparing QUAD4 areas to an uninitialized EPS1, causing it to think positive areas were negative. The fix: initialize EPS1 to be EPSIL(1) as everywhere else does.
de46a9dfdfc1ec63c20e5707030db40433df0423: GPWG had some debug prints that tried to write the value of ITABLE before it was set. The fix: move the initialization of ITABLE a couple lines up.
8930eb873795d8880cf73d964f2f41a806ca2880: ANY_U_P_OUTPUT, a variable with all assignments and uses commented out, appeared on an IF statement, thus sometimes being read even though it was never set. The fix: comment it out of the IF statement as well.
854b5a1fc0b78084933ccb1bcc8d9ef27090a9cb: there was a faulty conditional in the SuperLU version we were using, causing crashes when the global stiffness matrix was singular. The fix: so we changed to a more recent state of their repository since the fix was recent.
aa1ef5d6b48345b59582c163d8f6cd447d6f9988: a small oversight in indexes used to print CSHEAR engineering forces in WRITE_ELEM_ENGR_FORCE caused a buffer to be read past the part that actually held values. The fix: make the indexes consistent with the initialized part.
6d7aabae33a676cf87bdfdcc7dd99e735d8147f1: OP2 code in WRITE_ELEM_STRESSES forgot to set a value called STRESS_CODE, used to indicate what's about to be written. The fix: put in a placeholder value for now -- there's other placeholder values in that same subroutine anyway.
5ae05206b6867eb5c65e5aaa7a97a601377a0dbe: a missing ELSE was causing a fatal error to be issued for valid models containing CSHEAR cards with a nonzero OCID. The fix: added the ELSE.
e8f5ca0b67107e5d673903b1d745a9db79a0e011: the subroutine GET_KE_OFFSET was communicating the wrong shape of BUSH element parameters (KEO_BUSH) to a matrix operation (MATPUT). The fix: communicate the correct shape.
8b9b03e385ca1b4eb87ec7ee14e3a370aa1a7a35: a post-deallocation subroutine was being called for the matrix I2_GMN regardless of whether it had been just deallocated or not, triggering a bad read. The fix: move the call into the conditional for whether I2_GMN was just deallocated.
5777eb1e9c35834c36ab87d719c3979213d48e1c: the SCNUM vector, containing subcase numbers, was being allocated with the wrong size, resulting in an overflow when it was written to. In MODES solutions, the "subcases" are the number of requested eigenvectors. The fix: set LSUB, which determines the maximum number of subcases in the solution, as soon was we see the EIGR bulk data card "chosen" by the METHOD case control card during LOADB. This way, all subsequent (i.e. after LINK0) allocations of SCNUM will be the correct size. There are no writes to SCNUM that could overflow before, since we set the size before the limit is even communicated to the rest of the program.
215f75cf2c3afb12afed2703d9b4cb316c333bb5: same as above, but for EIGRL cards and the OGROUT vector. The fix: same as above, but setting LSUB as soon as we see the right EIGRL card during LOADB.
ebf53f11ea47a5d8028ecbf5754ba44fed00a0a5: a debug write was reading from an uninitialized variable depending on the requested data. The fix: initialize it just like in similar requests.
8b874e4864b8aaec665d4c87bedc59282bebd25f: okay, so, when a BUSH element has negative OCID (or the continuation card is absent), that means we use the GA-GB line as the x-axis for computing offsets. However, when GA and GB coincide, that line can't be computed, but an attempt was being made to compute them anyway, resulting in some nasty memory bugs. The fix: detect zero-length BUSH elements with negative OCID and mark them as non-offset, thus avoiding the problematic computations.
Almost all of these were detected because they manifested as (usually) silent illegal memory operations. Fixing them helps ensure MYSTRAN behavior is deterministic and prevents crashes or garbled output.
New features
This release was focused on fixing bugs. I just added a NOCOUNTS parameter (968909cc5c0a9b35eada5d19a8e65ffdc4fbb646 and 88c294aca1b94ee802429ffb903215d9e5f40a8c) to disable those "counter" progress-indicating writes to standard output. Why? Because not all terminals can handle it (see: VSCode's debugger terminal), and neither can files. It's disabled by default, so counters are still there if you don't set it. Counters also make some runs longer, since every operation is punctuated by a write syscall. Being able to disable them is good if you're debugging, writing standard output to a log file, or running a large model where 10% time savings mean hours.
If you see any counter getting past NOCOUNTS, sorry, they're really hard to find in the code! Tell us via Discord, a GitHub issue, a forum post, whatever you prefer.
Build and documentation changes
I updated the build instructions (20afd36db5ee84d326599fb1d73e4d5f5ccc56ec and c53e5914c115f459cef09163edd4465685e2d478) to make them more consistent with the way our build script handles libraries.
Also, we bumped our SuperLU version to commit xiaoyeli/superlu@76b2c9a6aea2fd7043a4ddbc43748db7d9145035 in order to integrate a fix for an invalid read while trying to factor a singular matrix.
Oh, and I added the --fcheck=all flag to enable runtime checks. This way, memory bugs are less silent -- as opposed to lurking around until someone decides to run on valgrind. There's a small performance hit, comparable to enabling NOCOUNTS, but the benefits far outweigh the cost. Besides, there are more pressing bottlenecks, and one can always disable that flag by editing CMakeLists.txt if they're so inclined. But I do not recommend that. At all.
And the manual needs updating, of course. So do some other documents. That's in the works.
Other changes
BANDIT is now disabled by default (8e1050de6b2c425f2a712ebf966e67c97c4b851a), regardless of what solver you choose to use. Why? Because it's broken. It's written in Fortran 77 with tons of nonstandard stuff, and getting it to work would be moot: if you're running a model large enough for the banded solver to need BANDIT, you shouldn't be using the banded solver. Use SuperLU: PARAM,SOLLIB,SPARSE.
Extricating BANDIT from the code is low-priority due to its very high difficulty-impact ratio. That means you can still enable it, but it won't work. Don't do it.
Finally, this update is a bump to the 15.0 version, so it was also set on the code (d31a1a6cb35aaa46848e6da4a8af623d3638d309).
A quick warning
The bump in the SuperLU version might require you do a clean build. If you get random linker errors, run a make clean, delete superlu/, Binaries/, CMakeCache.txt, and CMakeFiles/. Then, re-run cmake with the appropriate arguments. Sorry about that, it's got to do with how CMake handles Git submodules.
Results and final remarks
Phew, that was a lot of commits. Let me summarize what this update means.
All models in our current benchmark set now run without any illegal memory operations. So do other models that used to cause trouble, like cshear.bdf (part of the build verification suite) and large_shelled_beam.bdf (user-reported, I think).
That doesn't mean results are necessarily correct. Not all bugs are memory bugs! But this update means that many models that used to trigger nondeterministic behaviour and/or crashes now run to completion. This way, we can actually get the results to verify they're correct, and also work on new features unencumbered by crashes. Not bad for a month's work, eh?
Feedback is very much welcome, and there's more on the way!
This update contains a lot of commits, and I'll do my best to describe what has been done.
Bug fixes
Most of this update is bug fixes, and most of the bug fixes have been memory-related. As in, MYSTRAN was doing illegal memory operations that caused crashes and/or undefined behaviour. Thorough debugging with
valgrind
was key! Here's every relevant commit and the rationale behind the code changes:ELMOFF
was comparing QUAD4 areas to an uninitializedEPS1
, causing it to think positive areas were negative. The fix: initializeEPS1
to beEPSIL(1)
as everywhere else does.GPWG
had some debug prints that tried to write the value ofITABLE
before it was set. The fix: move the initialization ofITABLE
a couple lines up.ANY_U_P_OUTPUT
, a variable with all assignments and uses commented out, appeared on an IF statement, thus sometimes being read even though it was never set. The fix: comment it out of the IF statement as well.WRITE_ELEM_ENGR_FORCE
caused a buffer to be read past the part that actually held values. The fix: make the indexes consistent with the initialized part.WRITE_ELEM_STRESSES
forgot to set a value calledSTRESS_CODE
, used to indicate what's about to be written. The fix: put in a placeholder value for now -- there's other placeholder values in that same subroutine anyway.GET_KE_OFFSET
was communicating the wrong shape of BUSH element parameters (KEO_BUSH
) to a matrix operation (MATPUT
). The fix: communicate the correct shape.I2_GMN
regardless of whether it had been just deallocated or not, triggering a bad read. The fix: move the call into the conditional for whetherI2_GMN
was just deallocated.SCNUM
vector, containing subcase numbers, was being allocated with the wrong size, resulting in an overflow when it was written to. In MODES solutions, the "subcases" are the number of requested eigenvectors. The fix: setLSUB
, which determines the maximum number of subcases in the solution, as soon was we see the EIGR bulk data card "chosen" by the METHOD case control card duringLOADB
. This way, all subsequent (i.e. afterLINK0
) allocations ofSCNUM
will be the correct size. There are no writes toSCNUM
that could overflow before, since we set the size before the limit is even communicated to the rest of the program.EIGRL
cards and theOGROUT
vector. The fix: same as above, but settingLSUB
as soon as we see the rightEIGRL
card duringLOADB
.OCID
(or the continuation card is absent), that means we use the GA-GB line as the x-axis for computing offsets. However, when GA and GB coincide, that line can't be computed, but an attempt was being made to compute them anyway, resulting in some nasty memory bugs. The fix: detect zero-length BUSH elements with negativeOCID
and mark them as non-offset, thus avoiding the problematic computations.Almost all of these were detected because they manifested as (usually) silent illegal memory operations. Fixing them helps ensure MYSTRAN behavior is deterministic and prevents crashes or garbled output.
New features
This release was focused on fixing bugs. I just added a
NOCOUNTS
parameter (968909cc5c0a9b35eada5d19a8e65ffdc4fbb646 and 88c294aca1b94ee802429ffb903215d9e5f40a8c) to disable those "counter" progress-indicating writes to standard output. Why? Because not all terminals can handle it (see: VSCode's debugger terminal), and neither can files. It's disabled by default, so counters are still there if you don't set it. Counters also make some runs longer, since every operation is punctuated by awrite
syscall. Being able to disable them is good if you're debugging, writing standard output to a log file, or running a large model where 10% time savings mean hours.If you see any counter getting past
NOCOUNTS
, sorry, they're really hard to find in the code! Tell us via Discord, a GitHub issue, a forum post, whatever you prefer.Build and documentation changes
I updated the build instructions (20afd36db5ee84d326599fb1d73e4d5f5ccc56ec and c53e5914c115f459cef09163edd4465685e2d478) to make them more consistent with the way our build script handles libraries.
Also, we bumped our SuperLU version to commit xiaoyeli/superlu@76b2c9a6aea2fd7043a4ddbc43748db7d9145035 in order to integrate a fix for an invalid read while trying to factor a singular matrix.
Oh, and I added the
--fcheck=all
flag to enable runtime checks. This way, memory bugs are less silent -- as opposed to lurking around until someone decides to run onvalgrind
. There's a small performance hit, comparable to enablingNOCOUNTS
, but the benefits far outweigh the cost. Besides, there are more pressing bottlenecks, and one can always disable that flag by editingCMakeLists.txt
if they're so inclined. But I do not recommend that. At all.And the manual needs updating, of course. So do some other documents. That's in the works.
Other changes
BANDIT is now disabled by default (8e1050de6b2c425f2a712ebf966e67c97c4b851a), regardless of what solver you choose to use. Why? Because it's broken. It's written in Fortran 77 with tons of nonstandard stuff, and getting it to work would be moot: if you're running a model large enough for the banded solver to need BANDIT, you shouldn't be using the banded solver. Use SuperLU:
PARAM,SOLLIB,SPARSE
.Extricating BANDIT from the code is low-priority due to its very high difficulty-impact ratio. That means you can still enable it, but it won't work. Don't do it.
Finally, this update is a bump to the
15.0
version, so it was also set on the code (d31a1a6cb35aaa46848e6da4a8af623d3638d309).A quick warning
The bump in the SuperLU version might require you do a clean build. If you get random linker errors, run a
make clean
, deletesuperlu/
,Binaries/
,CMakeCache.txt
, andCMakeFiles/
. Then, re-runcmake
with the appropriate arguments. Sorry about that, it's got to do with how CMake handles Git submodules.Results and final remarks
Phew, that was a lot of commits. Let me summarize what this update means.
All models in our current benchmark set now run without any illegal memory operations. So do other models that used to cause trouble, like
cshear.bdf
(part of the build verification suite) andlarge_shelled_beam.bdf
(user-reported, I think).That doesn't mean results are necessarily correct. Not all bugs are memory bugs! But this update means that many models that used to trigger nondeterministic behaviour and/or crashes now run to completion. This way, we can actually get the results to verify they're correct, and also work on new features unencumbered by crashes. Not bad for a month's work, eh?
Feedback is very much welcome, and there's more on the way!