LLNL / sundials

Official development repository for SUNDIALS - a SUite of Nonlinear and DIfferential/ALgebraic equation Solvers. Pull requests are welcome for bug fixes and minor changes.
https://computing.llnl.gov/projects/sundials
BSD 3-Clause "New" or "Revised" License
526 stars 130 forks source link

ARKODE error messages are being cutoff #461

Closed Steven-Roberts closed 7 months ago

Steven-Roberts commented 7 months ago

I noticed that some ARKODE error messages are getting cut off, .e.g, At t = 0.001, mxstep steps taken before reaching tou. When SUNDIALS is built in double precision, some format specifiers seem to be for long double, which I think is the issue:

https://github.com/LLNL/sundials/blob/2abd63bd6cbc354fb4861bba8e98d0b95d65e24a/src/arkode/arkode_impl.h#L1026-L1032

Looks like the other _impl.h files have the same thing.

balos1 commented 7 months ago

Lower case l is not for long double, that would be upper case L. l actually has no effect in this case.

balos1 commented 7 months ago

I just tried to reproduce the cutoff message and was unable to:

[ERROR][rank 0][/Volumes/Workspace/SUNDIALS/repos/sundials/src/arkode/arkode.c:852][arkEvolve] At t = 0.061625, mxstep steps taken before reaching tout.
balos1 commented 7 months ago

Is it possible that your terminal is cutting off the output? Maybe try setting the environment variable SUNLOGGER_ERROR_FILENAME to a file.

Steven-Roberts commented 7 months ago

Whoops, looks like it isn't the format specifier then. I saw it could cause undefined behavior, but apparently that was only before C99. After a closer look, the message length here is somehow wrong when I step through in the debugger

https://github.com/LLNL/sundials/blob/2abd63bd6cbc354fb4861bba8e98d0b95d65e24a/src/arkode/arkode.c#L3331

msgfmt has a length of 52, but msgfmt gets set to 51.

I'm seeing this with gcc 8.5 and 13.1. I might need to boil things down to a minimal example to make sure it's not a secondary effect.

Steven-Roberts commented 7 months ago

It could be that ap is used twice without va_copy or va_end: https://stackoverflow.com/q/55274350

balos1 commented 7 months ago

Can you try this?

diff --git a/src/arkode/arkode.c b/src/arkode/arkode.c
index eac2af422..5cdff8575 100644
--- a/src/arkode/arkode.c
+++ b/src/arkode/arkode.c
@@ -3328,12 +3328,17 @@ void arkProcessError(ARKodeMem ark_mem, int error_code, int line,
   /* Initialize the argument pointer variable
      (msgfmt is the last required argument to arkProcessError) */
   va_list ap;
-  va_start(ap, msgfmt);

   /* Compose the message */
+  va_start(ap, msgfmt);
   size_t msglen = vsnprintf(NULL, 0, msgfmt, ap) + 1;
+  va_end(ap);
+
   char* msg     = (char*)malloc(msglen);
+
+  va_start(ap, msgfmt);
   vsnprintf(msg, msglen, msgfmt, ap);
+  va_end(ap);

   do {
     if (ark_mem == NULL)
@@ -3362,7 +3367,6 @@ void arkProcessError(ARKodeMem ark_mem, int error_code, int line,
   while (0);

   /* Finalize argument processing */
-  va_end(ap);
   free(msg);

   return;
Steven-Roberts commented 7 months ago

Yep, that fixed it. Turns out msglen is fine now that it's printing out the correct time. I think you can make a PR from that exact diff to close this issue. Thanks!

balos1 commented 7 months ago

Cool. Ill note that this seems to be specific to Linux as it works fine with gcc 13 on my m1 mac.

balos1 commented 7 months ago

This is probably needed in all of the packages.

balos1 commented 7 months ago

Closed by #462