Closed obin1 closed 1 year ago
Hello Obin,
Thanks for finding the reason for the segmentation fault problem!
Static memory allocation has always been a problem in KPP: The code crashes if not enough memory has been allocated, and the code also crashes when trying to allocate more memory than available. A while ago, I opened this github issue: https://github.com/KineticPreProcessor/KPP/issues/6
So far, I haven't been brave enough to implement dynamic memory allocation. My knowledge about C is limited, and I'm afraid of creating unwanted side effects.
I'm completely in favor of introducing dynamic memory allocation. However, I think we will have to carefully check that implementation before adopting it into the official distribution:
Are all results identical?
Are all array indices within the allowed range?
Are there any memory leaks?
I'm happy to help with these tasks; however, as mentioned above, my knowledge of C is limited :-(
Hi Rolf! I'm no C expert either, but as far as memory leaks go, I think these variable-length arrays are safe and will be deallocated when leaving the scope of GenerateJacReactantProd(): https://gcc.gnu.org/onlinedocs/gcc/Variable-Length.html
So far, I have verified that this produces identical f90 code for small_strato on a Linux cluster (I don't have a good way to verify this on my Mac, due to the nature of the stack overflow for this setting). I think Bob @yantosca is going to check this potential fix in the KPP C-I tests.
@obin1 You are correct that 65520 KB is a hard limit for MacOS. Probably because of some BSD-specific thing.
I ran the C-I tests on my MacBook Air and Linux laptop. With the fixes in this PR, the mechanism no longer encounters the segfault, but there are differences, e.g. in the C_rk test, where the Mac test has all NaN values while the Linux test does not.
Output of C-I tests on Linux:
Output of C-I tests on MacOS X Ventura:
Differences: Linux (<) vs Mac (>)
Let me know what you think.
The error message says that ‘ErrOld’ and ‘Hacc’ are uninitialized. Is it possible that the variables are initialized with zero on Linux and with a random number on the Mac?
@RolfSander that could be. I'll take a look.
@RolfSander: I tried setting those to ZERO, but we still get the NaN values. The issue is in the runge_kutta.c integrator. Not sure how much time we want to spend on this, given that not too many people are using the C interface.
I agree. Fixing the C integrator is not top priority. It would be more interesting to check if KPP with the dynamic memory allocation still generates identical Fortran files.
On the Linux environment I'm using (Discover), this generates identical f90 code for small_strato with STOICMAT. The two things I tried that are zero-diff are: 1) small_strato with limit stacksize 486208, and 2) small_strato with the changes in this PR.
It's great to see that you get identical results for the small_strato mechanism with STOICMAT :-)
I will try it with my own (pretty big) chemical mechanism.
I'd like to come back to the question of illegal array indices. Since we make some arrays smaller, it would be good to check that our array indices are still within the proper bounds. AFAIK, there aren't any gcc built-in array bounds checks at runtime. Although "-fbounds-check" sounds promising, my gcc man page says that it is only implemented for Fortran and Java :-(
On github, I found a tool called libcrunch:
https://github.com/stephenrkell/libcrunch/
Would that be suitable for us?
Even better: https://cppcheck.sourceforge.io/. I've just downloaded this from the AUR onto my Linux laptop and will try to run it on the code.
So I was able to run cppcheck pretty easily. You run it on each file. I just did a couple of for loops and directed the output to log files for each C file:
for f in $KPP_HOME/src/*.c; do cppcheck $f > ./$(basename $f).log 2>&1; done
for f in $KPP_HOME/int/*.c; do cppcheck $f > ./$(basename $f).log 2>&1; done
for f in $KPP_HOME/util/*.c; do cppcheck $f > ./$(basename $f).log 2>&1; done
The output for runge_kutta.c was:
Checking /home/bob/work/KPP3/int/runge_kutta.c ...
/home/bob/work/KPP3/int/runge_kutta.c:1362:4: error: Array index out of bounds; 'ISTATUS' buffer size is 0 and it is accessed at offset 24. [ctuArrayIndex]
ISTATUS[Nsol]++;
^
/home/bob/work/KPP3/int/runge_kutta.c:786:12: note: Calling function RK_Solve, 11th argument is uninitialized
RK_Solve(N,H,E1,IP1,E2R,E2I,IP2,DZ1,DZ2,DZ3,ISTATUS);
^
/home/bob/work/KPP3/int/runge_kutta.c:1362:4: note: Using argument ISTATUS
ISTATUS[Nsol]++;
^
but also some of the core files have out-of-bounds reported as well, such as in this block in code.c:
int DefineVariable( char * name, int t, int bt, int maxi, int maxj, char * comment, int attr )
{
  int i;
  VARIABLE * var;

  for( i = 0; i < MAX_VAR; i++ )
    if( varTable[ i ] == 0 ) break;
  if( varTable[ i ] != 0 ) {
    printf("\nVariable Table overflow");
    return -1;
  }
So, because the for statement does not have a bracket, only the next statement (the if with the break) is in the loop. Then, when the loop runs to completion without breaking, i is MAX_VAR, so varTable[i] is out of bounds. I might open a separate PR for this.
Oops, in the prior code, I think varTable[i] is out of bounds on purpose.
Hmm, I'm not sure if I understand that piece of code...
The question here seems to be whether the for loop was ended by the break statement, or whether it continued to the end. Right?
Couldn't that be tested with a much simpler:
if( i == MAX_VAR ) {...
???
I think you are right @RolfSander. I can try to make a fix for that.
While you're working on this: could you mention MAX_VAR in the error message and also show its value?
I might have a potential solution for this issue: https://github.com/KineticPreProcessor/KPP/issues/64
The segfault doesn't occur in the generation of the sparse stoichiometric matrix, but in the call to GenerateJacReactantProd(). Switching a few static arrays to smaller, dynamic arrays (allocated on the heap) fixed the stack overflow problem on my Mac.
Setting ulimit -s unlimited on my Mac sets the stack size to 65520 KB, which seems to be a hard limit for my system. This isn't enough: on a Linux system I tried, small_strato needed a stack size of 486208 KB to run with #STOICMAT ON. If this is a valid fix, it might be relevant for Linux users as well as Mac users: maybe the STOICMAT option doesn't need a huge stack size.