Brief description of the bug
Chibi VM can leave the stack top in an uninitialised and potentially stale
state after SEXP_OP_APPLY1 and SEXP_OP_TAIL_CALL instructions.
Under very particular circumstances this can trigger an out-of-bounds error
and cause a segmentation fault during the marking phase of garbage collection.
affected versions: Every Chibi Scheme commit starting at 2922ed591d1c0dc3be7a92e211ac7b18aa12edcc
platforms tested:
macOS 13.6.6, Intel x86-64 (Apple clang version 15.0.0 (clang-1500.1.0.2.5))
Debian Linux Bookworm, Intel x86-64 (gcc 12.2.0)
all platforms are affected
Way to reproduce the issue
Prepare a test directory (I will use ${CHIBITEST} for it, with Bash as the shell).
mkdir ${CHIBITEST}
cd ${CHIBITEST}
Clone the forked Chibi repository and check out the crash-demo-part1 branch.
Alternatively just create the following crash-test.scm file:
The parent of crash-demo-part1 is the current head of Chibi (7ac3cfeb). Note that the demonstration will not necessarily work, since it depends heavily on the version of Chibi, the host system, the host compiler and the malloc implementation. Moreover, even a tiny change to the test file might stop it from crashing (which is why it is so weirdly written), so getting the test program directly from the fork is advised.
This branch only adds the test file to the Chibi root directory.
Alternatively, if the installed version of Chibi is built from the original master:
chibi-scheme -e '(include "crash-test.scm")'
sample output that would be correct
If things do not go well (i.e. you are unable to get the current version to crash), or if the proposed fix is applied, the output is:
crash test
----------
1. running VM for few million cycles for debugging:
- started
- done
2. filling stack to 2000 slots, adding launcher and filling heap:
- started
- done
3. pushing a triggering object to stack:
- started
- done
4. triggering one GC and releasing stack :
- started
- done
5. prefilling heap:
- started
- done
6. triggering the bug:
7. this is seen only with bug fixed
- the sum of the launcher list 2000985
- done
sample output that you think is incorrect
If all goes well (i.e. the test also crashes on your system), you should see
output similar to the following. Again, since this depends heavily on the host, the tooling and so on, it cannot be guaranteed. If you are not able to reproduce it, you will need to modify the test (see the analysis part for how to do that). It is possible to build a variant of the program that crashes on your system for every commit since the introduction of vm.c and gc.c, but doing so is very tedious.
crash test
----------
1. running VM for few million cycles for debugging:
- started
- done
2. filling stack to 2000 slots, adding launcher and filling heap:
- started
- done
3. pushing a triggering object to stack:
- started
- done
4. triggering one GC and releasing stack :
- started
- done
5. prefilling heap:
- started
- done
6. triggering the bug:
Segmentation fault
Explanation of the issue and analysis of the test program
For an explanation of the bug (the location of the crash, the actual cause of the
crash and the implementation details of the test program), follow these steps:
git checkout crash-demo-part2
make PREFIX=${CHIBIBUILD} install
LD_LIBRARY_PATH=${CHIBIBUILD}/lib ${CHIBIBUILD}/bin/chibi-scheme -e '(include "crash-test.scm")'
git log | head -n 34
git checkout crash-demo-part3
make PREFIX=${CHIBIBUILD} install
LD_LIBRARY_PATH=${CHIBIBUILD}/lib ${CHIBIBUILD}/bin/chibi-scheme -e '(include "crash-test.scm")'
git log | head -n 198 | less
git checkout crash-demo-part4
make PREFIX=${CHIBIBUILD} install
LD_LIBRARY_PATH=${CHIBIBUILD}/lib ${CHIBIBUILD}/bin/chibi-scheme -e '(include "crash-test.scm")'
git log | head -n 71 | less
In short: SEXP_OP_APPLY1 leaves the stack top as it is. If the code is pure Scheme, the slot can hold one of the following:
a value initialised to NULL (that is ok)
a sexp which is not collected (which is also ok)
a pointer sexp which is a valid pointer, or at least was a valid pointer at some time (if it is still valid, that is fine, but it may already have become stale, and that is problematic)
The stack can thus look like this:
stack index    stack value
n = top
n - 1          old pointer
n - 2          arg1
...            ...
4              arg_N
3              frame info 3
2              frame info 2
1              frame info 1
0 = fp         frame info 0
A previous sweep might have overwritten the bytes in the tag field of the heap object that the old pointer refers to. If we then apply an opcode (or tail-call it with too few values) while the heap is full, the compilation to bytecode can trigger a garbage collection, and that collection can fail catastrophically when marking reaches the old pointer. The fix is to write a non-collected value over the old pointer.
With the fix applied, the stack looks like this:
stack index    stack value
n = top
n - 1          SEXP_NULL
n - 2          arg1
...            ...
4              arg_N
3              frame info 3
2              frame info 2
1              frame info 1
0 = fp         frame info 0