ashinn / chibi-scheme

Official chibi-scheme repository
Other
1.2k stars 142 forks source link

Segmentation fault caused by GC with stale pointer on top of the VM stack #953

Closed petteripiiroinen closed 2 months ago

petteripiiroinen commented 2 months ago

Brief description of the bug

Chibi VM can leave the stack top into an uninitialised and potentially stale state after SEXP_OP_APPLY1 and SEXP_OP_TAIL_CALL instructions. Under very particular circumstances this can trigger an out-of-bound error and can cause a segmentation fault during marking phase in garbage collection.

Way to reproduce the issue

  1. prepare a test directory (I will use ${CHIBITEST} for that and Bash as shell.)
mkdir ${CHIBITEST}
cd ${CHIBITEST}
  1. clone the forked Chibi repository and checkout to crash-demo-part1 branch.

Alternatively just create the following crash-test.scm file:

(begin
  (define (spinning-loop n)
    (if (zero? n) #t
        (spinning-loop (- n 1))))
  (define (throwway-fill-heap n)
    (cond
      ((zero? n) #t)
      (else (make-vector 80000 0)
            (throwway-fill-heap (- n 1)))))
  (define (alternate-fill-heap-2 n keep? acc)
    (cond
      ((zero? n) acc)
      (keep? (alternate-fill-heap-2 (- n 1) #f (cons (cons 1 2) acc)))
      (else (alternate-fill-heap-2 (- n 1) #t acc))))
  (define (alternate-fill-heap n keep? acc)
    (cond
      ((zero? n) acc)
      (keep? (alternate-fill-heap (- n 1) #f (cons (cons 1 2) acc)))
      (else (alternate-fill-heap (- n 1) #t acc))))
  (define (stage-1)
    (write-string "crash test\n")
    (write-string "----------\n")
    (write-string "1. running VM for few million cycles for debugging:\n")
    (write-string "    - started \n")
    (spinning-loop 2360009)
    (write-string "    - done    \n"))
  (define (triggering-list n acc)
    (if (zero? n) acc
        (triggering-list (- n 1) (cons n acc))))
  (define (stage-2)
    (write-string "2. filling stack to 2000 slots, adding launcher ")
    (write-string "and filling heap:\n")
    (write-string "    - started \n")
    (let* ((iter1       0)
           (iter1       95456)
           (stored      (alternate-fill-heap iter1 #f '()))
           (launcher    (triggering-list 2000 '())))
      (stage-2b stored launcher)))
  (define (filling-stack n stored launcher)
    (if (zero? n)
        (begin
          (write-string "    - done    \n")
          (let ((val (stage-3)))
            (cons stored launcher)))
        (let ((val (filling-stack (- n 1) stored launcher)))
          val)))
  (define (stage-2b stored launcher)
    (filling-stack 1000 stored launcher))
  (define (fill-stack-2 n)
    (if (zero? n)
        (begin
          (dbg-message 1 #xabba #xdeed #xc0de)
          (stage-3c))
        (let ((val (fill-stack-2 (- n 1))))
          val)))
  (define (dbg-message msg x y z) #f)
  (define (stage-3c)
    (let ((store1 (cons 1 2))
          (object (cons #xabba 
                        (cons #xdabba 
                              (cons #xc0de 
                                    (cons #xdeed 
                                          (cons #x42 '()))))))
          (store2 (cons 3 4))
          (store3 (cons 5 6))
          (store4 (cons 7 8))
          (store5 (cons 9 10)))
      (cons store1 (cons store2 (cons store3 (cons store4 store5))))))
  (define (stage-3b)
    (let ((stored (fill-stack-2 4)))
      (dbg-message 0 #xabba #xdeed #xc0de)
      stored))
  (define (stage-3)
    (write-string "3. pushing a triggering object to stack:\n")
    (write-string "    - started \n")
    (let ((stored (stage-3b)))
      (write-string "    - done    \n")
      (stage-4 stored)))
  (define (stage-4 store)
    (write-string "4. triggering one GC and releasing stack :\n")
    (write-string "    - started \n")
    (dbg-message 10 #xabba #xdeed #xc0de)
    (dbg-message 5 #xabba #xdeed #xc0de)
    (let ((iter 20670))
      (alternate-fill-heap iter #f '())
      (dbg-message 5 #xabba #xdeed #xc0de)
      (alternate-fill-heap 0 #f '()))
    (write-string "    - done    \n"))
  (define (stage-5)
    (write-string "5. prefilling heap: \n")
    (write-string "    - started \n")
      (alternate-fill-heap     1000 #f '())
      (dbg-message 0 #xabba #xdeed #xc0de)
      (throwway-fill-heap    13)
      (make-vector  8500 0)
      (make-vector  6000 0)
      (make-vector  5000 0)
      (make-vector  5000 0)
      (make-vector  3000 0)
      (make-vector  3000 0)
      (make-vector  3000 0)
      (make-vector  3000 0)
      (make-vector  3000 0)
      (make-vector  3000 0)
    (write-string "    - done    \n"))
  (define (stage-7 x)
    (dbg-message 0 #xabba #xdeed #xc0de)
    (write-string "7. this is seen only with bug fixed\n")
    (write-string " - the sum of the launcher list  ")
    (display x)
    (newline)
    (write-string " - done\n"))
  (define (fill-stack-3 n stored+launcher)
    (if (zero? n)
        (begin
        (dbg-message 0 #xabba #xdeed #xc0de)
        (stage-6 stored+launcher))
        (let ((val (fill-stack-3 (- n 1) stored+launcher)))
          val)))
  (define (stage-6 stored+launcher)
    (write-string "6. triggering the bug: \n")
    (let ((launcher (cdr stored+launcher)))
      (dbg-message 4 #xabba #xdeed #xc0de)
      (stage-7 (apply + (cdr (cdr (cdr (cdr (cdr launcher)))))))))
  ;
  (stage-1)
  (let ((stored+launcher (stage-2)))
    (stage-5)
    (fill-stack-3 839 stored+launcher)
    #t))

The parent of crash-demo-part1 is the current head of Chibi (7ac3cfeb). Note that the demonstration will not necessarily work, since it is highly dependent on the version of Chibi, on the host system and the host compiler and malloc implementation. Moreover, even a tiny change to the test file (so that is why it is so wierdly written) might make it not to crash, so getting the test program directly from the fork is advised.

This branch only adds the test file on top of the Chibi root directory

git clone https://github.com/petteripiiroinen/chibi-scheme.git
cd chibi-scheme
git checkout crash-demo-part1
git diff --stat master | cat
 crash-test.scm | 130 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 130 insertions(+)
  1. build this version or use Chibi from original master
export CHIBIBUILD=${CHIBITEST}/test-build
make PREFIX=${CHIBIBUILD} install
  1. trigger (maybe, not guaranteed) the crash by
LD_LIBRARY_PATH=${CHIBIBUILD}/lib ${CHIBIBUILD}/bin/chibi-scheme -e '(include "crash-test.scm")'

or alternatively, if the installed version of Chibi is build from original master.

chibi-scheme -e '(include "crash-test.scm")'

sample input that would be correct

If all does not go well (i.e. you are unable to get the current version to crash) or the proposed fix is used, the output is

crash test
----------
1. running VM for few million cycles for debugging:
    - started 
    - done    
2. filling stack to 2000 slots, adding launcher and filling heap:
    - started 
    - done    
3. pushing a triggering object to stack:
    - started 
    - done    
4. triggering one GC and releasing stack :
    - started 
    - done    
5. prefilling heap: 
    - started 
    - done    
6. triggering the bug: 
7. this is seen only with bug fixed
 - the sum of the launcher list  2000985
 - done

sample output that you think is incorrect

If all goes well (i.e. the test crashes also on your system), you should see output similar to this. Again, since this is very dependent on the host, the tooling etc, this cannot be guaranteed. If you are not able to produce this, you will need to modify the test (see the analysis part, how to do it). You can build a variant of the progrma that crashes on your system for every commit after introduction of vm.c and gc.c but that is super tedious.

crash test
----------
1. running VM for few million cycles for debugging:
    - started 
    - done    
2. filling stack to 2000 slots, adding launcher and filling heap:
    - started 
    - done    
3. pushing a triggering object to stack:
    - started 
    - done    
4. triggering one GC and releasing stack :
    - started 
    - done    
5. prefilling heap: 
    - started 
    - done    
6. triggering the bug: 
Segmentation fault

Explanation of the issue and analysis of the test program

For explanation of the bug (the location of the crash, actual cause of the crash and implementation details of the test program), you should do the following steps.

git checkout crash-demo-part2
make PREFIX=${CHIBIBUILD} install
LD_LIBRARY_PATH=${CHIBIBUILD}/lib ${CHIBIBUILD}/bin/chibi-scheme -e '(include "crash-test.scm")'
git log | head -n 34 
git checkout crash-demo-part3
make PREFIX=${CHIBIBUILD} install
LD_LIBRARY_PATH=${CHIBIBUILD}/lib ${CHIBIBUILD}/bin/chibi-scheme -e '(include "crash-test.scm")'
git log | head -n 198 | less
git checkout crash-demo-part4
make PREFIX=${CHIBIBUILD} install
LD_LIBRARY_PATH=${CHIBIBUILD}/lib ${CHIBIBUILD}/bin/chibi-scheme -e '(include "crash-test.scm")'
git log | head -n 71 | less

In short: the SEXP_OP_APPLY1 leaves the stack top as is. If the code is pure Scheme, it can be

  1. initialised to NULL (that is ok)
  2. a sexp which is not collected (which is also ok)
  3. a pointer sexp which is valid pointer or at least has been a valid pointer at some time (if it is a valid pointer, that is fine, but it can been already become stale and that is problematic)

The stack can thus look like

stack index stack value
n = top
n - 1 old pointer
n - 2 arg1
 ... ...
4 arg_N
3 frame info 3
2 frame info 2
1 frame info 1
0 = fp frame info 0

Since the previous sweeping might have mutated the bytes in the tag field of the pointer in heap, and if we apply an Opcode (or tailcall it with _too few values`) and if the heap is full, the compilation to bytecode can trigger a catastrophic garbage collection. The fix is to write a non-collected value on top of the old pointer.

stack index stack value
n = top
n - 1 SEXP_NULL
n - 2 arg1
 ... ...
4 arg_N
3 frame info 3
2 frame info 2
1 frame info 1
0 = fp frame info 0
petteripiiroinen commented 2 months ago

This issue was fixed in commit f3b957c when pull request #954 was merged. Closing the issue.