Clozure / ccl

Clozure Common Lisp
http://ccl.clozure.com
Apache License 2.0

Error: current process does not own *kernel-exception-lock* #10

Status: open. Opened by kini 7 years ago.

kini commented 7 years ago

Under some rare and unknown circumstances, calling (static-cons ...) from ACL2 running on top of CCL resulted in the following error:

Error: Current process #<TTY-LISTENER listener(1) [Active] #x302000452AED> does not own lock #<RECURSIVE-LOCK "Kernel exception-lock" [ptr @ #x16DFC40] #x30200001146D>

Here is the value of (lisp-implementation-version):

? (lisp-implementation-version)
"Version 1.12-dev-r16752  (LinuxX8664)"

Here is an abbreviated backtrace, which I believe contains all the frames that are running in CCL itself, rather than user (i.e. ACL2) code. The full backtrace (with a few frames here and there removed because of proprietary information in function arguments, sorry) can be seen here (click "raw" on that page to get the raw text file). This backtrace is produced via ACL2 so I think the formatting is a bit different from what you'd usually get from CCL.

 (7FCED7FF09A0) : 0 (PRINT-CALL-HISTORY :CONTEXT NIL :PROCESS NIL :ORIGIN NIL :DETAILED-P NIL :COUNT 10000 :START-FRAME-NUMBER 0 :STREAM #<SYNONYM-STREAM to *TERMINAL-IO* #x3020033EC66D> :PRINT-LEVEL 2 :PRINT-LENGTH 5 :SHOW-INTERNAL-FRAMES NIL :FORMAT :TRADITIONAL) 869
 (7FCED7FF0B08) : 1 (CALL-CHECK-REGS CCL:PRINT-CALL-HISTORY :DETAILED-P NIL :COUNT 10000) 229
 (7FCED7FF0B40) : 2 (CHEAP-EVAL (CCL:PRINT-CALL-HISTORY :DETAILED-P NIL :COUNT *CCL-PRINT-CALL-HISTORY-COUNT*)) 101
 (7FCED7FF0B78) : 3 (PRINT-CALL-HISTORY) 229
 (7FCED7FF0B88) : 4 (OUR-ABORT #<CCL::NOT-LOCK-OWNER #x3021D590335D> OUR-ABORT) 189
 (7FCED7FF0BB8) : 5 (BREAK-LOOP-HANDLE-ERROR #<CCL::NOT-LOCK-OWNER #x3021D590335D> 17565795475856) 701
 (7FCED7FF0C58) : 6 (%ERROR #<CCL::NOT-LOCK-OWNER #x3021D590335D> (:LOCK #<RECURSIVE-LOCK "Kernel exception-lock" [ptr @ #x16DFC40] #x30200001146D>) 17565795475856) 365
 (7FCED7FF0C80) : 7 (%UNLOCK-RECURSIVE-LOCK-OBJECT #<RECURSIVE-LOCK "Kernel exception-lock" [ptr @ #x16DFC40] #x30200001146D>) 405
 (7FCED7FF0CB0) : 8 (STATIC-CONS 102 ((SV::CONCAT 104 # #))) 181
              [...]
 (7FCED7FFEA18) : 490 (LP) 7709
 (7FCED7FFEA80) : 491 (CALL-CHECK-REGS LP) 229
 (7FCED7FFEAB8) : 492 (TOPLEVEL-EVAL (LP) NIL) 789
 (7FCED7FFEB30) : 493 (READ-LOOP :INPUT-STREAM #<SYNONYM-STREAM to *TERMINAL-IO* #x3020033EC7CD> :OUTPUT-STREAM #<SYNONYM-STREAM to *TERMINAL-IO* #x3020033EC66D> :BREAK-LEVEL 0 :PROMPT-FUNCTION #<Compiled-function (:INTERNAL CCL::READ-LOOP) (Non-Global)  #x300000590B8F>) 2509
 (7FCED7FFED78) : 494 (RUN-READ-LOOP :BREAK-LEVEL 0) 157
 (7FCED7FFEDA0) : 495 (TOPLEVEL-LOOP) 93
 (7FCED7FFEDB0) : 496 (FUNCALL #'#<(:INTERNAL (CCL:TOPLEVEL-FUNCTION (CCL::LISP-DEVELOPMENT-SYSTEM T)))>) 109
 (7FCED7FFEDD0) : 497 (FUNCALL #'#<(:INTERNAL CCL::MAKE-MCL-LISTENER-PROCESS)>) 661
 (7FCED7FFEE68) : 498 (RUN-PROCESS-INITIAL-FORM #<TTY-LISTENER listener(1) [Active] #x302000452AED> (#<CCL:COMPILED-LEXICAL-CLOSURE # #x3020033EB2DF>)) 813
 (7FCED7FFEEF0) : 499 (FUNCALL #'#<(:INTERNAL (CCL::%PROCESS-PRESET-INTERNAL (CCL:PROCESS)))> #<TTY-LISTENER listener(1) [Active] #x302000452AED> (#<CCL:COMPILED-LEXICAL-CLOSURE # #x3020033EB2DF>)) 581
 (7FCED7FFEF98) : 500 (FUNCALL #'#<(:INTERNAL CCL::THREAD-MAKE-STARTUP-FUNCTION)>) 277
***********************************************
************ ABORTING from raw Lisp ***********
Error:  Current process #<TTY-LISTENER listener(1) [Active] #x302000452AED> does not own lock #<RECURSIVE-LOCK "Kernel exception-lock" [ptr @ #x16DFC40] #x30200001146D>
NOTE: See above for backtrace.
***********************************************

I'm afraid I'm at a loss for how to reproduce this error, because it occurred after more than a week of runtime in an ACL2 program. 63 other almost identical runs of the program executing simultaneously did not hit this error, so you could even say that it appeared after "more than a year" of CPU time... Hopefully the problem is obvious at a glance to someone with sufficient understanding of CCL, but if not, please let me know if you have any ideas about how to isolate it. Thanks!

kini commented 7 years ago

I should also mention, as I did on IRC, that I was using a single-threaded version of ACL2 (though I understand that CCL uses multiple threads under the hood even if there's only one user thread), and that the contents of *features* are as follows:

? *features*
(:PRIMARY-CLASSES :COMMON-LISP :OPENMCL :CCL :CCL-1.2 :CCL-1.3 :CCL-1.4 :CCL-1.5 :CCL-1.6 :CCL-1.7 :CCL-1.8 :CCL-1.9 :CCL-1.10 :CCL-1.11 :CCL-1.12 :CLOZURE :CLOZURE-COMMON-LISP :ANSI-CL :UNIX :OPENMCL-UNICODE-STRINGS :IPV6 :OPENMCL-NATIVE-THREADS :OPENMCL-PARTIAL-MOP :MCL-COMMON-MOP-SUBSET :OPENMCL-MOP-2 :OPENMCL-PRIVATE-HASH-TABLES :STATIC-CONSES-SHOULD-WORK-WITH-EGC-IN-CCL :X86-64 :X86_64 :X86-TARGET :X86-HOST :X8664-TARGET :X8664-HOST :LINUX-HOST :LINUX-TARGET :LINUXX86-TARGET :LINUXX8664-TARGET :LINUXX8664-HOST :64-BIT-TARGET :64-BIT-HOST :LINUX :LITTLE-ENDIAN-TARGET :LITTLE-ENDIAN-HOST)

3b commented 7 years ago

Not sure whether it is the specific cause of this problem, but %lock-recursive-lock-ptr looks like it has a race between loading the owner (outside without-interrupts) and checking whether it matches the current thread (inside without-interrupts), which seems like it could cause that error.
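To make the suspected pattern concrete, here is a toy Python model. It is not CCL's code: the class, the method names, and the `between` hook are hypothetical, and `critical()` is only a stand-in for without-interrupts. The racy version loads the owner field before entering the critical section and compares it inside, so anything that runs in between (another thread, or an interrupt) can make the snapshot stale. The "claim the lock if the snapshot says it is free" branch is a simplification; a real lock would contend on an underlying mutex, so this sketch only illustrates the general check-then-act hazard and does not reproduce CCL's actual failure path.

```python
# Toy model of a check-then-act race on a recursive lock's owner field.
# Hypothetical names throughout; this is NOT CCL's implementation.
from contextlib import contextmanager


class NotLockOwner(Exception):
    """Raised when a thread releases a lock it does not own."""


class ToyRecursiveLock:
    def __init__(self):
        self.owner = None  # name of the owning "thread", or None if free
        self.count = 0     # recursion depth

    @contextmanager
    def critical(self):
        # Stand-in for CCL's without-interrupts. The toy runs on one real
        # thread, so nothing interleaves inside this block.
        yield

    def acquire_racy(self, me, between=lambda: None):
        owner = self.owner          # load happens OUTSIDE the critical section
        between()                   # another thread/interrupt may run here
        with self.critical():
            if owner == me:         # compare uses the possibly stale snapshot
                self.count += 1
                return True
            if owner is None:       # simplification: claim a lock that looked free
                self.owner, self.count = me, 1
                return True
            return False            # would block (not modeled)

    def acquire_fixed(self, me, between=lambda: None):
        between()
        with self.critical():
            owner = self.owner      # load and compare INSIDE the critical section
            if owner == me:
                self.count += 1
                return True
            if owner is None:
                self.owner, self.count = me, 1
                return True
            return False            # would block (not modeled)

    def release(self, me):
        with self.critical():
            if self.owner != me:
                raise NotLockOwner(f"{me} does not own lock (owner={self.owner})")
            self.count -= 1
            if self.count == 0:
                self.owner = None


def run_scenario(racy):
    """A tries to acquire; B grabs the free lock between A's load and A's check."""
    lock = ToyRecursiveLock()
    acquire = lock.acquire_racy if racy else lock.acquire_fixed
    a_acquired = acquire("A", between=lambda: lock.acquire_fixed("B"))
    try:
        lock.release("B")
        b_release = "ok"
    except NotLockOwner as err:
        b_release = str(err)
    return a_acquired, b_release


print(run_scenario(racy=True))   # (True, 'B does not own lock (owner=A)')
print(run_scenario(racy=False))  # (False, 'ok')
```

In the racy version, A acts on a snapshot taken before B acquired the lock, overwrites the owner, and B's own release then fails with a "does not own lock" error. Moving the load inside the critical section (`acquire_fixed`) makes A see B as the current owner and back off.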