Clozure / ccl

Clozure Common Lisp
http://ccl.clozure.com
Apache License 2.0
841 stars 105 forks source link

Incorrect simple-array access with (declaim (optimize (speed 3))) #446

Closed MattKaufmann closed 1 year ago

MattKaufmann commented 1 year ago

The example below ends with a form that gives an incorrect array value on my MacBook Pro. I extracted it from an example supplied by Warren A. Hunt, Jr. and Yahya Sohail that runs in ACL2, but the example below runs in pure CCL. After seeing the problem with CCL build from git version 01cf6cd7f849d75bc61eeb3f25b034500da6b58b, I rebuilt CCL from the latest version, ff51228259d9dbc8a9cc7bbb08858ef4aa9fe8d0, and got the same (unfortunate) result.

(declaim (optimize (speed 3)))

(defun arr-fieldi (i ar)
  (aref (the (simple-array (signed-byte 64) (10))
             ar)
        i))

(defun update-arr-fieldi (i v ar)
  (progn (setf (aref ar i) v)
         ar))

(defvar my-ar (make-array 10
                          :element-type '(signed-byte 64)
                          :initial-element '0))

; True:
(typep my-ar 'simple-array)

(update-arr-fieldi 0 #x-7FFFFFFE47AFEF96 my-ar)

; The following returns 537, not the expected value, #x-7FFFFFFE47AFEF96.
(arr-fieldi 0 my-ar)
yaso9 commented 1 year ago

The issue seems to only occur with the 64-bit version of CCL. Using the 32-bit version yields the correct result. I tried using git bisect to find the commit that introduced the bug, but I had trouble compiling many of the older versions of CCL (it kept getting stuck on compiling compiler/optimizers.lisp).

MattKaufmann commented 1 year ago

I've discovered that CCL 16365, apparently from 2015, doesn't have the bug. I seem to have found versions from 2017 that do have the bug.

svspire commented 1 year ago

Changing first line to (declaim (optimize (speed 3) (safety 3))) fixes it for me. That might be a workaround until we can figure out the real solution.

MattKaufmann commented 1 year ago

@svspire Thank you for the reply! Unfortunately, that seems to slow things down by two orders of magnitude; see attached example. That example uses the default optimization settings that ACL2 uses. I tried some other fixes that didn't work, but one that did: changing the accessor from (the (simple-array (signed-byte 64) (10)) ar) to (the (simple-array * (10)) ar). That one adds 37% to the time in my test, so still not great; but I guess it will have to do until CCL is fixed. (Sorry about the ".txt" extension on the attached file; I couldn't seem to attach it as a ".lsp" file.) ccl-array-bug.txt

MattKaufmann commented 1 year ago

I have attached an example showing that the problem occurs at the fixnum boundary. I wonder if CCL is mistakenly assuming here that a (signed-byte 64) is a fixnum. ccl-array-bug-2.txt

MattKaufmann commented 1 year ago

One more clue (and then I may stop): The bug seems to require that the integer type contains a negative number; see attached. ccl-array-bug-3.txt

xrme commented 1 year ago

Suspiciously, the number 537 (#x219) is a two-digit bignum header. That would seem to indicate that something in the process of turning an element of a (signed-byte 64) vector into an integer is getting mixed up.

svspire commented 1 year ago

Aha! If you disassemble arr-fieldi compiled with (safety 0), you'll see (movl ($ #x219) (% imm0.l)) I knew that was where the 537 came from, but I couldn't figure out what that constant was doing there.

gmpalter commented 1 year ago

It also fails with positive values as long as they're too big to fit in a fixnum.

The generated code properly creates the 2-digit bignum to hold the result and stores it in the return value. But, the code then falls through to the code which creates a new signed-byte-64 from whatever's in %imm0, which is the 2-digit bignum header. The culprit is x862-box-s64 but I don't know how to fix it yet.

gmpalter commented 1 year ago

Found it and just checked in the fix.

MattKaufmann commented 1 year ago

This is great! Is there an easy way to check programmatically for whether CCL has this fix, other than to run an example that formerly gave an incorrect result?

gmpalter commented 1 year ago

I can't think of one as it was a compiler bug. (There's a test for it now in the test suite at the end of ccl.lsp.)

MattKaufmann commented 1 year ago

That's very helpful actually. Thank you again (and all those who helped) for the fix!