alex-gutev / static-dispatch

Static generic function dispatch for Common Lisp
MIT License
80 stars 1 forks source link

Help needed: 3 different performance timings with same code #9

Open mfiano opened 3 years ago

mfiano commented 3 years ago

I am experiencing 3 different performance results consistently with the same code, depending on how I load it. Here is the test code:

(in-package #:cl-user)

(defpackage #:origin2
  (:use #:static-dispatch-cl))

(in-package #:origin2)

(defstruct (aggregate
            (:predicate nil)
            (:copier nil))
  (storage (make-array 0 :element-type 'single-float :initial-element 0.0) :type (simple-array single-float (*))))

(defgeneric ref (sequence index)
  (:method ((sequence aggregate) index)
    (elt (aggregate-storage sequence) index)))

(defun test ()
  (declare (optimize (speed 3)))
  (let ((a (make-aggregate :storage (make-array 3 :element-type 'single-float :initial-contents '(1.0 2.0 3.0)))))
    (declare (type aggregate a))
    (dotimes (i (expt 10 9))
      (ref a 2))))

Clearing FASL's, loading this code into my image on SBCL 2.1.7, and issuing:

(progn
  (sb-ext:gc :full t)
  (time (test)))

I get these results consistently:

Evaluation took:
  7.146 seconds of real time
  7.130177 seconds of total run time (7.130177 user, 0.000000 system)
  99.78% CPU
  20 lambdas converted
  14,232,702,075 processor cycles
  1,080,336 bytes consed

If I now re-compile the #'test function in the same image, and re-run the test I consistently get:

Evaluation took:
  0.350 seconds of real time
  0.351234 seconds of total run time (0.351196 user, 0.000038 system)
  100.29% CPU
  701,597,093 processor cycles
  0 bytes consed

If I wrap the defgeneric in (eval-when (:compile-toplevel :load-toplevel :execute) ...), clear FASL's, loading the code, and re-run the test, I consistently get:

Evaluation took:
  0.583 seconds of real time
  0.579642 seconds of total run time (0.579642 user, 0.000000 system)
  99.49% CPU
  1,157,429,524 processor cycles
  0 bytes consed

I get no warnings from static-dispatch in all 3 cases, and everything compiles cleanly. I would like to know why these differences are occurring, and how I would get the results of the second test above from a clean image (instead of manually recompiling the function after loading). Thank you for your help.

scymtym commented 3 years ago

I think the slowness in the first case can be explained by noting the time at which the effects of the defgeneric expansion happen: the ref call in test can be optimized only during compilation of test so only compile-time effects of the defgeneric expansion have already happened when test is compiled (in the first case -- in the other cases, all effects of the defgeneric expansion have already happened when test is compiled).

The defgeneric expansion contains eval-when forms to address the issue, but those only work at the top-level. let and when make their sub-forms non-top-level. This effect can be seen by changing make-static-dispatch to emit

(eval-when (:compile-toplevel :load-toplevel :execute)
       (print "should be at compile-time")
           (unless (sb-c::info :function :info ',name)
             (sb-c:defknown ,name * * nil :overwrite-fndb-silently t)))

Compiling and loading @mfiano 's example via (load (compile-file "/tmp/dispatch.lisp")) then results in

; compiling file "/tmp/dispatch.lisp" (written 08 AUG 2021 05:43:37 PM):
…
; processing (DEFSTRUCT (AGGREGATE # ...) ...)
; processing (DEFGENERIC REF ...)
; processing (DEFUN TEST ...)

; wrote /tmp/dispatch.fasl
; compilation finished in 0:00:00.036

"should be at compile-time" 
T

That is, the effect happens at load-time, not compile-time.

alex-gutev commented 3 years ago

To avoid these difference you should define the generic functions and their methods in a separate file from the code which uses them. As already mentioned by @scymtym, what's happening here is that since they are contained in the same file, the deftransform forms emitted by static-dispatch, which are responsible for the actual inlining, only take effect after the entire file is compiled, that is after the test function, which calls the generic function, is compiled. As a result no inlining takes place in test. On other implementations, where compiler-macros are used, you will get a warning however on SBCL, where deftransform is used there is no warning since the transform isn't even registered yet, and hence is never called.

I don't know what's causing the difference between the second and third case. There shouldn't be any difference unless SBCL is performing more optimizations when the test function is recompiled. I will look into that further.