ahyatt / emacs-websocket

A websocket implementation in elisp, for emacs.
GNU General Public License v2.0
324 stars 42 forks source link

Speed feedback #57

Closed chrisdone closed 5 years ago

chrisdone commented 5 years ago

A personal observation: when sending a payload of a PNG image from a web browser on localhost to the emacs-websocket server, the websocket-read-frame function takes 452ms to consume it.

websocket-read-frame: 0ms [29 times]
websocket-read-frame: 452ms

@ahyatt Are you aware of some obvious low-hanging fruit in the performance? It seems like O(n) time, perhaps it could be mitigated?

For my use-case I'm probably going to switch to making a POST request to Emacs and use a raw socket to consume the binary chunks. Theoretically it should take maybe a millisecond to receive 100KB. It seems like the websocket protocol is a bit of a pain to implement.

ahyatt commented 5 years ago

I haven't looked at performance at all. Half a second to consume a PNG does excessive. Was the websocket library compiled?

chrisdone commented 5 years ago

The result above is byte-compiled. Before that, it was around 700ms+. So byte-compiling brought it down to a reproducible 400ms~.

ahyatt commented 5 years ago

Can you check out the performance branch I recently created and try again? My changes should result in a substantial speedup (they did on my test).

chrisdone commented 5 years ago

Sure, I still have the code for my test.

chrisdone commented 5 years ago

That looks like a 10x improvement! From 400ms best case to about 40ms best case. Nice work! :ok_hand:

Before:

    websocket-read-frame: 0ms [30 times]
    websocket-read-frame: 637ms
    websocket-read-frame: 0ms
    create-image: 0ms
    insert-image & redisplay: 5ms

    websocket-read-frame: 0ms [30 times]
    websocket-read-frame: 402ms
    websocket-read-frame: 0ms
    create-image: 0ms
    insert-image & redisplay: 4ms

    websocket-read-frame: 0ms [11 times]
    websocket-read-frame: 258ms
    websocket-read-frame: 0ms [18 times]
    websocket-read-frame: 423ms
    websocket-read-frame: 0ms
    create-image: 0ms
    insert-image & redisplay: 3ms

    websocket-read-frame: 0ms [24 times]
    websocket-read-frame: 231ms
    websocket-read-frame: 0ms [5 times]
    websocket-read-frame: 395ms
    websocket-read-frame: 0ms
    create-image: 0ms
    insert-image & redisplay: 11ms

    websocket-read-frame: 0ms [30 times]
    websocket-read-frame: 408ms
    websocket-read-frame: 0ms
    create-image: 0ms
    insert-image & redisplay: 5ms

after

websocket-read-frame: 0ms [10 times]
websocket-read-frame: 244ms
websocket-read-frame: 0ms [19 times]
websocket-read-frame: 40ms
websocket-read-frame: 0ms
create-image: 0ms
insert-image & redisplay: 1ms

websocket-read-frame: 0ms [30 times]
websocket-read-frame: 38ms
websocket-read-frame: 0ms
create-image: 0ms
insert-image & redisplay: 1ms

websocket-read-frame: 0ms [30 times]
websocket-read-frame: 43ms
websocket-read-frame: 0ms
create-image: 0ms
insert-image & redisplay: 1ms

websocket-read-frame: 0ms [30 times]
websocket-read-frame: 40ms
websocket-read-frame: 0ms
create-image: 0ms
insert-image & redisplay: 1ms
chrisdone commented 5 years ago

I wrote a simple http server to receive the bytes directly as a POST request:

Time to receive as a simple POST binary request:

Non-byte-compiled:

image-server: receive time: 31ms
image-server: receive time: 27ms
image-server: receive time: 32ms
image-server: receive time: 27ms
image-server: receive time: 22ms

Byte-compiled, not so different:

image-server: receive time: 18ms
image-server: receive time: 24ms
image-server: receive time: 21ms
image-server: receive time: 25ms
image-server: receive time: 22ms
image-server: receive time: 23ms
chrisdone commented 5 years ago

My websocket code is like:

(progn
  (websocket-server-close server)
  (defvar server
    (websocket-server
     9009
     :host 'local
     :on-message
     (lambda (ws frame)
       (with-current-buffer (websocket-client-data ws)
         (insert (websocket-frame-payload frame))
         (when (websocket-frame-completep frame)
           (let ((bytes (buffer-string)))
             (with-current-buffer (get-buffer-create "*image-output*")
               (let ((img (measure-time "create-image" (create-image bytes 'jpeg t))))
                 (measure-time "insert-image & redisplay"
                               (progn (insert-image img)
                                      (redisplay)))))
             (erase-buffer))))
       )
     :on-open
     (lambda (ws) (message "Client connected.")
       (setf (websocket-client-data ws)
             (with-current-buffer
                 (get-buffer-create (generate-new-buffer "websocket"))
               (toggle-enable-multibyte-characters -1)
               (current-buffer))))
     :on-close
     (lambda (ws)
       (kill-buffer (websocket-client-data ws))
       (message "Client disconnected.")))))
chrisdone commented 5 years ago

I looked into a little bit how I'm buffering the strings.

(benchmark-run-compiled 100
  (length
   (with-temp-buffer
     (cl-loop for i from 0 to 10000
              do (insert "hello world hello worldhello worldhello worldhello worldhello world"))
     (buffer-string))))

(benchmark-run-compiled 100
  (length
   (let ((buffer (list)))
     (cl-loop for i from 0 to 10000
              do (setq buffer (cons "hello world hello worldhello worldhello worldhello worldhello world" buffer)))
     (mapconcat 'identity (reverse buffer) ""))))

(benchmark-run-compiled 100
  (length
   (let* ((buffer (list ""))
          (pointer buffer))
     (cl-loop for i from 0 to 10000
              do (setq
                  pointer
                  (setcdr pointer
                          (list "hello world hello worldhello worldhello worldhello worldhello world"))))
     (mapconcat 'identity buffer ""))))

(the length is so I can easily check they all have the same length)

Results

(1.641801722 4 0.908923355999832)
(1.631438268 4 0.9008467079997899)
(1.616854798 4 0.9054001909998988)

(1.331959202 5 1.1342625530000987)
(1.322703598 5 1.119752232999872)
(1.320611183 5 1.1191802010002903)

(1.101568966 4 0.9080151119997026)
(1.085339208 4 0.9026358060000348)
(1.095221923 4 0.9009243840000636)

It seems you can shave off 30% of the time by using a list. Between consing+reverse vs setcdr, it just seems that the latter shaves off a bit more.

ahyatt commented 5 years ago

This is good data, thanks for sharing it.

I'm happy the speedup from the performance changes is substantial. It turns out the key to doing a great job on performance is developing the initial version without paying any attention to it :)

I'll go ahead and merge the performance changes in and create new version. I can continue to iterate on performance, but elisp is just not that fast, so I think it will be hard to be equivalent to non-elisp code.

chrisdone commented 5 years ago

Thanks!