Closed chrisdone closed 5 years ago
I haven't looked at performance at all. Half a second to consume a PNG does excessive. Was the websocket library compiled?
The result above is byte-compiled. Before that, it was around 700ms+. So byte-compiling brought it down to a reproducible 400ms~.
Can you check out the performance branch I recently created and try again? My changes should result in a substantial speedup (they did on my test).
Sure, I still have the code for my test.
That looks like a 10x improvement! From 400ms
best case to about 40ms
best case. Nice work! :ok_hand:
Before:
websocket-read-frame: 0ms [30 times]
websocket-read-frame: 637ms
websocket-read-frame: 0ms
create-image: 0ms
insert-image & redisplay: 5ms
websocket-read-frame: 0ms [30 times]
websocket-read-frame: 402ms
websocket-read-frame: 0ms
create-image: 0ms
insert-image & redisplay: 4ms
websocket-read-frame: 0ms [11 times]
websocket-read-frame: 258ms
websocket-read-frame: 0ms [18 times]
websocket-read-frame: 423ms
websocket-read-frame: 0ms
create-image: 0ms
insert-image & redisplay: 3ms
websocket-read-frame: 0ms [24 times]
websocket-read-frame: 231ms
websocket-read-frame: 0ms [5 times]
websocket-read-frame: 395ms
websocket-read-frame: 0ms
create-image: 0ms
insert-image & redisplay: 11ms
websocket-read-frame: 0ms [30 times]
websocket-read-frame: 408ms
websocket-read-frame: 0ms
create-image: 0ms
insert-image & redisplay: 5ms
after
websocket-read-frame: 0ms [10 times]
websocket-read-frame: 244ms
websocket-read-frame: 0ms [19 times]
websocket-read-frame: 40ms
websocket-read-frame: 0ms
create-image: 0ms
insert-image & redisplay: 1ms
websocket-read-frame: 0ms [30 times]
websocket-read-frame: 38ms
websocket-read-frame: 0ms
create-image: 0ms
insert-image & redisplay: 1ms
websocket-read-frame: 0ms [30 times]
websocket-read-frame: 43ms
websocket-read-frame: 0ms
create-image: 0ms
insert-image & redisplay: 1ms
websocket-read-frame: 0ms [30 times]
websocket-read-frame: 40ms
websocket-read-frame: 0ms
create-image: 0ms
insert-image & redisplay: 1ms
I wrote a simple http server to receive the bytes directly as a POST request:
Time to receive as a simple POST binary request:
Non-byte-compiled:
image-server: receive time: 31ms
image-server: receive time: 27ms
image-server: receive time: 32ms
image-server: receive time: 27ms
image-server: receive time: 22ms
Byte-compiled, not so different:
image-server: receive time: 18ms
image-server: receive time: 24ms
image-server: receive time: 21ms
image-server: receive time: 25ms
image-server: receive time: 22ms
image-server: receive time: 23ms
My websocket code is like:
(progn
(websocket-server-close server)
(defvar server
(websocket-server
9009
:host 'local
:on-message
(lambda (ws frame)
(with-current-buffer (websocket-client-data ws)
(insert (websocket-frame-payload frame))
(when (websocket-frame-completep frame)
(let ((bytes (buffer-string)))
(with-current-buffer (get-buffer-create "*image-output*")
(let ((img (measure-time "create-image" (create-image bytes 'jpeg t))))
(measure-time "insert-image & redisplay"
(progn (insert-image img)
(redisplay)))))
(erase-buffer))))
)
:on-open
(lambda (ws) (message "Client connected.")
(setf (websocket-client-data ws)
(with-current-buffer
(get-buffer-create (generate-new-buffer "websocket"))
(toggle-enable-multibyte-characters -1)
(current-buffer))))
:on-close
(lambda (ws)
(kill-buffer (websocket-client-data ws))
(message "Client disconnected.")))))
I looked into a little bit how I'm buffering the strings.
(benchmark-run-compiled 100
(length
(with-temp-buffer
(cl-loop for i from 0 to 10000
do (insert "hello world hello worldhello worldhello worldhello worldhello world"))
(buffer-string))))
(benchmark-run-compiled 100
(length
(let ((buffer (list)))
(cl-loop for i from 0 to 10000
do (setq buffer (cons "hello world hello worldhello worldhello worldhello worldhello world" buffer)))
(mapconcat 'identity (reverse buffer) ""))))
(benchmark-run-compiled 100
(length
(let* ((buffer (list ""))
(pointer buffer))
(cl-loop for i from 0 to 10000
do (setq
pointer
(setcdr pointer
(list "hello world hello worldhello worldhello worldhello worldhello world"))))
(mapconcat 'identity buffer ""))))
(the length
is so I can easily check they all have the same length)
Results
(1.641801722 4 0.908923355999832)
(1.631438268 4 0.9008467079997899)
(1.616854798 4 0.9054001909998988)
(1.331959202 5 1.1342625530000987)
(1.322703598 5 1.119752232999872)
(1.320611183 5 1.1191802010002903)
(1.101568966 4 0.9080151119997026)
(1.085339208 4 0.9026358060000348)
(1.095221923 4 0.9009243840000636)
It seems you can shave off 30% of the time by using a list. Between consing+reverse vs setcdr, it just seems that the latter shaves off a bit more.
This is good data, thanks for sharing it.
I'm happy the speedup from the performance changes is substantial. It turns out the key to doing a great job on performance is developing the initial version without paying any attention to it :)
I'll go ahead and merge the performance changes in and create new version. I can continue to iterate on performance, but elisp is just not that fast, so I think it will be hard to be equivalent to non-elisp code.
Thanks!
A personal observation: when sending a payload of a PNG image from a web browser on localhost to the emacs-websocket server, the
websocket-read-frame
function takes 452ms to consume it.@ahyatt Are you aware of some obvious low-hanging fruit in the performance? It seems like
O(n)
time, perhaps it could be mitigated?For my use-case I'm probably going to switch to making a POST request to Emacs and use a raw socket to consume the binary chunks. Theoretically it should take maybe a millisecond to receive 100KB. It seems like the websocket protocol is a bit of a pain to implement.