ambv opened this issue 1 year ago
Hey @ambv, thanks for the suggestion. For pymonome and especially aiosc I would rather avoid having heavyweight external dependencies though.
Does it not work for you if you initialize `GridBuffer` once (e.g. in the `on_grid_ready` callback)?
Oh, I see. Your problem is allocating the OSC message itself, sorry. I've never experienced any delays in refreshing the grid at 30 fps myself (I have a 256); could you share the sample code that exposes the issue?
> For pymonome and especially aiosc I would rather avoid having heavyweight external dependencies though.
I understand you don't want to add numpy to your requirements, and you don't have to. As you can see above, I'm specifically checking for a `numpy.ndarray` type without ever importing numpy. You don't need numpy as a runtime dependency to support this.
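A minimal sketch of that idea (my reconstruction, not the exact patch): detect a `numpy.ndarray` by the type's module and name, so the library itself never imports numpy.

```python
def is_ndarray(obj):
    # Check the type by name instead of isinstance(), so numpy never
    # has to be imported by the library performing the check.
    t = type(obj)
    return t.__module__ == "numpy" and t.__name__ == "ndarray"
```

If numpy isn't installed, no live object can have that type, so the check simply (and cheaply) returns `False`.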
> Does it not work for you if you initialize GridBuffer once (e.g. in on_grid_ready callback)?
Not really, because the `.levels` property on the buffer can only be cleared by a Python-level quadratic loop (`led_level_all`). Plus `.levels` is written in terms of regular lists of lists, which are rather poor for cache locality. Numpy arrays are faster in those two cases, but also in turning the array into a bytestream (that bit happens in `aiosc`).
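To illustrate the clearing cost: a list-of-lists buffer needs a Python-level nested loop, while a flat reusable buffer (here a byte `array.array`; numpy's `arr.fill(0)` is the analogous and faster operation) can be cleared in one bulk assignment. The dimensions and function names below are illustrative, not pymonome API.

```python
import array

WIDTH, HEIGHT = 16, 8  # e.g. a monome 128; illustrative only

def clear_levels(levels):
    # Quadratic Python-level loop over a list-of-lists buffer,
    # akin to what led_level_all(0) has to do.
    for row in levels:
        for x in range(len(row)):
            row[x] = 0

def clear_flat(buf):
    # One bulk slice assignment over a reusable flat byte buffer.
    buf[:] = array.array('B', bytes(len(buf)))
```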
If the buffer were a one-dimensional list, would that improve performance without introducing an optional numpy dependency?
If you really don't want to make `aiosc` support `numpy`, you could make it support Python's built-in `array.array` instead. Compared to `numpy` we lose:

We'd still maintain:

- `arr.tobytes()` (although OSC is big-endian so we'd have to first call `arr.byteswap()`, so that's slower than numpy's version)

I think allowing numpy arrays would be better because numpy is the fastest option and it's got a lot of mindshare in Python. The changes to `aiosc` are all in the original message above (no `import numpy`) and the changes to `monome.py` are just two new methods.
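The `array.array` path described above could be sketched like this (my sketch; it assumes the platform's `'i'` typecode is 4 bytes, which matches OSC's 32-bit integers):

```python
import array
import sys

def levels_to_osc_bytes(levels):
    # OSC integers are big-endian 32-bit; array.array stores native
    # byte order, so swap in place on little-endian hosts before
    # calling tobytes().
    arr = array.array('i', levels)
    assert arr.itemsize == 4  # platform assumption
    if sys.byteorder == 'little':
        arr.byteswap()
    return arr.tobytes()
```

The extra `byteswap()` pass is exactly the overhead mentioned above that numpy avoids by converting straight to a big-endian dtype.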
Making `Buffer` instances work with `array.array` internally would also be an efficiency win, although a more modest one.
Making `Buffer` instances use single-dimensional `list` objects won't be very effective because those are still arrays of pointers to PyObjects.
Here's my testing code: https://gist.github.com/ambv/e422fc092d3ac3e79e8b53f8efcb3108
The latest revision uses numpy. The previous revision uses pure Python data routines that create buffers every frame. The numpy version is butter smooth on Python 3.11.1; the previous revision has occasional stutters of a few frames.

I can't see it on the Grid, but it's more obvious to me on the Arc because the pixels are much closer together, so smooth movement looks like solid motion while stuttery movement breaks the illusion.
Yep, `array.array` is a cool idea. I'd happily pull the change to the repo. As for numpy arrays: is it a viable option for your use case to convert the numpy array to bytes at the application level?
Yes, if you supported passing bytes directly to `aiosc`, that would be fine by me; then I'd do the entire numpy dance on my end.
The `bytes` class is currently mapped to the OSC blob type in aiosc; however, passing bytes to `send` with an asterisk should repack them as integers one by one. Is the resulting performance good enough to have it this way?
I haven't tested that. It just seems wasteful to me because I'd be unpacking the bytes unnecessarily on one end only to repack them again one-by-one on the other end.
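For scale, here is roughly what the two encodings look like on the wire (a sketch of standard OSC 1.0 encoding, not of aiosc internals): unpacking the bytes with an asterisk turns each LED level into a separate 4-byte OSC `i` argument, while a blob ships the raw bytes prefixed with a 4-byte size.

```python
import struct

levels = bytes(range(8))  # 8 LED levels as raw bytes

# One OSC 'i' argument per level: 4 big-endian bytes each.
as_ints = struct.pack('>%di' % len(levels), *levels)

# One OSC blob: 4-byte big-endian size followed by the raw payload
# (already 4-byte aligned here, so no extra padding shown).
as_blob = struct.pack('>i', len(levels)) + levels
```

The integer encoding quadruples the payload in addition to the per-argument unpack/repack work.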
Right, but bytes -> blob is also a pretty straightforward type mapping in the current `send` calling convention, and having it any other way (bytes -> integer array) feels a bit less intuitive.
Arrays (numpy or lists), on the other hand, can be converted to OSC in more than one way, and your initial proposal is about having `send` convert a specific type in a specific way. Would that require different conversion logic for any other case? I think a good candidate for resolving this would rather be the application (or at least the pymonome) layer, but happy to stand corrected in case I'm missing something obvious (it's past midnight here, sorry :))
It should also be possible to work around the conversion entirely by generating raw OSC directly in your drawing methods and using `grid.transport.sendto` to send it to the grid, in case you need extreme optimization.
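A sketch of that workaround, hand-rolling the OSC 1.0 encoding for a level map (the address prefix, helper names, and the `sendto` call in the comment are assumptions about the setup, not verified pymonome API):

```python
import struct

def _pad(b):
    # OSC strings are null-terminated and padded to a 4-byte boundary.
    return b + b'\x00' * (4 - len(b) % 4)

def raw_level_map(prefix, x_off, y_off, levels):
    # /grid/led/level/map takes a quad offset plus 64 intensity values.
    address = _pad((prefix + '/grid/led/level/map').encode())
    typetags = _pad((',' + 'i' * (2 + len(levels))).encode())
    payload = struct.pack('>%di' % (2 + len(levels)), x_off, y_off, *levels)
    return address + typetags + payload

# e.g. grid.transport.sendto(raw_level_map('/monome', 0, 0, frame))
```

This skips the per-argument dispatch in the generic message packer entirely, at the cost of duplicating a bit of the wire format in application code.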
There's no hurry in getting the API right here. I personally care about making things efficient by default so that `pymonome` code can be used for tricky use cases as well. In my case, I'm doing MIDI and even realtime audio processing while the Grid + Arc are controllers, so I'd rather spend as little CPU time on that task as possible.
Sure thing, I care about efficiency as well. But I'd argue that pymonome is quite efficient for practical usage already :) My live setup is also a grid + arc (sometimes 2 grids + arc) with a lot of bidirectional OSC traffic going between serialosc, the controller/sequencer scripts and the audio engine. There's a lot of grid redrawing going on constantly with several off-screen (off-grid) buffers. Not many animations on the arc, but the rings react very well to user input.
There's a noticeable CPU overhead in the entire controller app since I've switched to asyncio, likely related to a lot of i/o polling happening under the hood, but I'm getting it with or without rendering. I also rarely (if ever) do zeroing of the LED buffers. Instead, the rendering loop body typically fills the entire buffer in most of my scripts.
That said, extra optimization would never hurt. I have some doubts about the solution above though - it's a bit hackish, and it's a really neat hack! But it caters to a rather specific way of concatenating a numpy array to the rest of the arguments (breaking the consistency of the 1:1 mapping of `send` arguments to OSC arguments) and it adds a dependency on numpy to function properly in the client code, despite not requiring the module. Is this where practicality beats purity? Being a numpy user myself, I'm not 100% sure :)
I'm glad to discuss this or other approaches further to solve the original issue of course.
FWIW, here's the output of your test script I'm getting with regular Python lists and uvloop, visibly things are smooth:
```
FPS: computation=95.20 (min=85.60) rendering=31.80
FPS: computation=95.14 (min=84.15) rendering=31.82
FPS: computation=95.01 (min=89.94) rendering=31.73
FPS: computation=95.14 (min=84.85) rendering=31.64
FPS: computation=94.72 (min=84.60) rendering=31.87
FPS: computation=95.41 (min=85.74) rendering=31.83
FPS: computation=95.30 (min=88.92) rendering=31.78
```
Thanks for the awesome asyncio-first library. Works really well.
I was able to get Grid + Arc to consistently output at 30 fps with ~90 fps logic updates without stuttering. To achieve this, I had to switch away from using `GridBuffer` and `ArcBuffer`, which allocate the entire map every frame. Instead, I used a numpy array that I simply zeroed between frames, which is measurably faster.

To make this work, I had to make two changes:

1. Add `grid.led_level_map_raw` and `arc.ring_map_raw` methods that don't unpack `*data` but simply pass `data` directly to OSC.
2. In `aiosc`, I added support for numpy arrays in `pack_message` without importing numpy by doing this:

With those changes, numpy arrays passed to the new methods allow reusing the same memory buffer and are very efficiently translated to bytes entirely in C land (with `arg.tobytes()` above).

Would you be interested in pull requests to upstream my changes?
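The code snippet referenced by "by doing this:" above did not survive in this copy of the thread. Based on the surrounding description, the check presumably looked something like the following (my reconstruction; `pack_ndarray` is a hypothetical name and the actual patch may differ):

```python
def pack_ndarray(arg):
    # Detect numpy.ndarray by type name, so numpy is never imported
    # here; then serialize as big-endian 32-bit ints ('>i4' matches
    # the OSC 'i' wire format) entirely in C land via tobytes().
    t = type(arg)
    if t.__module__ == 'numpy' and t.__name__ == 'ndarray':
        return arg.astype('>i4').tobytes()
    raise TypeError('not a numpy.ndarray')
```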