SoA (Struct of Arrays) support would be an important feature for symusic, since it is better suited to AI applications than the current AoS (Array of Structs) interface.
An SoA interface could enable many flexible operations, such as conversion, resampling, and quantization (#10).
These functions could take full advantage of the numpy ecosystem (e.g. numba) and could be very fast.
Just as importantly, we would not need to introduce more time_unit types (such as beat) in symusic, which would make the general-purpose code in the C++ part more and more complex.
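To make the resampling and quantization point concrete, here is a minimal sketch that works purely on a dict of numpy arrays. It assumes note times are stored in ticks with a known ticks-per-quarter value (`tpq = 480` is an arbitrary choice, and the dtypes are assumptions too); `ticks_to_beats` and `quantize` are hypothetical helpers, not part of symusic. Because the conversion to beats happens entirely on the numpy side, no new time_unit is needed in the C++ part.

```python
import numpy as np

# Hypothetical SoA note data in ticks, laid out like the proposed
# Dict[str, np.ndarray] (keys mirror Note's attributes).
tpq = 480  # assumed ticks per quarter note
notes = {
    "time": np.array([0, 230, 485, 965], dtype=np.int32),
    "duration": np.array([240, 250, 470, 480], dtype=np.int32),
    "pitch": np.array([60, 62, 64, 65], dtype=np.int8),
    "velocity": np.array([100, 90, 80, 70], dtype=np.int8),
}


def ticks_to_beats(times: np.ndarray, tpq: int) -> np.ndarray:
    """Convert tick positions to fractional quarter-note beats."""
    return times.astype(np.float64) / tpq


def quantize(times: np.ndarray, grid: int) -> np.ndarray:
    """Snap tick positions to the nearest multiple of `grid` ticks."""
    return (np.rint(times / grid) * grid).astype(times.dtype)


# Sixteenth-note grid: 480 // 4 = 120 ticks, applied to every onset at once.
quantized_time = quantize(notes["time"], tpq // 4)
beats = ticks_to_beats(notes["time"], tpq)
```

Note that `np.rint` rounds exact halves to the nearest even value, so a different rounding rule may be preferable if onsets often fall exactly between grid points.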
However, the SoA interface is still to be determined. I'd like to hear your advice, @Natooz @ilya16. And of course, other design options are welcome!
Here are some possible options I could think of.
Option 1: Dict of NumPy Arrays
In this case, we won't introduce new classes in symusic; we only use dict and numpy.ndarray.
```python
from typing import Dict

import numpy as np
from symusic import Score, Note, ControlChange

s = Score(...)
# get the SoA of controls and notes
# because "controls" is not a Python list but a bound C++ vector,
# we could bind a .numpy() method for it
controls_arr: Dict[str, np.ndarray] = s.controls.numpy()
notes_arr: Dict[str, np.ndarray] = s.tracks[0].notes.numpy()
# create the traditional AoS from numpy arrays
# here, we could reuse the existing factory classes for events like notes
s.controls = ControlChange.from_numpy(controls_arr['time'], controls_arr['number'], controls_arr['value'])
s.tracks[0].notes = Note.from_numpy(notes_arr['time'], notes_arr['duration'], notes_arr['pitch'], notes_arr['velocity'])
# or we could use ** to shorten this
s.controls = ControlChange.from_numpy(**controls_arr)
s.tracks[0].notes = Note.from_numpy(**notes_arr)
```
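Since `.numpy()` and `from_numpy` are only proposals, here is a self-contained mock-up of the Option 1 round trip. A dataclass stands in for symusic's Note, and `notes_to_numpy` / `notes_from_numpy` are hypothetical pure-Python equivalents of the proposed bindings (the int32 dtype is an assumption). It also shows why `**notes_arr` works: the dict keys match the factory's parameter names.

```python
from dataclasses import dataclass, fields
from typing import Dict, List

import numpy as np


@dataclass
class Note:
    """Stand-in for symusic's Note (not the real binding)."""
    time: int
    duration: int
    pitch: int
    velocity: int


def notes_to_numpy(notes: List[Note]) -> Dict[str, np.ndarray]:
    """AoS -> SoA: one array per field, like the proposed `.numpy()`."""
    return {
        f.name: np.array([getattr(n, f.name) for n in notes], dtype=np.int32)
        for f in fields(Note)
    }


def notes_from_numpy(time, duration, pitch, velocity) -> List[Note]:
    """SoA -> AoS, mirroring the proposed `Note.from_numpy` signature."""
    return [
        Note(int(t), int(d), int(p), int(v))
        for t, d, p, v in zip(time, duration, pitch, velocity)
    ]


arr = notes_to_numpy([Note(0, 480, 60, 100), Note(480, 240, 64, 90)])
arr["time"] += 120  # vectorized shift of every onset
notes = notes_from_numpy(**arr)  # keys match the parameter names
```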
Option 2: New SoA Classes in C++
In this case, we would define new SoA classes in C++ and use them in symusic. This seems more object-oriented.
The problem is that each class would have to be defined 3 times, once per time unit (the same reason we have NoteTick, NoteQuarter and NoteSecond).
We would then define a Union of them in symusic.types.
```python
from symusic import Score
import symusic.types as smt

s = Score(...)
# get the SoA of controls and notes
controls_arr: smt.ControlChangeArr = s.controls.numpy()
notes_arr: smt.NoteArr = s.tracks[0].notes.numpy()
# convert them back to AoS
s.controls = controls_arr.list()
s.tracks[0].notes = notes_arr.list()
```
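To illustrate the "defined 3 times" problem and the Union, here is a Python mock-up of what the bindings might look like. `NoteArrTick`, `NoteArrQuarter`, `NoteArrSecond`, and `total_duration` are hypothetical names; in the real design, each class would be a separate nanobind binding of one C++ template instantiation, and `list()` would return Note objects rather than tuples.

```python
from typing import List, Tuple, Union

import numpy as np


class _NoteArrBase:
    """Mock of one C++ template instantiation of a hypothetical SoA class."""

    time_unit = ""

    def __init__(self, time, duration, pitch, velocity):
        self.time = np.asarray(time)
        self.duration = np.asarray(duration)
        self.pitch = np.asarray(pitch)
        self.velocity = np.asarray(velocity)

    def list(self) -> List[Tuple]:
        # AoS view as plain tuples; a real binding would return Note objects.
        return [
            (t, d, p, v)
            for t, d, p, v in zip(self.time.tolist(), self.duration.tolist(),
                                  self.pitch.tolist(), self.velocity.tolist())
        ]


# One concrete class per time unit, mirroring NoteTick / NoteQuarter / NoteSecond.
class NoteArrTick(_NoteArrBase):
    time_unit = "tick"


class NoteArrQuarter(_NoteArrBase):
    time_unit = "quarter"


class NoteArrSecond(_NoteArrBase):
    time_unit = "second"


# The kind of alias symusic.types could export for type annotations.
NoteArr = Union[NoteArrTick, NoteArrQuarter, NoteArrSecond]


def total_duration(arr: NoteArr) -> float:
    """Accepts any of the three classes, since they share the same fields."""
    return float(arr.duration.sum())
```

Code that only needs the fields can be annotated with the Union, while unit-specific logic can still branch on `time_unit`.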
Also, although we have switched to nanobind, which has much smaller overhead when accessing class attributes, some overhead remains. Note that this overhead is roughly constant per call, so it is not a problem for "ms-level" functions.
So unless it is necessary, I would not recommend creating new classes in C++. (Admittedly, this overhead matters more for the AoS part than for the SoA part.)
Here is a benchmark of these tiny operations:
| lib | Create a Note | Access Note.pitch | Note.pitch += & -= |
|---|---|---|---|
| python dict | 66 ns | 17 ns | 69.9 ns |
| miditoolkit | 162 ns | 15.2 ns | 48.1 ns |
| NamedTuple | 175 ns | 17.4 ns | n/a (tuple is immutable) |
| symusic[nanobind] | 251 ns | 27.8 ns | 110 ns |
| symusic[pybind11] | 791 ns | 238 ns | 1070 ns |
| nb.jitclass in py | 5.6 µs | 37.8 ns | 656 ns |
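If we want to rerun this kind of measurement, here is a minimal sketch using only the standard library's `timeit`. It covers pure-Python baselines only (a dict, a NamedTuple, and a plain class); `NoteTuple`, `NoteObj`, `ns_per_op`, and `bench` are hypothetical names, and other constructors (symusic, miditoolkit) could be added to `cases` the same way. Absolute numbers depend on the machine and Python version, so they will not match the table exactly.

```python
import timeit
from typing import Dict, NamedTuple


class NoteTuple(NamedTuple):
    time: int
    duration: int
    pitch: int
    velocity: int


class NoteObj:
    """A plain Python class with one attribute per field."""

    def __init__(self, time: int, duration: int, pitch: int, velocity: int):
        self.time = time
        self.duration = duration
        self.pitch = pitch
        self.velocity = velocity


def ns_per_op(stmt: str, env: dict, number: int) -> float:
    """Best-of-5 average time of one execution of `stmt`, in nanoseconds."""
    best = min(timeit.repeat(stmt, globals=env, number=number, repeat=5))
    return best / number * 1e9


def bench(number: int = 100_000) -> Dict[str, Dict[str, float]]:
    env = {
        "NoteTuple": NoteTuple,
        "NoteObj": NoteObj,
        "d": {"time": 0, "duration": 480, "pitch": 60, "velocity": 100},
        "t": NoteTuple(0, 480, 60, 100),
        "o": NoteObj(0, 480, 60, 100),
    }
    # (create, access, update) statements; None means "not applicable".
    cases = {
        "python dict": (
            "{'time': 0, 'duration': 480, 'pitch': 60, 'velocity': 100}",
            "d['pitch']",
            "d['pitch'] += 1; d['pitch'] -= 1",
        ),
        "NamedTuple": ("NoteTuple(0, 480, 60, 100)", "t.pitch", None),
        "plain class": (
            "NoteObj(0, 480, 60, 100)",
            "o.pitch",
            "o.pitch += 1; o.pitch -= 1",
        ),
    }
    results: Dict[str, Dict[str, float]] = {}
    for lib, (create, access, update) in cases.items():
        results[lib] = {
            "create": ns_per_op(create, env, number),
            "access": ns_per_op(access, env, number),
        }
        if update is not None:
            results[lib]["+= & -="] = ns_per_op(update, env, number)
    return results


if __name__ == "__main__":
    for lib, row in bench().items():
        print(lib, {k: f"{v:.1f} ns" for k, v in row.items()})
```

Passing statements as strings (rather than lambdas) keeps an extra Python function call out of each measurement.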
Option 3: New SoA Classes in Python
In this case, we would define the new classes in Python. This is more flexible and Python-native (no binding overhead).
However, these classes can't be used from C++ (at least, I don't know how to achieve that yet; it may be possible).
So we would not get the .numpy() method here.
```python
from symusic import Score, NoteArr, ControlChangeArr

s = Score(...)
# build the SoA from the existing AoS
controls_arr = ControlChangeArr(s.controls)
notes_arr = NoteArr(s.tracks[0].notes)
# convert them back to AoS
s.controls = controls_arr.list()
s.tracks[0].notes = notes_arr.list()
```
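Here is a rough idea of what a pure-Python `NoteArr` for Option 3 could look like. Everything in it is hypothetical: a dataclass stands in for symusic's Note, and the dtypes are assumptions. With the real library, the constructor would iterate over the bound `s.tracks[0].notes` vector, so the per-element attribute access in `__init__` is exactly where the binding overhead from the benchmark would show up.

```python
from dataclasses import dataclass
from typing import Iterable, List

import numpy as np


@dataclass
class Note:
    """Stand-in for symusic's Note; the real class is a C++ binding."""
    time: int
    duration: int
    pitch: int
    velocity: int


class NoteArr:
    """Pure-Python SoA container (hypothetical design for Option 3)."""

    __slots__ = ("time", "duration", "pitch", "velocity")

    def __init__(self, notes: Iterable[Note]):
        notes = list(notes)
        # One numpy array per field; this copy is the AoS -> SoA cost.
        self.time = np.array([n.time for n in notes], dtype=np.int32)
        self.duration = np.array([n.duration for n in notes], dtype=np.int32)
        self.pitch = np.array([n.pitch for n in notes], dtype=np.int8)
        self.velocity = np.array([n.velocity for n in notes], dtype=np.int8)

    def __len__(self) -> int:
        return len(self.time)

    def list(self) -> List[Note]:
        """SoA -> AoS: rebuild one Note per row."""
        return [
            Note(int(t), int(d), int(p), int(v))
            for t, d, p, v in zip(self.time, self.duration, self.pitch, self.velocity)
        ]


arr = NoteArr([Note(0, 480, 60, 100), Note(480, 240, 64, 90)])
arr.pitch += 12  # transpose every note up an octave in one vectorized step
notes = arr.list()
```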