The current realtime audio ingestion system imports interleaved data from PortAudio: each sample frame is assumed to be 2 floats (stereo only, for now), and a buffer of frames is uploaded to the GPU as packed 2D vectors in a single image that is one texel per sample frame wide and 1 pixel tall.
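Because the interleaved stereo layout already matches the texel layout of a two-component float image, the upload reduces to a single bulk copy. A minimal sketch of that idea, with a hypothetical helper name (the real Scintillator code paths differ):

```cpp
#include <cstring>
#include <vector>

// Hypothetical sketch: interleaved stereo input (L0 R0 L1 R1 ...) maps
// directly onto the texel layout of a frames-wide, 1-tall two-component
// float image (e.g. VK_FORMAT_R32G32_SFLOAT), so the CPU-side work is
// one memcpy into the staging buffer, with no per-sample manipulation.
std::vector<float> packInterleavedStereo(const float* input, size_t frames) {
    std::vector<float> staging(frames * 2);  // 2 floats per texel (RG)
    std::memcpy(staging.data(), input, frames * 2 * sizeof(float));
    return staging;
}
```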
This is fast in the sense that the CPU only has to copy the data out of the buffer and into GPU memory without doing any per-sample manipulation. But it is inflexible in that certain channel counts won't work well across all GPUs. For instance, the Vulkan Hardware Database shows that sampling from signed 32-bit float images is supported at essentially 100% for one-, two-, and four-component formats, but sampling from three-component formats is unsupported on almost three quarters of the reported hardware. This means that at most we could build a system that ingests 1, 2, or 4 channels of audio. Scaling beyond 4 channels would require uploading separate texture images.
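The constraint above can be summarized as a small mapping from requested channel count to a portable texel width. A sketch under those assumptions (the helper name and the pad-3-to-4 policy are illustrative, not from the existing code):

```cpp
#include <utility>

// Hypothetical helper: choose a widely supported texel width (1, 2, or 4
// floats) for a requested channel count. Three-component formats sample
// poorly across GPUs, so 3 channels are padded up to a 4-float texel, and
// counts beyond 4 spill into additional images.
// Returns {floats per texel, number of images needed}.
std::pair<int, int> interleavedLayout(int channels) {
    if (channels == 1) return {1, 1};
    if (channels == 2) return {2, 1};
    if (channels <= 4) return {4, 1};   // 3 channels: pad to 4, waste one float
    return {4, (channels + 3) / 4};     // >4 channels: separate texture images
}
```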
Furthermore, it seems from this proposal on portaudio that interleaved data may not always be the way the underlying hardware provides the data to PortAudio, so the library may be interleaving the data manually.
An alternative would be to upload the samples de-interleaved, as single floats in an image that is one frame of samples wide and an arbitrary number of channels tall.
As Scintillator is a video synth, it's arguable that audio import for visualization doesn't need to be as sophisticated or flexible as SuperCollider's. But it's also arguable that Scintillator should be able to consume, and do something useful with, any audio data that SuperCollider is capable of producing. And it is certainly the case that SuperCollider can produce very high channel count audio output, so it follows that Scintillator should be able to handle those as inputs.
It might be best to expose in the log what the native API on the other side of PortAudio is providing, or is capable of providing, and offer the Scintillator user the option of requesting either interleaved audio with a fixed set of supported channel counts or de-interleaved audio with an arbitrary number of channels. Perhaps the system could default to whichever option requires the least processing. Or perhaps ffmpeg audio decode and audio output will obviate the choice.
This is probably also worth waiting for some user feedback on, so opinions welcome here!