NN.ar()
nn_tilde adaptation for SuperCollider: load torchscripts for real-time audio processing.
It has most features of nn_tilde:
interface for any available model method (e.g. forward, encode, decode)
interface for setting model attributes (and a debug interface to print their values when setting them)
processes real-time at different buffer sizes, on separate threads
loads models asynchronously on scsynth
tested so far only on CPU, on linux, windows and mac.
nn.ar
folder to your SuperCollider Extensions folderNote for mac users: binaries are not signed, so you need to run the following in SuperCollider to bypass macos security complaints:
runInTerminal("xattr -d -r com.apple.quarantine" + shellQuote(Platform.userExtensionDir +/+ "nn.ar"))
Failing to do so can produce errors like:
"libc10.dylib" is damaged and can’t be opened. You should move it to the Bin.
Note for windows users: you need to copy some DLLs to the folder where you have scsynth.exe
.
ignore
. Copy all the files inside this folder.scsynth.exe
, it's usually something like: C:\Program Files\SuperCollider3.x.x\
(where 3.x.x is the version you have installed).scsynth.exe
is.Alternatively, you can use NN.copyDLLs
, but you might have to run SuperCollider as Administrator:
s.quit; // server needs to be off for this to work
NN.copyDLLs
NN.ar is just an interface to load and play with pre-trained models, but doesn't include any. To begin with, you can find some published pre-trained RAVE models for real-time neural synthesis:
The following examples assumes you have a pre-trained model saved at "~/Documents/Percussion.ts"
:
// Example:
// 1. load
s.waitForBoot {
// when in a Routine, this method waits until the model has loaded
NN.load(\ravePerc, "~/Documents/Percussion.ts");
// when model has loaded, print a description of all methods and attributes
NN(\ravePerc).describe;
}
Available methods are depending on the specific model you're using. RAVE models typically have three methods:
// list all available methods of a pre-trained model
NN(\ravePerc).methods;
// -> NNModelMethod(forward: 1 ins, 1 outs), NNModelMethod(encode: 1 ins, 8 outs), ...
// 1. resynthesis using \forward method
{ NN(\ravePerc, \forward).ar(WhiteNoise.ar) }.play;
// 2. latent modulation using \encode and \decode
{
var in, latent, modLatent, prior;
// encode sound input to RAVE latent space
latent = NN(\ravePerc, \encode).ar(SoundIn.ar);
// add a random modulation to every dimension
modLatent = latent.collect { |l|
l + LFNoise1.ar(MouseY.kr.exprange(0.1, 30)).range(-0.5, 0.5)
};
// resynthesize modulated latents
NN(\ravePerc, \decode).ar(modLatent);
}.play;
// 3. manual latent navigation using only \decode
// here we assume ravePerc has 8 latent dimensions
Ndef(\rave) { NN(\ravePerc, \decode).ar(\latents.kr(0!8)) }.play;
// set all latents
Ndef(\rave).set(\latents, [0.75, 0.1, 1.5, 0.2, 0.3, 0.7, 1.2, 0.33])
// set any latent programmatically
Ndef(\rave).set(\latents, 2, 0.5);
// simple GUI with sliders:
(
var win = View(bounds:800@200);
var sliders = 8.collect {|i|
Slider().background_(Color.rand)
.action_{ |sl| Ndef(\rave).seti(\latents, i, sl.value) }
};
win.layout_(HLayout(*sliders)).front
)
// using the NodeProxyGui2 Quark, you automatically get the sliders you need:
// find it at https://github.com/madskjeldgaard/nodeproxygui2
Ndef(\rave).gui2
Some models have custom attributes, to further condition their behavior. Which attribute and which behavior depends on the specific model. NN.ar supports setting and getting attribute values per-UGen, using the attribute
argument:
// list model supported attributes
NN(\msprior).attributes;
// -> [ listen, temperature, learn_context, reset ]
// attributes can be set only per-UGen, using the 'attributes' argument
{
var in, latent, modLatent, prior, resynth;
in = SoundIn.ar();
latent = NN(\rave, \encode).ar(in, 2048);
modLatent = latent.collect { |l|
l + LFNoise1.ar(MouseY.kr.exprange(0.1, 30)).range(-0.5, 0.5)
};
prior = NN(\msprior, \forward).ar(latent, 2048
// attributes are set when their value changes
attributes: [
// here we use latch to limit the setting rate to once per second
temperature: Latch.kr(LFPar.kr(0.1).range(0, 2), LFPulse.kr(1))
],
debug: 1 // print attribute values when setting them
);
NN(\rave, \decode).ar(prior.drop(-1), 2048);
}.play;
Like nn_tilde, nn.ar uses an internal circular buffer and runs neural network processing in a separate thread. The second argument of NN(...).ar
controls this buffer's size, with 0 resulting in no buffering and no separate thread:
// large buffer
{ NN(\ravePerc, \forward).ar(WhiteNoise.ar, 8192) }.play;
// no buffer
{ NN(\ravePerc, \forward).ar(WhiteNoise.ar, 0) }.play;
When supplying multiple inputs, NN.ar will process them using the same model, with batch processing.
// resynthesize three sine waves using three separate copies of the same model:
{ SinOsc.ar([100,200,300]).collect { |in| NN(\ravePerc, \forward).ar(in) } }.play
// with multi-channel: use a single model to resynthesize three inputs at the same time:
{ NN(\ravePerc, \forward).ar(SinOsc.ar([100,200,300])) }.play
// example 2: pseudo-stereo:
// - add 2 different noise vectors to the same latents
// - decode 2 slightly different channels, using a single model
{
var latents = NN(\ravePerc, \encode).ar(SoundIn.ar);
// latent size is 8
var noiseVectors = 2.collect { LFNoise1.ar(1).range(-0.1, 0.1)}!latents.size };
// noiseVectors shape is [2, 8]
var modLatents = noiseVectors.collect { |noise| latents + noise };
// modLatents shape is [2, 8]
var out = NN(\ravePerc, \decode).ar(modLatents)
// out shape is [2]
out
}.play
Note that attributes don't cause multichannel expansion, precisely because we are loading a single model, that can only have one set of attributes.
If you compile SuperCollider from source, or if you want to enable optimizations specific to your machine, you need to build this extension yourself.
Build requirements:
If you don't have a copy of supercollider's source code, you can get one by:
git clone https://github.com/supercollider/supercollider
git clone https://github.com/elgiano/nn.ar
cd nn-supercollider
We have built nn.ar releases with the following versions of libtorch:
.whl
file and take only the folder called "torch".Because of difficulties with torch CMake files and discrepancies in how different systems install libtorch, we don't recommend using system-wide installed libtorch. However, if you'd need to do it, you can enable finding and linking against system-installed torch with -DSYSTEM_TORCH=ON
.
cmake -B build -DCMAKE_BUILD_TYPE=Release \
-DTORCH_PATH=/path/to/libtorch/ \
-DSC_PATH=/path/to/sc/source/ \
-DCMAKE_INSTALL_PREFIX=/path/to/sc/extensions/ \
-DNATIVE=ON
Options:
-DTORCH_PATH
. Systemwide installed torch will not be found automatically by default.Platform.userExtensionDir
Finally, use CMake to build and install the project:
cmake --build build --config Release --target install
Note: for building with any supercollider version earlier than 3.13: nn.ar needs a macro called
ClearUnitOnMemFailed
, which was defined in supercollider starting from version 3.13. If for any reason you need to build nn.ar with a previous version of supercollider, you have to copy these two macros and put them inNNModel.cpp
.
The usual regenerate
command was disabled because CmakeLists.txt
needed to be manually edited to include libtorch.
Buffering and external threads Most nn operation, from loading to processing, are resource intensive and can block the DSP chain. In order to alleviate this, but costing extra latency, we adopted the same buffering method as nn_tilde. When buffering is enabled (by default if not on an NRT server), model loading, processing and parameter setting are done asynchronously on an external thread. The only issue still present with this approach is that we have to wait for the thread to finish before we can destroy the UGen. This currently blocks the DSP chain when the UGen is destroyed.
Model and description loading For processing purposes, models are loaded by NNUGen. This is because each processing UGen needs a separate instance of the model, since multiple inferences on the same model are not guaranteed not to interfere with each other. So now models are loaded and destroyed with the respective UGen, similarly to what happens in MaxMSP and PureData. However, since we couldn't find in SuperCollider a convenient method to send messages to single UGens, we opted for loading model descriptions separately, so that paths and attribute names could be referenced as integer indexes.
NN.load
loads the model on scsynth and save its description in a global store, via a PlugIn cmd. Once a model is loaded, informations about which methods and attributes it offers are cached and optionally communicated to sclang. In lack of a better way to send a complex reply to the client, scsynth will write model informations to a yaml file, which the client can then read.Attributes
Since each UGen has its own independent instance of a model, attribute setting is only supported at the UGen level. Currently, attributes are updated each time their value changes, and we suggest to use systems like Latch
to limit the setting rate (see example above).
RAVE models can exhibit an important latency, from various sources. Here is what I found:
tl;dr: if latency is important, consider training with --config causal.
--config causal
, which reduces this latency to about 5ms, at the cost of a "small but consistent loss of accuracy".