damskaggep / WaveNetVA

WaveNet for virtual analog modeling, implemented as a real-time audio plugin using JUCE.
Apache License 2.0
244 stars 25 forks

Training New Models #3

Open lyellr opened 4 years ago

lyellr commented 4 years ago

I was wondering if you had any advice or code you'd be willing to share for training other models. I have a couple pieces of hardware I'd love to play around with using your inference implementation. Thanks!!

ljuvela commented 4 years ago

Sorry, but we can't share the training code directly at this point. I had a look at this and it's mostly correct: https://github.com/teddykoker/pedalnet (also seeing you've forked it 🙂)

As for tips, check that the convolutions actually use a causal padding mode. That isn't available in PyTorch by default, and it's easy to get slightly wrong (valid padding trims symmetrically, which isn't causal).

Next tip: be careful when permuting PyTorch tensor dimensions for the JSON export. We originally used TensorFlow, and its convolution weight tensors are laid out differently.

GuitarML commented 4 years ago

So for 10 dilated layers the left-side padding would actually be quite large compared to the input, correct? Assuming pad = (kernel_size - 1) * dilation.

ljuvela commented 4 years ago

You're correct. The total amount of zero padding will then match the receptive field of the network. If you're concerned about the "invalid" samples at the model output, you can always trim the target and model output signals to the valid length.

I find it convenient to create a "CausalConv1D" module that follows the standard Conv1D semantics but zero-pads internally. Other methods may be more efficient, but I think this is the least error-prone.
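For illustration, a minimal sketch of such a module (assuming PyTorch; the class name and details below are mine, not the repository's training code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConv1d(nn.Module):
    """Conv1d that zero-pads only on the left, so output[t] never sees future inputs."""
    def __init__(self, in_channels, out_channels, kernel_size, dilation=1):
        super().__init__()
        # Left padding for this layer: pad = (kernel_size - 1) * dilation
        self.pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(in_channels, out_channels, kernel_size,
                              dilation=dilation, padding=0)

    def forward(self, x):
        # x: (batch, channels, time); pad only at the start of the time axis
        x = F.pad(x, (self.pad, 0))
        return self.conv(x)

# With 10 layers, kernel_size=3 and dilations 1, 2, 4, ..., 512, the padding
# added across the stack is sum((3 - 1) * 2**i for i in range(10)) = 2046 samples.
```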

GuitarML commented 4 years ago

Thanks for the help. I haven't been able to figure out the causal padding implementation yet, but I was able to convert a model trained with PedalNet into a format that WaveNetVA can build a plugin from, and the result was listenable but still pretty far off from the TS9 pedal samples I recorded. I used zeros for the layer values I couldn't infer from the PyTorch model: the -1 input layer weights and biases, and the remaining weight values for layer 0 (PyTorch only had 96 values here, so I added zeros to get to 1536). Also, the linear mix layer weights were really high compared to the wavenet1.json models, so I scaled them down by a factor of 3 to get something that would run in my DAW. Any tips would be appreciated; I'm new to AI programming, so my guesses at the model conversion are probably not great, but being able to virtualize some of my music equipment would be pretty cool.

ljuvela commented 4 years ago

You could try this to implement a causal convolution https://github.com/pytorch/pytorch/issues/1333#issuecomment-400338207

Sounds like there is some kind of size mismatch between the PedalNet and WaveNetVA configurations. I'd think it's best to try to identify what that is exactly, as it's really hard to adjust the weights manually post hoc.

Another thing that's easy to get wrong when exporting is the ordering of the convolution weights. PyTorch uses (out_channels, in_channels, kernel_size), while the plugin (and TensorFlow, which we used originally) uses (kernel_size, in_channels, out_channels). When exporting from PyTorch, you'll need to permute the weights accordingly.

For reference, this function reads in a flattened double array to a convolution kernel https://github.com/damskaggep/WaveNetVA/blob/d04cc2d1e1bca78697412bbb19bc0674c6df4fc9/Source/Convolution.cpp#L120-L131
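Putting the two together, an export helper might look roughly like this (a sketch assuming PyTorch; the flattening order and JSON key names are my assumptions and should be checked against the reader linked above):

```python
import json
import torch

def export_conv1d(conv: torch.nn.Conv1d) -> dict:
    # PyTorch stores Conv1d weights as (out_channels, in_channels, kernel_size);
    # reorder to (kernel_size, in_channels, out_channels) before flattening.
    w = conv.weight.detach().permute(2, 1, 0).contiguous()
    b = conv.bias.detach() if conv.bias is not None else torch.zeros(conv.out_channels)
    return {"W": w.flatten().tolist(), "b": b.tolist()}

# Example: one layer's parameters as JSON text
layer = torch.nn.Conv1d(16, 16, kernel_size=3, dilation=4)
print(json.dumps(export_conv1d(layer))[:80], "...")
```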

As a fun side note, I remember playing around with the plugin with different random weights, and most of the time it sounded like some kind of usable distortion effect. This seems to fall out of the WaveNet structure somehow.

GuitarML commented 4 years ago

That helps a lot, thanks! I'll share the code once I get it working right. I did add an analysis script to my PedalNet fork to compare predicted vs. actual wave files, if anyone is interested. It would be interesting to see what other types of hardware this model works well on, such as compressors or microphones.

GuitarML commented 4 years ago

Well, I think I've accounted for all the obvious differences in my converter, but the sound still doesn't match the original PedalNet model when loaded in the WaveNetVA plugin. I needed to add an input layer to get the layer sizes to match up, and the large weights on the linear mix layer were due to training on Int16 audio data as opposed to Float32. The code as it stands is available in my fork of PedalNet, along with my trained and converted models, if anyone wants to take a look.

ljuvela commented 4 years ago

Good catches!

I think something might go wrong when you're slicing the residual output and skip values on lines 96 and 99 in https://github.com/keyth72/pedalnet/blob/1cf03f73a8a5f60d157422849cf43a75dfb7f6ef/model.py#L81-L101. Doing causal padding the way you did it should give the same-size outputs anyway?

Have you tried to match the numerics in a very minimal example (one hidden layer, small dimensions)? There's no need to even train the model; just export random weights and test with a few input samples of a linear ramp or something similar. A debug build in standalone mode and printing to std::cerr are useful.
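A minimal version of that test on the Python side might look like this (assuming PyTorch; the layer sizes and ramp length are arbitrary choices for illustration):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# One tiny causal layer with random weights. Export these same weights to the
# plugin, feed the same ramp to a debug build, and compare the printed samples.
conv = torch.nn.Conv1d(1, 4, kernel_size=3, dilation=1)

ramp = torch.linspace(0.0, 1.0, steps=16).view(1, 1, -1)  # (batch, channels, time)
out = conv(F.pad(ramp, (2, 0)))                           # causal left padding

print("input :", ramp.flatten().tolist())
print("output:", out[0, 0].tolist())
```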

GuitarML commented 4 years ago

Got the converter working; I had to combine the tanh and sigm layers into one layer in the PedalNet model. Stepping through the WaveNetVA code in debug mode really helped. Thanks again!
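For anyone following along, merging the two gate convolutions could be done roughly like this (a sketch assuming PyTorch and that the plugin splits the merged output into a tanh half and a sigmoid half; the stacking order is my assumption, not taken from the plugin code):

```python
import torch

def merge_gate_convs(conv_tanh: torch.nn.Conv1d, conv_sigm: torch.nn.Conv1d) -> torch.nn.Conv1d:
    # Stack the two gate convolutions into one layer with doubled output channels.
    merged = torch.nn.Conv1d(
        conv_tanh.in_channels,
        conv_tanh.out_channels + conv_sigm.out_channels,
        conv_tanh.kernel_size[0],
        dilation=conv_tanh.dilation[0],
    )
    with torch.no_grad():
        merged.weight.copy_(torch.cat([conv_tanh.weight, conv_sigm.weight], dim=0))
        merged.bias.copy_(torch.cat([conv_tanh.bias, conv_sigm.bias], dim=0))
    return merged

def gated_activation(z: torch.Tensor) -> torch.Tensor:
    # z: (batch, 2 * channels, time) from the merged convolution
    a, b = z.chunk(2, dim=1)
    return torch.tanh(a) * torch.sigmoid(b)
```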

ljuvela commented 4 years ago

Awesome! We should add a pointer to your PedalNet fork in the README!

GuitarML commented 4 years ago

Go for it! I think what I'd like to do next is combine this with traditional modeling to add in things like drive/tone/reverb controls. And also look into lowering the network latency for live guitar playing. For the time being though, it will be nice to have the sound of an amp "cranked to 12" for my recordings without waking up the whole neighborhood!

damskaggep commented 4 years ago

Hey, nice job getting the exporter working! About lowering the network latency: the model itself shouldn't introduce any latency. Any latency in the processing is due to the buffer size used by JUCE. The buffer size can be changed in the settings of the compiled standalone version, or in the settings of your DAW if you're using the plugin version.

ljuvela commented 4 years ago

Another potential source of latency is your recording setup. An easy way to sync the recorded input-target pairs is to use a physical loopback connection for the input.

GuitarML commented 4 years ago

Changing the buffer size in my DAW fixed it, sounds great now. Out of curiosity, are there any plans to release the RNN plugin code?

yudashuixiao1 commented 4 years ago

Hi, does it work on Windows? I got some serious noise when I built the project and ran the sound input through the sound card. Or could it be caused by other problems? Thanks!

GuitarML commented 4 years ago

I'm running it on Windows. I hear some occasional clicks, but adjusting the settings in the DAW helps, and it's a low-end computer. I'm going through a separate audio interface and using the VST plugin, though; I haven't tried going directly into the sound card.

ljuvela commented 4 years ago

To run this (or pretty much any other audio plugin) on Windows, it's best to have an external audio interface with ASIO support. Windows Audio drivers and internal sound cards won't allow low enough latency for real-time playing, plus you're likely to get some very annoying buffer-grind noise.

GuitarML commented 4 years ago

If anyone on this thread is interested, I added two guitar plugins built from the WaveNet model: https://github.com/keyth72/SmartGuitarPedal and https://github.com/keyth72/SmartGuitarAmp. If anyone wants to add new models or features (or point out bugs in my code), I'd be happy to incorporate them.

yudashuixiao1 commented 4 years ago

Amazing! It works well on PC. I plan to port the model to embedded devices. Would the computing power of the chip be enough to support this model?

GuitarML commented 3 years ago

@ljuvela I made a LSTM model in Keras based on the research paper, but there are a few things I'm not sure about from reading the paper. If you can't comment on it I understand, but I opened up this issue to try to interpret the research paper more accurately: https://github.com/GuitarML/GuitarLSTM/issues/8

Thanks!