GuitarML / GuitarLSTM

Deep learning models for guitar amp/pedal emulation using LSTM with Keras.
https://guitarml.com/
GNU General Public License v3.0
367 stars 49 forks source link

Fix error to signal loss in notebook #12

Open jmiller656 opened 3 years ago

jmiller656 commented 3 years ago

This fixes issues with #10 and #11 , and is also related to GuitarML/PedalNetRT#16

This makes the training a sequence to sequence task (predict n output steps for n input steps) and also fixes a problem with the pre-emphasis filter that would cause high / infinite loss values.

In some listening tests, I've found that using mse and mae still seems to sound better than error_to_signal and also seem to optimize error_to_signal as a metric better than just using error_to_signal as a loss function. I suspect adjusting the learning rate (lowering it a lot) could help the error_to_signal loss work much better.

The next things I plan to do are looking into the improved loss functions and pre-emphasis filters in this paper as well as tuning the learning rate for error_to_signal loss

GuitarML commented 3 years ago

The pre emphasis filter looks correct, but I'm not getting the same training result as the original method of splitting the data without the data loader class. I tried with the 0.01 learning rate and with the smaller one. One part I don't quite understand, in your WindowArray class, the x_out and y_out are calculated the same way, but each input of x should be of length input_size, and each out y should be a single output. In that way, the previous input_size samples are used to predict the current single sample.

I'll need to read up on the TimeDistributed layer, not familiar with that one. Unfortunately I'm getting worse results than before, but I'm not exactly sure why that is yet.

jmiller656 commented 3 years ago

each input of x should be of length input_size, and each out y should be a single output

I think one part of the issue with the previous implementation of the error_to_signal loss is implementing it this way. When I read the paper, I see the equation for the loss looks like this: image

As you can see, the loss has a summation over yp[n] and y'p[n]. This summation is over n timestamps. I believe this means that the loss should be calculated at each sample in the window, which is why I need y to also have the same length as the window. As for the time distributed layer, this just applies the same dense layer (in our case, but you can use any layer there) across all the elements of a sequence.

I encourage you to try changing the loss back to mse or mae, these both seem to give much better results than error_to_signal currently

GuitarML commented 3 years ago

I think part of the problem is that the LSTM model I made isn't what they developed in the paper. They use a single input, single output stateful LSTM model, where I'm using a non stateful multiple input, single output LSTM model with two conv1d input layers. The paper uses stateful LSTMs to have information of previous samples, where I'm using an input of input_size (default 120) to train the model on what the next sample should be. It could be that the model I made won't work with the same error to signal equation from the paper, because it's made for a different model.

I'll run some more tests, but when I compared the plots and the sound of the mse to the error to signal (with same params and train time), it wasn't as good. I'll also try mae and see how that performs.

GuitarML commented 3 years ago

I would be interested to see if using the stateful LSTM -> dense layers from the papers with your new code would work. I tried to get the stateful lstms working but wasn't able to. There is another issue that might have some useful info for doing this: https://github.com/GuitarML/GuitarLSTM/issues/8#issue-767696103

jmiller656 commented 3 years ago

I think part of the problem is that the LSTM model I made isn't what they developed in the paper.

Yeah, that's a good point, it is different. If you do want a single output model rather than multi output, then it's probably better not to use error_to_signal loss, I think (out of curiosity, is it preferred to keep it that single output?).

By the way, could you post a plot of a model that performs well? I've been trying training the model with an even lower learning rate and for more epochs, and it seems to look promising. I'll post a plot of that soon (I'm assuming you use the output of plot.py)

jmiller656 commented 3 years ago

Here's my plot: image

mishushakov commented 3 years ago

here's my plot

ts9xl_signal_comparison_e2s_0 005 ts9xl_Detail_signal_comparison_e2s_0 005

GuitarML commented 3 years ago

That's the same thing I get with the old code and the TS9 example. Here is one (also TS9 sample) using the the old code and loss=mse, with the lowered model settings from the SmartAmpPro colab notebook. I left error to signal as a metric, and it ends up being about the same as using loss=error_to_signal. I think having the multiple sample input is more important to prediction than the choice of loss function for this model.

ts9_mse_Detail_signal_comparison_e2s_0 0164

GuitarML commented 3 years ago

I ran some more tests using mse and mae for loss, this time by creating models for SmartAmpPro. When using the same settings (3 epochs, 24 hidden units) they sound really close. For the easy to train sounds (like the TS9) I can barely tell a difference between those and the error to signal version. For high gain sounds the difference is more noticeable, but they all sound good. I'll post the models to let you try them out.

@jmiller656 what are your thoughts on mse vs mae? Sorry for leading you down the rabbit hole on error to signal, but what you already added to this colab script works great, and I'll probably propagate it to the local training and SmartAmpPro training.

GuitarML commented 3 years ago

Here are a few models using mse for loss, and with higher epochs (30 - 50). See the new colab script I added to the SmartAmpPro repo (train_colab_mse.ipynb).

loss_test_mse.zip

jmiller656 commented 3 years ago

Oh sweet! (sorry I disappeared for a bit there, busy week)

As for m se vs mae, I think both are probably fine. In terms of the math, I think that mse is slightly easier for the neural net to learn, but mae will probably be able to handle outliers better (the penalty is linear, rather than quadratic... I wonder if this has to do with the issue you mentioned with high gain sounds 🤔). There's also a "best of both worlds" option I've been meaning to play with more called huber loss (as well as some others which you can read more about in this blog I found ).