What is the progress status?
@38github I ran a few initial tests with two hidden stateful LSTM layers followed by a dense layer, but wasn't able to get it to converge. It could be something I'm doing wrong in setting up the training. I think it's a safe bet to go ahead with the non-stateful LSTM; it seems to work well for this application.
I have a new plugin prototype for this model; it works well in real time with the following parameters:
- Conv1d filters: 4
- LSTM hidden units: 24
- Input size: 120
I made a video for the new plugin with some brief sound samples:
@GuitarML
In the original paper, as you said earlier, the LSTM models have no convolutional layers. The LSTM model is just a single LSTM layer with an input size of one (the current sample). The LSTM is also stateful. At each time step, the LSTM hidden state is updated and fed into the fully connected layer, and the output of that fully connected layer is the model's predicted output sample (optionally the input sample can be added to the model's prediction too, although the model should work either way).
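For reference, here is a minimal Keras sketch of that architecture (the hidden size and batch size are assumptions, not values from the paper):

```python
from tensorflow import keras
from tensorflow.keras import layers

hidden_units = 32  # assumption; the paper evaluates several sizes
batch_size = 1     # stateful LSTMs require a fixed batch size

model = keras.Sequential([
    # one input sample per time step; hidden/cell state persists between calls
    layers.LSTM(hidden_units, stateful=True,
                batch_input_shape=(batch_size, 1, 1)),
    layers.Dense(1),  # hidden state -> predicted output sample
])
model.compile(optimizer="adam", loss="mse")
```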
When you tried your model previously with the stateful LSTMs, was the loss making any progress? If you have any questions about how we trained the models in the paper, I am happy to help! I just came across this repo recently; it looks really cool!
I'm curious about your model in SmartAmpPro: is the convolutional layer being applied over an input size of 120 samples, reducing it to 24 samples that are input into the LSTM? 5 minutes of training time on CPU is really fast!
@Alec-Wright thanks for the response! When I tried using a stateful LSTM layer in Keras, the loss never went below about 0.9. I didn't spend too much time on it, but I suspect there was something wrong with either my loss calculation or how I set up the layers in Keras.
On SmartAmpPro I have two conv1d layers, so the output of the second layer is (1, 4), I believe. The exact training code is in the 'resources/train.py' file. I would love for someone with more experience in C++ to optimize my inference code; I imagine it could be a bit faster, which would allow for larger, more accurate models in real time. Since I don't use stateful LSTMs, there's probably a way to calculate the samples in parallel.
Here's the training code; the model definition starts at line 105: https://github.com/GuitarML/SmartAmpPro/blob/main/resources/train.py
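For anyone skimming the thread, here is a rough sketch of the layer stack described above (the kernel size and strides are assumptions; the exact values are in train.py):

```python
from tensorflow import keras
from tensorflow.keras import layers

input_size = 120   # samples of input history per prediction
conv_filters = 4
hidden_units = 24
kernel, stride = 12, 12  # assumptions; see train.py for the real values

model = keras.Sequential([
    layers.Conv1D(conv_filters, kernel, strides=stride, padding="same",
                  input_shape=(input_size, 1)),
    layers.Conv1D(conv_filters, kernel, strides=stride, padding="same"),
    # the time dimension shrinks 120 -> 10 -> 1, so the LSTM sees a (1, 4) tensor
    layers.LSTM(hidden_units),  # stateless: no state carried across windows
    layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
```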
This goes without saying, but thanks for your research on the subject! As a guitarist and a software engineer, I just had to see what I could do with the technology from your research papers. I imagine this kind of software is going to be all over the music industry soon.
After further inspection of the original research paper, I believe the LSTM model described is different from the implementation in GuitarLSTM. It describes using two hidden layers, which I initially thought meant two conv1d layers, but I now think they meant two stacked LSTM layers. In that case, the model would look like this:
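(A sketch of that interpretation in Keras; the hidden size is an assumption, not a value from the paper:)

```python
from tensorflow import keras
from tensorflow.keras import layers

hidden_units = 32  # assumption

model = keras.Sequential([
    # a single audio sample in per step; state carried across calls
    layers.LSTM(hidden_units, stateful=True, return_sequences=True,
                batch_input_shape=(1, 1, 1)),
    layers.LSTM(hidden_units, stateful=True),
    layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
```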
That would explain why the model had to be trained for 500 epochs, and also why it took 20 hours on a high end GPU. The model would take in a single audio sample, as opposed to the "input_size" amount.
Either way, the conv1d layers seem to offer a significant speed advantage while training. They allow for a large input_size while reducing the number of parameters going into the LSTM layer. This configuration has yet to be tested in a real-time application.
The paper also describes adding the input to the output, so that the model only has to learn the difference. This technique is not implemented in GuitarLSTM.
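A minimal sketch of that skip connection in Keras (the layer sizes are assumptions):

```python
from tensorflow import keras
from tensorflow.keras import layers

inp = keras.Input(shape=(None, 1))               # sequence of input samples
x = layers.LSTM(24, return_sequences=True)(inp)  # 24 units is an assumption
x = layers.Dense(1)(x)
out = layers.Add()([inp, x])  # add the dry signal back; the net learns the difference
model = keras.Model(inp, out)
```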
Experimenting with the model stack would be beneficial in finding an optimal approach. Feel free to share any findings here, or on the Facebook group (intended for discussion on model training). Link to the Facebook community group: https://www.facebook.com/groups/674031436584335/?ref=pages_profile_groups_tab&source_id=102883764967858
Update: Keras uses stateful=False for LSTM layers by default, which means the hidden and cell states aren't carried over between batches. Based on the statements in the paper, it appears they use a stateful LSTM, which is another difference.
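In code, the difference is just the flag (the sizes here are placeholders):

```python
from tensorflow.keras import layers

# Keras default: stateful=False, states reset after each batch of windows
stateless = layers.LSTM(24)

# Paper-style: states persist between consecutive calls; requires a fixed
# batch size and a manual model.reset_states() between separate recordings
stateful = layers.LSTM(24, stateful=True, batch_input_shape=(1, 1, 1))
```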