Today I found that a model I trained on different music generated what sounded like white noise.
My problem appeared to be due to some of the converted wav files (generated in datasets/YourMusicLibrary/wave/) being mono 32-bit PCM audio at 8 kHz, whereas the GRUV conversion functions assume mono 16-bit PCM audio at 8 kHz.
If you find some of your wav files have the wrong bit depth, you can convert them with sox, e.g.:
sox oldfile.wav -b 16 newfile.wav
This might be the cause of the second issue you mentioned.
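To spot the offending files before retraining, a quick scan of the converted directory works; here is a minimal sketch using scipy (the directory path matches the one above, everything else is illustrative):

import os
from scipy.io import wavfile

wav_dir = 'datasets/YourMusicLibrary/wave/'
for name in sorted(os.listdir(wav_dir)):
    if not name.endswith('.wav'):
        continue
    rate, data = wavfile.read(os.path.join(wav_dir, name))
    # GRUV expects mono 16-bit PCM, i.e. a one-dimensional int16 array
    print("%s: %d Hz, dtype=%s, shape=%s" % (name, rate, data.dtype, data.shape))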
The first issue is probably due to over-fitting: your trained model fits the training data well but does not generalize to the validation data. Ideally you want the validation loss to start decreasing during the earlier epochs. Some people have reported that for LSTM networks the validation loss can move up and down unpredictably during training before the optimal minimum is reached.
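One common safeguard is to watch the validation loss and stop training once it stops improving; a rough sketch with Keras' EarlyStopping callback (model, X_train, y_train stand in for whatever your training script builds, and the epoch argument is called nb_epoch in older Keras versions, epochs in newer ones):

from keras.callbacks import EarlyStopping

early_stop = EarlyStopping(monitor='val_loss', patience=10)  # tolerate a few noisy epochs
model.fit(X_train, y_train,
          validation_split=0.1,   # hold out 10% of the training data for validation
          nb_epoch=2000,
          callbacks=[early_stop])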
@gb96 Thanks for your reply. Have you trained a model which is capable of producing meaningful sound? I re-implemented the code and forgot to normalize the raw audio data. That might be the reason for these two issues.
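For anyone hitting the same problem, the missing step is just the usual centering and scaling of the frequency-domain training tensor; a minimal sketch (the tensor shape and variable names here are illustrative, not GRUV's exact ones), whose inverse is what the generate_from_seed code further down applies to its output:

import numpy as np

# Stand-in for the real tensor of FFT blocks: (n_examples, seq_len, block_size)
X = np.random.randn(10, 40, 2048).astype(np.float32)
data_mean = np.mean(X, axis=(0, 1))
data_variance = np.var(X, axis=(0, 1)) + 1e-8   # epsilon avoids division by zero
X_normalized = (X - data_mean) / data_variance  # generation later multiplies by the variance and adds the mean back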
@nixingyang I have trained models that produce sound (e.g. https://soundcloud.com/gb96/stairway-to-gruv-hd512-epoch48000-loss067-seed3x3).
Have you tried running the audio_unit_test or equivalent? (see https://github.com/MattVitelli/GRUV/blob/master/data_utils/parse_files.py#L190 )
That test verifies the methods for loading/saving sound files, converting between wave and NumPy formats, and converting between time-domain and frequency-domain representations (via the Fast Fourier Transform and its inverse).
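For reference, a stripped-down version of that round-trip check looks roughly like this (a sketch of the idea only, not GRUV's actual test code):

import numpy as np

signal = np.random.randn(44100).astype(np.float32)   # one second of noise at 44.1 kHz
spectrum = np.fft.fft(signal)                         # time domain -> frequency domain
recovered = np.real(np.fft.ifft(spectrum))            # frequency domain -> time domain
assert np.allclose(signal, recovered, atol=1e-5), "FFT round trip is not lossless"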
I have defined a function similar to audio_unit_test and I can confirm that the transformation process is lossless. The audio you shared contains meaningful sound at the beginning; however, the model simply repeats useless sound after that. My prediction does not contain meaningful sound at all. Did you modify the generate_from_seed function, and did you train your model solely on 65 seconds of audio?
Looks like I have made some significant modifications to the generate_from_seed function. The main idea of my changes is to keep a fixed seed-sequence length: new predicted values are appended to the end and the oldest values are deleted from the beginning to maintain a constant length.
import numpy as np

# Extrapolates from a given seed sequence
def generate_from_seed(model, seed, sequence_length, data_variance, data_mean):
    seedSeq = seed.copy()
    output = []
    # The generation algorithm is simple:
    # Step 1 - Given A = [X_0, X_1, ..., X_n], generate X_n+1
    # Step 2 - Append X_n+1 to A and drop X_0, so the seed keeps a constant length
    # Step 3 - Repeat sequence_length times
    for it in xrange(sequence_length):
        seedSeqNew = model.predict(seedSeq)  # Step 1. Generate X_n+1
        # Step 2. Append the last predicted frame to the output
        newSeq = seedSeqNew[0][seedSeqNew.shape[1] - 1]
        output.append(newSeq.copy())
        # Construct the new seedSeq: append the prediction and drop the oldest frame
        newSeq = np.reshape(newSeq, (1, 1, newSeq.shape[0]))
        seedSeq = np.concatenate((seedSeq, newSeq), axis=1)
        seedSeq = np.delete(seedSeq, 0, 1)
    # Finally, post-process the generated sequence so that we have valid frequencies
    # We're essentially just undoing the data centering process
    for i in xrange(len(output)):
        output[i] *= data_variance
        output[i] += data_mean
    return output
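It gets called roughly like this (all of these names are placeholders for whatever your own training script produces):

# seed: array of shape (1, seed_length, block_size) taken from the training data
# X_var / X_mean: the statistics used to normalize the training tensor
generated_blocks = generate_from_seed(model, seed, sequence_length=100,
                                      data_variance=X_var, data_mean=X_mean)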
To answer your question about the training data: I used the first 65 seconds of audio from each channel of a stereo source, for a total of 130 seconds. I did that because the source music had quite distinct sounds in each channel (e.g. guitar notes in one and vocals in the other), and I figured it would be easier to train an LSTM network on the separate sounds rather than on the combined mono version.
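For anyone wanting to reproduce that split, sox can extract each channel and trim it to the first 65 seconds (file names here are just placeholders):

sox stereo_source.wav left_channel.wav remix 1 trim 0 65
sox stereo_source.wav right_channel.wav remix 2 trim 0 65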
Your modification of generate_from_seed is reasonable. From my point of view, the algorithm devised in GRUV is not capable of handling real-world audio signals. Google has released WaveNet, which is probably the state of the art.
Hi,
As the authors used copyrighted songs (Madeon and David Bowie) in the original project, I fed the neural network some other sound datasets instead. I wonder whether anyone has encountered similar issues to those shown below.
BR.