mishushakov closed this issue 1 year ago
Yes, it's doing the same preprocessing that it does before training. Each predicted sample is determined by the previous input_size samples. So for a model with input size 100 and a wav file of 44100 samples (1 second), it creates an array of shape (44100, 100), but 99% of that data is redundant, so there's definitely a better way of handling it. I started a custom DataLoader class which takes a small batch of data, preprocesses/trains on it, then frees up the memory for the next batch. I'm having trouble getting it to train properly though, so I'll share it in case someone wants to try to fix it. The split_data param is basically a workaround because I couldn't get that class working yet.
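To make the redundancy concrete, here is a minimal sketch (not the repo's actual code) of the naive sliding-window preprocessing described above, using NumPy; the function name is hypothetical:

```python
import numpy as np

def make_windows(signal, input_size=100):
    """Naive sliding-window preprocessing: one window per output sample.
    For a 1-second, 44100-sample wav this yields a (44100, 100) array,
    duplicating each input sample ~100 times -- hence the memory blowup."""
    # Zero-pad the front so every sample has a full history window
    padded = np.concatenate([np.zeros(input_size - 1, dtype=signal.dtype), signal])
    return np.stack([padded[i:i + input_size] for i in range(len(signal))])

signal = np.arange(44100, dtype=np.float32)
X = make_windows(signal, input_size=100)
print(X.shape)  # (44100, 100)
```

Each window overlaps its neighbor in all but one sample, so the materialized array is roughly input_size times larger than the audio itself.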
Hey there & happy holidays! Thanks for the explanation.
Maybe using the TensorFlow audio preparation module would resolve the problem? https://www.tensorflow.org/io/tutorials/audio
Happy Holidays! That does look helpful; I can probably use something in there. I don't see anything in particular that solves this data preparation problem, though. I think the solution is still getting the custom data loader to work. In the meantime, the split_data param will allow for training with limited RAM.
I've made some progress on a plugin for the LSTM models, and I'm really excited about what can be done with that. Lots of good things coming for 2021!
Thanks for sharing! I'd love to help out (where I can) after the holidays.
This particular part caught my attention:
"The content of the audio clip will only be read as needed, either by converting AudioIOTensor to Tensor through to_tensor(), or through slicing. Slicing is especially useful when only a small portion of a large audio clip is needed."
As far as I understand, the data will be lazy-loaded; however, I wasn't entirely sure if this is what we need.
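The idea behind that lazy loading is just seek-then-read: only the requested slice of frames ever touches memory. As a dependency-free illustration of the same principle (this is not the tensorflow_io API, just a sketch using Python's stdlib wave module, assuming 16-bit mono PCM):

```python
import struct
import wave

def read_slice(path, start, count):
    """Read only `count` frames beginning at frame `start`,
    seeking within the file instead of loading it all into memory.
    Assumes 16-bit mono PCM."""
    with wave.open(path, "rb") as w:
        w.setpos(start)          # seek to the requested frame
        raw = w.readframes(count)  # read only that many frames
    return struct.unpack(f"<{count}h", raw)

# Write a small demo wav: samples 0..999, 16-bit mono at 44.1 kHz
with wave.open("demo.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(44100)
    w.writeframes(struct.pack("<1000h", *range(1000)))

print(read_slice("demo.wav", 100, 5))  # (100, 101, 102, 103, 104)
```

tfio.audio.AudioIOTensor exposes the same behavior through indexing (e.g. `audio[100:105]`) instead of an explicit seek.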
On a side note, Google's Tone Transfer looks very promising: https://sites.research.google/tonetransfer/ I believe they use the technology to create voices for their Google Assistant.
Thanks, and let's hope 2021 will be nothing like 2020!
Update: the Colab notebook has been updated to fix the out-of-memory issue by using a Sequence class for loading the data one batch at a time. It also uses MSE for the loss calculation to alleviate issues with the error-to-signal loss with pre-emphasis filter. I'm conducting more tests on the choice of loss function before rolling this change out to the Python scripts.
I have noticed that in order to properly finish training you'll need a lot of free memory to run the prediction; if you try to save the tensor to a file, the resulting file is going to take up gigabytes.
In my case (#3) you're basically getting 65 MB of data for 242 KB of audio (a 26346% increase).
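A back-of-envelope check lands in the same ballpark (the 16-bit-mono-PCM source format and float32 windows of size 100 are assumptions, not figures from #3):

```python
# 242 KB of 16-bit mono PCM -> roughly 124k samples
n_samples = 242 * 1024 // 2

# Windowed float32 training array: (n_samples, input_size) * 4 bytes each
input_size = 100
bytes_windowed = n_samples * input_size * 4

print(f"{bytes_windowed / 2**20:.1f} MB")  # 47.3 MB
```

That's the same order of magnitude as the observed 65 MB; headers, the target array, and exact sample format account for the gap.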