Hi. I'm looking at training my own models for use with this. I've been looking at the Magenta documentation and the Colab notebook called "train_autoencoder". There it says to train models with 10-20 minutes of recordings from a single source. I wanted to ask how large a dataset you used for the scream/Celine Dion example on YouTube (I thought the results were good, but couldn't imagine you having a 20-minute scream dataset from a single source!).
Hi, the scream model was not DDSP but wavae (see https://github.com/acids-ircam/wavae)! And to be honest, we do have a 4-hour-long dataset of screams 😱!
Thanks.