Open darius522 opened 1 month ago
Hello,
Thank you for the kind words. We used a 90%/10% split for the training and validation sets. For the evaluation setup, we selected random audio samples from the validation set and generated images from multiple seeds. For instance, for the landscape dataset we used 5 audio samples from each of 11 classes, so from 55 audio clips with 10 seeds we generated 550 images. We used the remaining part of the validation set to calculate our reference-based metrics. I hope this answers your questions.
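For anyone trying to reproduce this setup, the counts above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' code: the function name `build_eval_plan`, the `(audio_id, class_label)` item format, the per-class (stratified) split, and the fixed RNG seed are all hypothetical. The reply does not say whether the split was stratified or which seeds were used.

```python
import random
from collections import defaultdict


def build_eval_plan(items, val_fraction=0.1, per_class=5, num_seeds=10, rng_seed=0):
    """Split (audio_id, class_label) pairs 90/10 and plan evaluation jobs.

    Assumption: the split is done per class so every class appears in the
    validation set. The original reply does not confirm this.
    """
    rng = random.Random(rng_seed)

    # Group audio ids by class label.
    by_class = defaultdict(list)
    for audio_id, label in items:
        by_class[label].append(audio_id)

    train, val, eval_audio = [], [], []
    for label in sorted(by_class):
        ids = sorted(by_class[label])
        rng.shuffle(ids)
        n_val = round(len(ids) * val_fraction)
        class_val = ids[:n_val]
        val.extend(class_val)
        train.extend(ids[n_val:])
        # Pick up to `per_class` validation clips to condition generation on.
        eval_audio.extend(class_val[:per_class])

    # One generation job per (audio clip, seed) pair.
    jobs = [(audio_id, seed) for audio_id in eval_audio for seed in range(num_seeds)]

    # Remaining validation clips, kept for reference-based metrics.
    chosen = set(eval_audio)
    reference = [audio_id for audio_id in val if audio_id not in chosen]

    return {
        "train": train,
        "val": val,
        "eval_audio": eval_audio,
        "jobs": jobs,
        "reference": reference,
    }


# Toy data: 11 hypothetical classes with 100 clips each.
toy_items = [(f"class{c}_clip{i}", f"class{c}") for c in range(11) for i in range(100)]
plan = build_eval_plan(toy_items)
print(len(plan["train"]), len(plan["val"]), len(plan["eval_audio"]), len(plan["jobs"]))
# prints: 990 110 55 550
```

With 11 classes and 5 clips per class, this gives 55 conditioning clips and 55 × 10 = 550 generation jobs, matching the numbers quoted for the landscape dataset. The real class sizes and file lists would need to come from the dataset itself.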
Hi,
First off, thanks to the authors for this fantastic piece of work.
I am currently working on a similar project and would like to benchmark SonicDiffusion as part of our paper. Thank you for providing the pretrained models for that purpose. Would it be possible for the authors to also share the train/test split strategy used to train and evaluate these models?
Please let me know and thanks again!
Darius