f90 / AdversarialAudioSeparation

Code accompanying the paper "Semi-supervised adversarial audio source separation applied to singing voice extraction"
https://arxiv.org/abs/1711.00048
MIT License

Dataset links #3

Open JunwenBai opened 5 years ago

JunwenBai commented 5 years ago

Hi, do you have links to the datasets you used? I am new to these datasets, but the paper is very interesting and I would like to reproduce the results. However, the datasets are not easy to find, and it is unclear how to convert the downloaded data into the formatted dataset. Although the dataset structure is described in the README, it is still not clear to me what a formatted dataset should look like. Would you mind elaborating on that a little, for example by adding a sample dataset to the repo? Thanks!

f90 commented 5 years ago

Hey, the datasets are indeed a problem... I will explain: DSD100 is relatively easy to access here: https://sigsep.github.io/datasets/dsd100.html

MedleyDB can be downloaded after registration here: https://medleydb.weebly.com/downloads.html

CCMixter download is (a bit hidden) on the website here: https://members.loria.fr/ALiutkus/kam/

iKala was recently retracted: you can no longer sign a usage license, and the download is no longer available. This is very unfortunate, since the license I have doesn't allow me to redistribute the dataset to you for replication. You should get similar results, though, if you simply exclude it from the experiment (more specifically, from the unsupervised part of the data used in the semi-supervised setting, as well as from the validation and test data).

For more detail, see the Training.py file, which shows how the datasets are loaded and handled internally. That should give you an idea of how to get started with different configurations, especially since you will probably need to stop the code there from using iKala. This part should be easy, though: simply remove the iKala object by deleting the line

 ikala = Datasets.getIKala("iKala.xml")

and leave it out of the list that is iterated over for the unsupervised datasets in the line

for ds in [mdb, ccm, ikala]:
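In Training.py itself, the change probably amounts to shortening that list to the remaining datasets. As a self-contained illustration of the same idea, here is a minimal sketch. The helper `select_datasets`, the `available` dictionary, and the placeholder track names are all hypothetical and not part of the repository, and the real dataset objects returned by the loaders in `Datasets` may look different:

```python
def select_datasets(available, exclude=("ikala",)):
    """Return the dataset objects in their original order, skipping excluded names."""
    return [ds for name, ds in available.items() if name not in exclude]


# Placeholder lists standing in for the loaded datasets (hypothetical contents).
available = {
    "mdb": ["mdb_track_1", "mdb_track_2"],
    "ccm": ["ccm_track_1"],
    "ikala": ["ikala_track_1"],
}

# Collect the unsupervised tracks from every dataset except iKala.
unsupervised = []
for ds in select_datasets(available):
    unsupervised.extend(ds)

print(unsupervised)
```

Keeping the exclusion in one place like this would make it easy to apply the same filter wherever the validation and test data are assembled, since iKala has to be left out there as well.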

Hope that helps?