Open hagenw opened 5 years ago
Thanks for adding this corpus. Could you explain the idea behind percentage_silence? I didn't fully get what happens to signal and target.
Yeah, I forgot to mention that in the discussion, I added it there now.
I'm interested in reviving this, if you are too, @hagenw. It would require some work to rebase, but I think the code is already good and we could easily add it.
Is there some reason you've marked it as a WIP?
I marked it as WIP as it was only a first try. Then I stopped working on it, as I thought we would internally convert it to our format and this is no longer needed.
True, but we could think of adding it to audtorch as it is now. I think it's working and does what it should do, right? So since the code is already there, the community might still benefit from it. Or do you think it needs to be reworked?
No, you are right, we should integrate it into audtorch. I will start with doing the rebase first.
What is still missing is automatic download of the data set, which the community expects as well. But I think we could start without it, as you are only allowed to download the 7s versions of the files anyway.
Summary
Add musdb18, a data set for musical source separation.
Proposed Changes
Add datasets/musdb18.py
Add musdb as dependency to setup.py
Add Musdb18 to documentation
Discussion
Implementation: the data set cannot be based on our AudioDataset, as the data set comes in a special format and requires the external package musdb to read the files. Alternatively, there is the option to first convert it to WAV and then use our normal approach. NOTE: musdb needs ffmpeg installed on your system in order to work.
Automatic download: this is not yet included, as the data set cannot be freely downloaded; you have to ask for permission first. There is a short version of the data set (7s excerpts) that can be downloaded automatically, so we might think about including that.
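To illustrate the implementation note above, here is a minimal sketch of how a data set could wrap tracks read by musdb instead of building on AudioDataset. The names TrackDataset and load_musdb_tracks are hypothetical and are not the code in this PR. The sketch assumes the musdb interface as I understand it from its README (musdb.DB with root, subsets and download arguments, and tracks exposing audio and targets[...].audio), so please check it against the installed version.

```python
class TrackDataset:
    """Index-based data set over musdb-style tracks (hypothetical name).

    Each track is assumed to expose `audio` (the mixture) and
    `targets[name].audio` (one separated source), as musdb tracks do.
    """

    def __init__(self, tracks, target_source="vocals"):
        self.tracks = list(tracks)
        self.target_source = target_source

    def __len__(self):
        return len(self.tracks)

    def __getitem__(self, index):
        track = self.tracks[index]
        signal = track.audio  # full mixture
        target = track.targets[self.target_source].audio  # source to separate
        return signal, target


def load_musdb_tracks(root=None, subsets="train", download=False):
    """Read tracks with the external musdb package (hypothetical helper).

    Imported lazily so the module can be loaded without musdb.
    Assumption: download=True fetches only the 7s excerpts.
    musdb needs ffmpeg installed to decode the original files.
    """
    import musdb

    return musdb.DB(root=root, subsets=subsets, download=download)
```

With musdb and ffmpeg available, something like TrackDataset(load_musdb_tracks(root="path/to/musdb18")) would then give (signal, target) pairs by index. The path is a placeholder.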
percentage_silence: this idea is from the SpeechNoiseMix data set. It should help to force your trained model to return silence for parts of the signal where the target source is not active (e.g. no speaker talking for SpeechNoiseMix, or no singing vocal in the case of this data set).
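One possible reading of percentage_silence, as a rough sketch only: with the given probability the target source is dropped, so signal contains only the remaining sources and target is all zeros. Otherwise signal is the full mix and target is the isolated source. The function name apply_percentage_silence and the plain-list representation are assumptions for illustration, not the code in this PR.

```python
import random


def apply_percentage_silence(target, accompaniment, percentage_silence,
                             rng=random):
    """Return (signal, target) for one example (hypothetical helper).

    target and accompaniment are equal-length sequences of samples.
    With probability percentage_silence the target source is removed.
    """
    if not 0 <= percentage_silence <= 1:
        raise ValueError("percentage_silence must be between 0 and 1")
    if len(target) != len(accompaniment):
        raise ValueError("target and accompaniment must have equal length")

    if rng.random() < percentage_silence:
        # Target source inactive: the input is accompaniment only and
        # the model is expected to return silence.
        return list(accompaniment), [0.0] * len(target)

    # Target source active: the input is the sum of all sources.
    signal = [t + a for t, a in zip(target, accompaniment)]
    return signal, list(target)
```

Passing a seeded random.Random instance as rng makes the choice reproducible, which helps when comparing training runs.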