andabi / music-source-separation

Deep neural networks for separating singing voice from music written in TensorFlow

Data Augmentation #9

Open ShengleiH opened 6 years ago

ShengleiH commented 6 years ago

Hello, thank you for sharing the code. I am confused about the data augmentation.

You said "circularly shift the singing voice and mix them with the background music."

I want to ask how long the 'circular shift step' is. I checked Po-Sen's Matlab code, and he said the best 'circular shift step' is 10,000. Also, in his code I found that he actually shifted the music, not the vocal.

So I am very confused about data augmentation.
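For context, the circular-shift augmentation from Po-Sen Huang's paper can be sketched as below. This is a minimal illustration, not code from either repo; the function name and the choice of which stem to shift are assumptions (the comment above notes Po-Sen's code shifts the music, while the README describes shifting the vocal).

```python
import numpy as np

def circular_shift_mix(vocal, music, shift):
    """Illustrative augmentation: circularly shift one stem by `shift`
    samples (np.roll wraps the end around to the start), then remix it
    with the other stem to create a new, unseen mixture."""
    shifted_vocal = np.roll(vocal, shift)
    return shifted_vocal + music
```

Each distinct shift value yields a new mixture whose vocal and accompaniment are no longer time-aligned with any original song, which is the point of the augmentation.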

andabi commented 6 years ago

@ShengleiH What I mentioned in the README is a summary of the paper by Po-Sen. In my work, I didn't apply the circular-shifting technique; instead, I crop waveforms randomly at each training step.
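A random-crop step like the one described could be sketched as follows. This is a hypothetical helper for illustration, not the repo's actual implementation:

```python
import numpy as np

def random_crop(wav, crop_len, rng=np.random):
    """Return a random contiguous window of crop_len samples from wav.
    Assumes len(wav) >= crop_len; a fresh window is drawn each call,
    so every training step sees a different segment."""
    start = rng.randint(0, len(wav) - crop_len + 1)
    return wav[start:start + crop_len]
```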

ShengleiH commented 6 years ago

I found that the default TrainConfig.SECONDS is set to 30 s. If I use the 'mir1k' dataset, in which audio lengths range from 4 to 13 seconds, it seems that the random-cropping technique doesn't work, since the crop length is longer than the audio length. So I am wondering whether the default TrainConfig.SECONDS was chosen to fit the 'iKala' dataset. Should I shorten TrainConfig.SECONDS to fit the 'mir1k' dataset? But the default configuration on 'mir1k' sounds good. Thank you~
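One common way to make a long crop window work with short clips like MIR-1K's 4-13 s files is to zero-pad before cropping. This is only a sketch of that workaround under my own assumptions, not what the repo does:

```python
import numpy as np

def random_crop_padded(wav, crop_len, rng=np.random):
    """Random crop that zero-pads the clip to crop_len first when it is
    too short, so a fixed crop window never exceeds the audio length."""
    if len(wav) < crop_len:
        wav = np.pad(wav, (0, crop_len - len(wav)))
    start = rng.randint(0, len(wav) - crop_len + 1)
    return wav[start:start + crop_len]
```

Whether padding with silence is acceptable depends on the model; shortening TrainConfig.SECONDS to the dataset's minimum clip length is the simpler alternative.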