f90 / Wave-U-Net

Implementation of the Wave-U-Net for audio source separation
MIT License

Inquiry on SDR mean and median on MUSDB/test #44

Closed: SibingWu closed this issue 4 years ago

SibingWu commented 4 years ago

When running the provided pre-trained model on MUSDB and then calling `compute_mean_metrics`, the SDR result on musdb/test is "acc_median: -15.40719 acc_mean: -15.94557 voc_median: -18.48437 voc_mean: -19.03658", while the SDR result on musdb/train is "train:acc_median: 3.91334 acc_mean: 4.07193 voc_median: 7.46886 voc_mean: 28.05654".

I wonder why the SDR result on the musdb test dataset is so strange; it clearly does not match what is stated in the README: "M5-HighSR is our best vocal separator, reaching a median (mean) vocal/acc SDR of 4.95 (1.01) and 11.16 (12.87), respectively."

f90 commented 4 years ago

These numbers look like something went really, really wrong somewhere. Did you listen to the results? Also check that you are using the correct versions of Python and the libraries, e.g. TensorFlow 1.8.
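
For a quick environment sanity check, something like this prints the installed versions so they can be compared against what the repo was tested with (a minimal sketch; TensorFlow 1.8 is the version mentioned above, the other packages listed are assumptions):

```python
# Print interpreter and library versions to compare against the versions
# the repository was tested with (TensorFlow 1.8 per the comment above;
# the other packages are just examples of typical dependencies).
import sys

import numpy as np
import librosa
import tensorflow as tf

print("Python     :", sys.version.split()[0])
print("NumPy      :", np.__version__)
print("librosa    :", librosa.__version__)
print("TensorFlow :", tf.__version__)
```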

SibingWu commented 4 years ago

> These numbers look like something went really, really wrong somewhere. Did you listen to the results? Also check that you are using the correct versions of Python and the libraries, e.g. TensorFlow 1.8.

  1. Which dataset does your reported result refer to: musdb/train or musdb/test?

  2. Furthermore, how did you call `compute_mean_metrics` to generate the result? I ask just to make sure that I have called it properly. It seems that this function is only called in `Plot.py`, but nothing else seems to call the functions in `Plot.py`.

  3. In fact, when running the pre-trained model on musdb, some of the SDR values are negative. Is this normal?

Thank you so much for your help!

f90 commented 4 years ago
  1. The results refer to the MUSDB test partition.

  2. The evaluation function from the `museval` package is called within the `predict` function in `Evaluate.py` during the final evaluation, which generates the SDR etc. values and writes them into the specified output folder. That's why, for the results analysis later on, you can just read those files and average the computed SDRs in `Plot.py` (a rough sketch of this aggregation step follows at the end of this comment).

  3. Some SDR values can be negative; this is normal (although it indicates bad separation), since the SDR is computed from a `log(x)` where `x` is a ratio between signal energies, so if that ratio is below 1 you will get a negative value.
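
For intuition, the quantity `museval` reports follows the standard BSS Eval definition, which is a log of an energy ratio; shown here as a sketch of the textbook formula rather than the package's exact code:

```latex
% BSS Eval SDR: ratio of target-signal energy to the energy of all
% error terms (interference, noise, artifacts), expressed in dB.
% If the error energy exceeds the target energy, the ratio is < 1,
% the logarithm is negative, and so is the reported SDR.
\mathrm{SDR} = 10 \log_{10}
  \frac{\lVert s_{\mathrm{target}} \rVert^{2}}
       {\lVert e_{\mathrm{interf}} + e_{\mathrm{noise}} + e_{\mathrm{artif}} \rVert^{2}}
```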

As I said, to track down your bug I would first listen to the results and look at the waveform outputs to get an idea of what could have gone wrong during evaluation.
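
To illustrate point 2, the aggregation step amounts to roughly the following: read the per-track JSON files that `museval` wrote during evaluation and take the median/mean of the per-frame SDR values. This is a minimal sketch, not the repository's `compute_mean_metrics`; the folder path is hypothetical and the JSON key names are assumptions about museval's output format:

```python
# Aggregate SDR over all museval JSON files in an evaluation output folder.
# Assumed layout: one JSON file per track, each containing per-frame
# metrics for targets such as "vocals" and "accompaniment".
import glob
import json
import os

import numpy as np


def collect_sdr(json_folder, target_name):
    """Gather all finite per-frame SDR values for one target across all tracks."""
    values = []
    for path in glob.glob(os.path.join(json_folder, "*.json")):
        with open(path) as f:
            data = json.load(f)
        for target in data.get("targets", []):
            if target.get("name") != target_name:
                continue
            for frame in target.get("frames", []):
                sdr = frame.get("metrics", {}).get("SDR")
                if sdr is not None and np.isfinite(sdr):
                    values.append(sdr)
    return np.array(values)


for name in ("vocals", "accompaniment"):
    sdr = collect_sdr("path/to/eval_output", name)  # hypothetical folder
    print(f"{name}: median {np.median(sdr):.3f}, mean {np.mean(sdr):.3f}")
```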

SibingWu commented 4 years ago
> 1. The results refer to the MUSDB test partition.
>
> 2. The evaluation function from the `museval` package is called within the `predict` function in `Evaluate.py` during the final evaluation, which generates the SDR etc. values and writes them into the specified output folder. That's why, for the results analysis later on, you can just read those files and average the computed SDRs in `Plot.py`.
>
> 3. Some SDR values can be negative; this is normal (although it indicates bad separation), since the SDR is computed from a `log(x)` where `x` is a ratio between signal energies, so if that ratio is below 1 you will get a negative value.
>
> As I said, to track down your bug I would first listen to the results and look at the waveform outputs to get an idea of what could have gone wrong during evaluation.

I managed to fix it successfully. Thank you very much for your help!