f90 / Wave-U-Net

Implementation of the Wave-U-Net for audio source separation
MIT License

Questions about the paper results #30

Closed Wangbin1125 closed 5 years ago

Wangbin1125 commented 5 years ago

Hi, I'm trying to train the M6 model on the musdb dataset and have two questions. 1) How much GPU memory is needed to train this model? I have to set the batch size to 1; otherwise the GPU reports a memory error at the start of training.

2) When Training.py finishes, the folder with the evaluation results contains 151 JSON files, one per song, and one test-test.json file. There are also four separated audio sources for each song. How do I reproduce Table 3 (test performance metrics for the multi-instrument model) from the paper? I think the compute_mean_metrics(json_folder, compute_averages=True, metric="SDR") function computes all the numbers shown in the paper (Mean, SD, Median, MAD), but I don't know how to use it. The only caller I found is the plotting module plot.py, so how do I use this function in the evaluation process? I am a deep learning beginner and hope to get your answer.

f90 commented 5 years ago

Hey! For 1), the batch size is set to 16 in all our experiments and this ran fine with 8GB of GPU memory, so I am a bit surprised that you are running into memory issues. Is it the same with the singing voice separation models?

For 2), you are right that the compute_mean_metrics function can be used to compute the results. It is meant to be used as a standalone function, so you should be able to run

import Evaluate
Evaluate.compute_mean_metrics(PATH_TO_JSONS)

from a Python console, where PATH_TO_JSONS is simply the path to the folder containing all the JSON files you want to evaluate, so that should be 50 (or 51 including test.json) files, one for each song. For details on the other parameters refer to the documentation of the compute_mean_metrics function.

draw_violin_sdr can be used with the same JSON path parameter to directly plot the distribution of SDR values; it builds on compute_mean_metrics.
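
For readers who want to see what the four statistics (Mean, SD, Median, MAD) amount to, here is a minimal standard-library sketch. It is not the repository's compute_mean_metrics. The name summarize_metric is hypothetical, and the JSON layout (a "targets" list whose entries carry a "name" and a list of "frames", each with a "metrics" dict) and the rule for skipping the aggregate test JSON are assumptions that should be checked against your own files.

```python
import glob
import json
import math
import os
import statistics


def summarize_metric(json_folder, metric="SDR"):
    """Return {target_name: (mean, sd, median, mad)} pooled over all frames of all songs.

    Hypothetical helper: the JSON layout used here is an assumption.
    """
    values = {}
    for path in sorted(glob.glob(os.path.join(json_folder, "*.json"))):
        # Assumption: the aggregate file ends in "test.json" and is not a song.
        if os.path.basename(path).endswith("test.json"):
            continue
        with open(path) as f:
            song = json.load(f)
        for target in song["targets"]:
            frames = [float(fr["metrics"][metric]) for fr in target["frames"]]
            # Drop NaN frames (e.g. silent segments) before pooling.
            kept = [v for v in frames if not math.isnan(v)]
            values.setdefault(target["name"], []).extend(kept)

    summary = {}
    for name, vals in values.items():
        if not vals:
            continue  # nothing valid to summarize for this target
        median = statistics.median(vals)
        # MAD: median of absolute deviations from the median.
        mad = statistics.median([abs(v - median) for v in vals])
        summary[name] = (statistics.mean(vals), statistics.pstdev(vals), median, mad)
    return summary
```

Calling summarize_metric(PATH_TO_JSONS) would then return one (mean, SD, median, MAD) tuple per source. The sketch uses the population standard deviation; whether the paper's numbers use that or the sample variant is not stated in this thread.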

Wangbin1125 commented 5 years ago

@f90 Following your instructions, I have solved my second problem. I haven't trained the singing voice separation models yet; I'll look into the GPU memory problem again and come back to you if I have further questions. Thank you very much for your detailed reply.

f90 commented 5 years ago

No problem! Closing this for now due to inactivity.