Open LukasDrude opened 7 years ago
I am not sure I understand your setup. In particular, what would the three estimates represent? Can you explain this in more detail, please? Also, in speech enhancement people more often use perceptual quality measures such as PESQ, so I am not sure bsseval would be the best fit here.
I think there are several things to say on this matter, and two main routes for dealing with the issue:
A/ having noise image
=> To me, the best practice is hence to always know the true noise, as well as the true "target" source images, for evaluation.
If this is not the case and you only know the target sources, and not the actual images along with the true noise, this raises a related and interesting question: => how do we estimate the images of those two sources within the mix, as well as a further image for the noise?
This question is probably ill-posed, because it requires some prior assumptions on what the noise should look like. If we were to implement this computation of the target images + noise within the evaluation function, we would be arbitrarily making such an assumption. That said, since these computations DO NOT use the estimates, but only the references and the mix, this is acceptable: we would not run into the flaws of bsseval_sources, which exploits the estimates to compute the references. However, I don't see any particular consensus on how to compute these "groundtruth images" from the groundtruth sources. In any case, this would be a separate module with no particular connection to mir_eval.
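When the true noise image is available, as recommended under A/, one way to read that advice is to treat the noise as one more reference, so that K + 1 references face K + 1 estimates. The following is only a sketch under that assumption: the helper names stack_references and evaluate_with_noise_reference are hypothetical, the arrays are assumed to have shape (sources, samples, channels), and the call relies on mir_eval.separation.bss_eval_images as found in mir_eval releases that still ship the separation module.

```python
import numpy as np


def stack_references(target_images, noise_image):
    """Append the noise image as one more reference (hypothetical helper).

    target_images: array of shape (K, n_samples, n_channels)
    noise_image:   array of shape (n_samples, n_channels)
    Returns an array of shape (K + 1, n_samples, n_channels).
    """
    target_images = np.asarray(target_images, dtype=float)
    noise_image = np.asarray(noise_image, dtype=float)
    if target_images.shape[1:] != noise_image.shape:
        raise ValueError("noise image must match the shape of one target image")
    return np.concatenate([target_images, noise_image[np.newaxis]], axis=0)


def evaluate_with_noise_reference(target_images, noise_image, estimates):
    """Score K + 1 estimates against K target images plus the noise image."""
    # Imported lazily so the stacking helper works without mir_eval installed.
    import mir_eval

    references = stack_references(target_images, noise_image)
    sdr, isr, sir, sar, perm = mir_eval.separation.bss_eval_images(
        references, np.asarray(estimates, dtype=float)
    )
    return {"sdr": sdr, "isr": isr, "sir": sir, "sar": sar, "perm": perm}
```

With this layout the noise estimate gets its own scores, which can be reported separately or ignored when only the speakers matter.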
B/ trying all combinations
A simple solution could be to try all the possible combinations, and to discard the input source that gives the worst performance as being the noise source. This would allow passing one extra estimate, at the cost of more computation. Still, doing this DOES NOT totally solve the problem I see with your setup, because it probably means using bsseval_sources, which I again strongly advise you not to do.
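A minimal sketch of the search described under B/, assuming K references and exactly K + 1 estimates of equal length. To stay clear of bsseval_sources, the score used here is a plain SNR against each reference, which is only a stand-in and NOT a BSS Eval metric; it also assumes the estimates are on the same scale as the references. The function names snr_db and drop_worst_estimate are hypothetical, and any other score can be passed in through score_fn.

```python
from itertools import permutations

import numpy as np


def snr_db(reference, estimate):
    """Plain SNR in dB; a stand-in score, NOT a BSS Eval metric."""
    error = reference - estimate
    return 10.0 * np.log10(np.sum(reference ** 2) / (np.sum(error ** 2) + 1e-12))


def drop_worst_estimate(references, estimates, score_fn=snr_db):
    """Match K references to K + 1 estimates and flag the leftover as noise.

    Returns (noise_index, assignment, scores), where assignment[k] is the
    index of the estimate matched to reference k.
    """
    references = np.asarray(references, dtype=float)
    estimates = np.asarray(estimates, dtype=float)
    n_ref, n_est = len(references), len(estimates)
    if n_est != n_ref + 1:
        raise ValueError("expected exactly one more estimate than references")

    best = None
    # Each ordered choice of K estimates out of K + 1 covers both the
    # left-out estimate and the permutation of the remaining ones.
    for assignment in permutations(range(n_est), n_ref):
        scores = [
            score_fn(references[k], estimates[j]) for k, j in enumerate(assignment)
        ]
        mean_score = float(np.mean(scores))
        if best is None or mean_score > best[0]:
            best = (mean_score, assignment, scores)

    _, assignment, scores = best
    noise_index = (set(range(n_est)) - set(assignment)).pop()
    return noise_index, list(assignment), scores
```

The number of candidate assignments grows as (K + 1)!, which is harmless for two speakers but becomes expensive for many sources.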
@faroit
I would like to use mir_eval.separation.evaluate to evaluate the separation performance (SDR, SIR, SNR) of a separation system in the presence of noise.
We may assume the following:
- x_1: Clean speech, speaker 1
- x_2: Clean speech, speaker 2
- n: Additive noise
Now we may have a system S, which estimates three (possibly permuted) enhanced signals:
- z_1: Enhanced signal 1
- z_2: Enhanced signal 2
- z_3: Enhanced signal 3
How would I use the function mir_eval.separation.evaluate to evaluate the result, since it currently only allows K reference signals and K target signals but does not have an additional input for noise signals?
If we find a good solution, we may add it to the docs later.