fakufaku / fast_bss_eval

A fast implementation of bss_eval metrics for blind source separation
https://fast-bss-eval.readthedocs.io/en/latest/
MIT License

How to evaluate SIR and SDR for mono wav file #13

Closed: Shin-ichi-Takayama closed this issue 2 years ago

Shin-ichi-Takayama commented 2 years ago

Hello.

How should I evaluate SIR and SDR for mono wav files?

I have the following mono wav files.

Each wav file is 4 seconds long and sampled at 16 kHz. When I computed the SIR of the mono wav file, it was Inf. As I mentioned in Issue #12, the following code gives an SIR of Inf.

from scipy.io import wavfile
import numpy as np
import fast_bss_eval

_, ref = wavfile.read("./data/ref.wav")
_, est = wavfile.read("./data/est.wav")

# add a leading source axis: shape (n_samples,) -> (1, n_samples)
ref = ref[None, ...]
est = est[None, ...]

# compute the metrics
sdr, sir, sar = fast_bss_eval.bss_eval_sources(ref, est, compute_permutation=False)

print('sdr:', sdr)
print('sir:', sir)
print('sar:', sar)

sdr: 14.188884277900977
sir: inf
sar: 14.18888427790095
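For reference, here is a sketch of the usual bss_eval decomposition. The notation is assumed here: \hat{s}_j is the estimate, s_j the target reference, P_{s_j} the projection onto the (filtered) target reference and P_{\mathbf{s}} the projection onto the span of all references. With a single reference the two projections coincide, so the interference term vanishes. SIR is then infinite and SDR equals SAR, which is consistent with the output above (about 14.19 dB for both).

$$
\begin{aligned}
s_{\text{target}} &= P_{s_j}\hat{s}_j, \qquad
e_{\text{interf}} = P_{\mathbf{s}}\hat{s}_j - P_{s_j}\hat{s}_j, \qquad
e_{\text{artif}} = \hat{s}_j - P_{\mathbf{s}}\hat{s}_j, \\
\mathrm{SDR} &= 10\log_{10}\frac{\lVert s_{\text{target}}\rVert^2}{\lVert e_{\text{interf}} + e_{\text{artif}}\rVert^2}, \qquad
\mathrm{SIR} = 10\log_{10}\frac{\lVert s_{\text{target}}\rVert^2}{\lVert e_{\text{interf}}\rVert^2}, \qquad
\mathrm{SAR} = 10\log_{10}\frac{\lVert s_{\text{target}} + e_{\text{interf}}\rVert^2}{\lVert e_{\text{artif}}\rVert^2}, \\
\text{one reference: } P_{\mathbf{s}} &= P_{s_j} \;\Rightarrow\; e_{\text{interf}} = 0 \;\Rightarrow\; \mathrm{SIR} = +\infty,\quad \mathrm{SDR} = \mathrm{SAR}.
\end{aligned}
$$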

However, I would like to evaluate the SIR with a mono wav file. To keep the SIR from being Inf, I split the wav file into four one-second parts. Does the following code evaluate SIR and SDR correctly?

from scipy.io import wavfile
import numpy as np
import fast_bss_eval

# four one-second segments at 16 kHz, one segment per row
ref = np.zeros((4, 16000))
est = np.zeros((4, 16000))

for i in range(4):
    # segment files are named ref1.wav ... ref4.wav and est1.wav ... est4.wav
    _, ref[i] = wavfile.read(f"./data/ref{i + 1}.wav")
    _, est[i] = wavfile.read(f"./data/est{i + 1}.wav")

# compute the metrics
sdr, sir, sar = fast_bss_eval.bss_eval_sources(ref, est, compute_permutation=False)

print('sdr:', sdr.mean())
print('sir:', sir.mean())
print('sar:', sar.mean())

sdr: 16.156123610321156
sir: 28.957842593289392
sar: 16.444840346137177

What signals should each channel of ref and est contain?

Best regards.

fakufaku commented 2 years ago

This is indeed a good question! I don't think splitting the file is the correct way to do it.

In your case, you have access to both the clean speech and the noise, so the best approach is to use both as references.
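To see why both references matter, here is a minimal NumPy sketch of the bss_eval decomposition, simplified to a distortion filter of length 1 (plain orthogonal projections, no time shifts). fast_bss_eval itself uses a longer FIR distortion filter by default, so its numbers will differ. The function name `simplified_bss_eval` and the synthetic white-noise signals are assumptions for illustration only. The sketch compares three cases. In (a), speech is the only reference: SIR is infinite and SDR equals SAR. In (b), speech and noise are both references, so the residual noise is counted as interference. In (c), consecutive segments of the same speech act as "sources": the noise lands almost entirely in the artifact term, so the SIR mostly reflects chance correlation between segments rather than how much noise remains.

```python
import numpy as np


def simplified_bss_eval(refs, est, target=0):
    """Return (sdr, sir, sar) in dB for one estimate against refs[target].

    Simplified decomposition (an assumption, not fast_bss_eval's method):
    distortion filter of length 1, i.e. plain orthogonal projections.
    refs: shape (n_src, n_samples); est: shape (n_samples,)
    """
    refs = np.asarray(refs, dtype=float)
    est = np.asarray(est, dtype=float)

    def project(basis):
        # least-squares projection of est onto the span of the rows of basis
        coef, *_ = np.linalg.lstsq(basis.T, est, rcond=None)
        return basis.T @ coef

    s_target = project(refs[target:target + 1])
    # with a single reference there is nothing else to project onto,
    # so the projection onto all references equals s_target
    p_all = project(refs) if len(refs) > 1 else s_target
    e_interf = p_all - s_target  # part explained by the other references
    e_artif = est - p_all        # part explained by no reference at all

    def ratio_db(num, den):
        den_energy = float(den @ den)
        if den_energy == 0.0:
            return np.inf  # empty denominator: the ratio is infinite
        return 10 * np.log10(float(num @ num) / den_energy)

    sdr = ratio_db(s_target, e_interf + e_artif)
    sir = ratio_db(s_target, e_interf)
    sar = ratio_db(s_target + e_interf, e_artif)
    return sdr, sir, sar


rng = np.random.default_rng(0)
n = 16000
speech = rng.standard_normal(n)    # stand-in for the clean speech
noise = rng.standard_normal(n)     # stand-in for the noise
artifact = rng.standard_normal(n)  # stand-in for processing artifacts
est = speech + 0.1 * noise + 0.05 * artifact

# (a) speech as the only reference: SIR is inf and SDR equals SAR
sdr_a, sir_a, sar_a = simplified_bss_eval(speech[None, :], est)

# (b) speech and noise as references: residual noise counts as interference
sdr_b, sir_b, sar_b = simplified_bss_eval(np.stack([speech, noise]), est)

# (c) four consecutive segments of the same speech used as four "sources"
ref_seg = speech.reshape(4, -1)
est_seg = est.reshape(4, -1)
sir_c = [simplified_bss_eval(ref_seg, est_seg[i], target=i)[1] for i in range(4)]

print("single reference:", sdr_a, sir_a, sar_a)
print("speech + noise:  ", sdr_b, sir_b, sar_b)
print("segments (SIR):  ", sir_c)
```

With real recordings and the library's filtered projections, the absolute numbers will differ, but the pattern should hold: only case (b) measures how much of the noise leaks into the speech estimate.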

from scipy.io import wavfile
import numpy as np
import fast_bss_eval

# assume all files are mono
_, speech_ref = wavfile.read("./data/ref.wav")
_, noise_ref = wavfile.read("./data/noise.wav")
_, est = wavfile.read("./data/est.wav")

ref = np.stack([speech_ref, noise_ref], axis=0)
# I think it should also work with `est[None, ...]`, but to be safe, give est
# the same number of channels as ref
est = np.stack([est, est], axis=0)

# compute the metrics
sdr, sir, sar = fast_bss_eval.bss_eval_sources(ref, est, compute_permutation=False)

# row 0 of ref is the clean speech, so index 0 holds the speech metrics
print('sdr:', sdr[0])
print('sir:', sir[0])
print('sar:', sar[0])
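When building these arrays from real files, the speech, noise, and estimate must all have the same number of samples, because `np.stack` raises a `ValueError` for arrays of different shapes. Also, `wavfile.read` returns integer samples (int16) for 16-bit PCM files. Below is a minimal sketch of a hypothetical helper, `prepare_inputs` (not part of fast_bss_eval), that checks the shapes up front, converts to float as a cautious choice, and builds the `(2, n_samples)` arrays used in the snippet above.

```python
import numpy as np


def prepare_inputs(speech_ref, noise_ref, est):
    """Hypothetical helper: build ref and est arrays of shape (2, n_samples).

    Row 0 of ref is the clean speech and row 1 is the noise; the mono
    estimate is repeated on both rows, as in the snippet above.
    """
    signals = [np.asarray(x) for x in (speech_ref, noise_ref, est)]
    for x in signals:
        if x.ndim != 1:
            raise ValueError("expected mono (1-D) signals")
    lengths = {len(x) for x in signals}
    if len(lengths) != 1:
        raise ValueError(f"signals have different lengths: {sorted(lengths)}")
    # integer PCM (e.g. int16 from wavfile.read) -> float64
    speech_ref, noise_ref, est = (x.astype(np.float64) for x in signals)
    ref = np.stack([speech_ref, noise_ref], axis=0)
    est = np.stack([est, est], axis=0)
    return ref, est
```

The resulting `ref` and `est` can be passed to `fast_bss_eval.bss_eval_sources` exactly as above, with the speech metrics read from index 0.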

Shin-ichi-Takayama commented 2 years ago

Thank you for your response. I was able to evaluate the SIR and SDR with a mono wav file.