fakufaku / fast_bss_eval

A fast implementation of bss_eval metrics for blind source separation
https://fast-bss-eval.readthedocs.io/en/latest/
MIT License

Failing for my specific waveforms #16

Open sevagh opened 1 year ago

sevagh commented 1 year ago

Hello,

I have some waveforms on which the evaluation fails. The shape is [64, 2, 44100] (batch size 64, 2 channels, 44100 samples, i.e. 1 second of music at a 44100 Hz sample rate).

I've attached the tensors (saved as .pt files), and my test looks like this:

"""
sevagh testing a random waveform
"""
import numpy as np
import torch
import pytest
from mir_eval.separation import bss_eval_sources
import fast_bss_eval

pred = torch.load("/waveform.pt", map_location=torch.device("cuda"))
target = torch.load("/waveform.pt", map_location=torch.device("cuda"))

print(pred)

if __name__ == "__main__":

    print(pred.shape, target.shape)
    print(pred.dtype, target.dtype)
    print(pred.device, target.device)
    print()

    sdr, sir, sar, perm = fast_bss_eval.torch.bss_eval_sources(target, pred, use_cg_iter=10)

    print(sdr, sdr.dtype)
    print()

The error output is:

root@0f6121f288b6:~/fast_bss_eval# python tests/test_sevagh_case.py
tensor([[[-0.1661, -0.1645, -0.1626,  ..., -0.1039, -0.1029, -0.1017],
         [-0.1661, -0.1644, -0.1626,  ..., -0.1039, -0.1029, -0.1018]],

        [[ 0.0318,  0.0322,  0.0325,  ...,  0.0085,  0.0079,  0.0073],
         [ 0.0298,  0.0304,  0.0307,  ...,  0.0115,  0.0108,  0.0102]],

        [[ 0.1203,  0.1211,  0.1217,  ...,  0.0269,  0.0282,  0.0295],
         [ 0.1187,  0.1194,  0.1201,  ...,  0.0259,  0.0272,  0.0285]],

        ...,

        [[ 0.0068,  0.0070,  0.0072,  ...,  0.0069,  0.0069,  0.0070],
         [ 0.0067,  0.0069,  0.0071,  ...,  0.0068,  0.0068,  0.0069]],

        [[-0.0176, -0.0196, -0.0215,  ...,  0.0601,  0.0605,  0.0608],
         [-0.0176, -0.0196, -0.0215,  ...,  0.0601,  0.0605,  0.0608]],

        [[ 0.0084,  0.0076,  0.0069,  ...,  0.0103,  0.0109,  0.0116],
         [ 0.0045,  0.0032,  0.0020,  ...,  0.0170,  0.0179,  0.0188]]],
       device='cuda:0')
tensor([[[-0.1661, -0.1645, -0.1626,  ..., -0.1039, -0.1029, -0.1017],
         [-0.1661, -0.1644, -0.1626,  ..., -0.1039, -0.1029, -0.1018]],

        [[ 0.0318,  0.0322,  0.0325,  ...,  0.0085,  0.0079,  0.0073],
         [ 0.0298,  0.0304,  0.0307,  ...,  0.0115,  0.0108,  0.0102]],

        [[ 0.1203,  0.1211,  0.1217,  ...,  0.0269,  0.0282,  0.0295],
         [ 0.1187,  0.1194,  0.1201,  ...,  0.0259,  0.0272,  0.0285]],

        ...,

        [[ 0.0068,  0.0070,  0.0072,  ...,  0.0069,  0.0069,  0.0070],
         [ 0.0067,  0.0069,  0.0071,  ...,  0.0068,  0.0068,  0.0069]],

        [[-0.0176, -0.0196, -0.0215,  ...,  0.0601,  0.0605,  0.0608],
         [-0.0176, -0.0196, -0.0215,  ...,  0.0601,  0.0605,  0.0608]],

        [[ 0.0084,  0.0076,  0.0069,  ...,  0.0103,  0.0109,  0.0116],
         [ 0.0045,  0.0032,  0.0020,  ...,  0.0170,  0.0179,  0.0188]]],
       device='cuda:0')
torch.Size([64, 2, 44100]) torch.Size([64, 2, 44100])
torch.float32 torch.float32
cuda:0 cuda:0

Traceback (most recent call last):
  File "tests/test_sevagh_case.py", line 23, in <module>
    sdr, sir, sar, perm = fast_bss_eval.torch.bss_eval_sources(target, pred, use_cg_iter=10)
  File "/root/fast_bss_eval/fast_bss_eval/torch/metrics.py", line 654, in bss_eval_sources
    coh_sdr, coh_sar = square_cosine_metrics(
  File "/root/fast_bss_eval/fast_bss_eval/torch/metrics.py", line 565, in square_cosine_metrics
    sol = block_toeplitz_conjugate_gradient(acf, xcorr, n_iter=use_cg_iter, x=x0)
  File "/root/fast_bss_eval/fast_bss_eval/torch/cgd.py", line 403, in block_toeplitz_conjugate_gradient
    precond = BlockCirculantPreconditionerOperator(acf)
  File "/root/fast_bss_eval/fast_bss_eval/torch/cgd.py", line 184, in __init__
    self.C = inv(C)
  File "/root/fast_bss_eval/fast_bss_eval/torch/compatibility.py", line 145, in inv
    return torch.linalg.inv(*args, **kwargs)
torch._C._LinAlgError: linalg.inv: (Batch element 770): The diagonal element 2 is zero, the inversion could not be completed because the input matrix is singular.

The waveforms are normal material (segments extracted from MUSDB18-HQ). I've attached waveform.pt in a zip file (sevagh-bss-eval-error.zip). Can you help me figure it out? Thanks in advance.

sevagh commented 1 year ago

I also tested with the numpy code (i.e. the same tensors converted with .detach().cpu().numpy()) and got the same outcome:

File "/usr/local/lib/python3.8/dist-packages/numpy/linalg/linalg.py", line 88, in _raise_linalgerror_singular
    raise LinAlgError("Singular matrix")
numpy.linalg.LinAlgError: Singular matrix
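Roughly, the numpy check looked like this (a sketch; if I understand correctly, the top-level fast_bss_eval.bss_eval_sources accepts numpy arrays as well as torch tensors):

import fast_bss_eval

# same tensors as above, moved off the GPU and converted to numpy
pred_np = pred.detach().cpu().numpy()
target_np = target.detach().cpu().numpy()

# fails with numpy.linalg.LinAlgError: Singular matrix on these waveforms
sdr, sir, sar, perm = fast_bss_eval.bss_eval_sources(target_np, pred_np, use_cg_iter=10)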

sevagh commented 1 year ago

I just had a clue: since I already use auraloss.SISDRLoss() (scale-invariant SDR loss, https://github.com/csteinmetz1/auraloss) successfully in my neural network, I tried your si_sdr_loss function, and there I don't get the error.

So my waveforms are a problem for the scale-dependent (default) bss_eval metrics, but succeed with the scale-invariant SDR.

On the other hand, the scale-invariant bss_eval metrics (si_bss) are still a problem.
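For reference, the quick check above was roughly this (a sketch; I'm assuming the loss functions take (estimate, reference) order):

import fast_bss_eval

# scale-invariant SDR loss: no error on these tensors
loss = fast_bss_eval.si_sdr_loss(pred, target)
print(loss.shape, loss.mean())

# the default (scale-dependent) metrics still raise the LinAlgError above
# sdr, sir, sar, perm = fast_bss_eval.torch.bss_eval_sources(target, pred, use_cg_iter=10)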

fakufaku commented 1 year ago

Hi, thanks for reporting. Computing the SIR requires inverting the covariance matrix of the reference signals. From the limited output you provide, it looks like there is very high correlation between the two reference signals. If that is the case, their covariance matrix will be close to rank deficient and the system cannot be stably inverted. This is a limitation of the SIR metric itself, which is not well defined in this case. Nevertheless, when accuracy of the metric is not of utmost importance and you only want to ensure non-failure (e.g., while training a neural network), you can use the load_diag parameter, which adds a small diagonal loading to the covariance matrix to stabilize the inversion.
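Something like this should work (a rough sketch; the exact value of load_diag may need tuning for your data):

sdr, sir, sar, perm = fast_bss_eval.torch.bss_eval_sources(
    target, pred, use_cg_iter=10, load_diag=1e-5
)
# load_diag adds a small constant to the diagonal of the covariance matrix
# before it is inverted, so the system stays solvable even when the
# references are strongly correlated (at the cost of a small bias)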

sevagh commented 1 year ago

Thanks for the suggestion. If I use a large enough value, e.g. load_diag=1e-5, I get past this issue.

Indeed, in the early iterations of my neural network there is very little "demixing", and the output is essentially the same as the input until some demixing is learned.

Next, I run into the cost matrix containing NaNs and +inf/-inf later on, in the Hungarian method. I think using the full BSS metrics (SDR, SIR, SAR) as the training loss may not be a great idea.

Maybe I'll use SDR as the training loss, and SDR/SIR/SAR as validation metrics.
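Roughly what I have in mind (a sketch; I'm assuming fast_bss_eval.sdr_loss here, but any SDR-only loss could be swapped in):

import fast_bss_eval

# training: SDR-only loss on the batch
train_loss = fast_bss_eval.sdr_loss(pred, target).mean()

# validation: full SDR/SIR/SAR metrics, stabilized with the diagonal loading
# discussed above
sdr, sir, sar, perm = fast_bss_eval.torch.bss_eval_sources(
    target, pred, use_cg_iter=10, load_diag=1e-5
)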

fakufaku commented 1 year ago

Ok, that's good.

I can believe that the outputs of the network may be very close to each other in the beginning. However, the problem arises when the targets are the same or close to each other. You may want to check that you pass target/pred to the loss in the correct order. Note that the order is reversed between the loss functions of pytorch (pred, target) and the bss_eval metrics (target, pred). In your example above, the same waveform is used for both pred and target.

For the NaN, there is another workaround implemented. You can set clamp_db=30 (for example) and the output will be clipped to [-30 dB, 30 dB]. This should solve the problems you are having with the Hungarian method.
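Combined with the diagonal loading above, the call would look roughly like this (sketch):

sdr, sir, sar, perm = fast_bss_eval.torch.bss_eval_sources(
    target, pred, use_cg_iter=10, load_diag=1e-5, clamp_db=30
)
# all returned metrics are clipped to [-30 dB, 30 dB], so the cost matrix
# used to find the permutation (Hungarian step) no longer contains NaN/inf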