LCAV / pyroomacoustics

Pyroomacoustics is a package for audio signal processing for indoor applications. It was developed as a fast prototyping platform for beamforming algorithms in indoor scenarios.
https://pyroomacoustics.readthedocs.io
MIT License

[DOA] Add assert for number of sources in subspace decomposition based methods #140

Open L0g4n opened 4 years ago

L0g4n commented 4 years ago

Hi,

sorry to bother, but I was trying to evaluate the other algorithms (besides SRP & MUSIC, which work with my setup). When I run the remaining algorithms over my setup, they crash. With only the WAVES algorithm added, I get this traceback:

doa.locate_sources(X, freq_bins=freq_bins)
  File "/usr/local/lib/python3.7/site-packages/pyroomacoustics/doa/doa.py", line 336, in locate_sources
    self._process(X)
  File "/usr/local/lib/python3.7/site-packages/pyroomacoustics/doa/waves.py", line 97, in _process
    self._construct_waves_matrix(C_hat, f0, beta)
  File "/usr/local/lib/python3.7/site-packages/pyroomacoustics/doa/waves.py", line 121, in _construct_waves_matrix
    P = (ws-wn[-1])/np.sqrt(ws*wn[-1]+1)
IndexError: index -1 is out of bounds for axis 0 with size 0

which indicates that wn (the eigenvalues of the noise subspace) is empty according to the source code. I suspect this is also the reason why CSSM fails with my setup. This is my setup code (basically the same as before):
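For context, the empty wn can be reproduced in isolation. This is a minimal numpy sketch of the subspace split, not the actual pyroomacoustics code: with M mics and num_src sources, the M eigenvalues of the spatial covariance matrix are split into num_src "signal" values and M - num_src "noise" values, so num_src == M leaves zero noise eigenvalues.

```python
import numpy as np

# Minimal sketch (not the library code): subspace methods split the M
# eigenvalues of the M x M spatial covariance matrix into num_src "signal"
# values and M - num_src "noise" values.
M, num_src = 4, 4
R = np.eye(M)                 # stand-in for the spatial covariance matrix
w = np.linalg.eigvalsh(R)     # M eigenvalues, ascending
wn = w[: M - num_src]         # noise-subspace eigenvalues
ws = w[M - num_src :]         # signal-subspace eigenvalues
print(wn.shape)               # (0,) -- empty, so wn[-1] raises IndexError
```

With num_src equal to the number of microphones, wn has length zero, which matches the `index -1 is out of bounds for axis 0 with size 0` error above.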

import itertools

import matplotlib.pyplot as plt
import numpy as np
import pyroomacoustics as pra
from pyroomacoustics.doa import circ_dist
from scipy.io import wavfile
import sounddevice

ROOM_DIMENSIONS = np.array([15, 12]) # x, y
SNR = 20 # dB
DISTANCE = 0.15 # between microphones, in meters
c = 343. # m/s

def main():
    """
    The main entry point for the application.
    """

    corpus = get_speech_corpus()
    fs = corpus.samples[0].fs

    # reference angles of the sources (relative to the microphone array)
    azimuth_ref = np.array(
        np.radians(
            [353, 98, 173, 278]
        )
    )
    distance = 3 # meters

    # ALGORITHM parameters
    nfft = 256 # FFT size
    # compute the noise variance
    sigma2 = 10**(-SNR / 10) / (4. * np.pi * distance)**2

    # TODO: tweak `freq_bins` / `freq_range` parameter for better performance
    freq_bins = np.arange(5, 100)  # FFT bins to use for DOA estimation
    freq_range = [500, 3500] # frequency range to use for DOA estimation

    # Room without reflections and absorptions
    # TODO: try room with reverberation time of 0.1s
    room = create_room(ROOM_DIMENSIONS, fs, sigma2=sigma2)

    # Get the first four speech samples
    samples = corpus.samples[:4]

    mic_array = pra.circular_2D_array(center=ROOM_DIMENSIONS / 2, M=4, phi0=0, radius=DISTANCE)

    room.add_microphone_array(pra.MicrophoneArray(mic_array, room.fs))

    # TODO: get ultrasound samples and use them as raw signal for simulated sources
    # Add sources with samples that are exactly one second long (from real speech samples)
    num_samples = int(fs)

    # add the sources to specified locations with respect to the array
    for i, angle in enumerate(azimuth_ref):
        source_location = ROOM_DIMENSIONS / 2 + distance * np.r_[np.cos(angle), np.sin(angle)]
        source_signal = samples[i].data[:num_samples] # trim to one second of sample
        room.add_source(source_location, signal=source_signal)

    # simulate the mic raw input signals convolved with the real impulse responses (RIRs)
    room.simulate(snr=SNR)

    # generate the STFT frames for overlapping frames
    X = np.array(
        [pra.stft(signal, nfft, nfft // 2, transform=np.fft.rfft).T for signal in room.mic_array.signals]
    )

    # TODO: evaluate more DOA algorithms
    # Available algorithms: MUSIC, SRP, CSSM, WAVES, TOPS, FRIDA
    #algo_names = sorted(pra.doa.algorithms.keys())
    algo_names = ["SRP", "MUSIC", "WAVES"]

    # run the algorithms and plot the results
    for algo_name in algo_names:
        # the max_four parameter is necessary for FRIDA only
        doa = pra.doa.algorithms[algo_name](mic_array, fs, nfft, c=c, num_src=len(azimuth_ref), max_four=4)

        # this call performs localization on the frames in X
        # FIXME: CSSM triggers here "divide by zero" runtime warning
        doa.locate_sources(X, freq_bins=freq_bins)

        # FIXME: here CSSM triggers ValueError "zero-size array to reduction operation minimum which has no identity"
        doa.polar_plt_dirac(azimuth_ref=azimuth_ref)
        plt.title(algo_name)

        # doa.azimuth_recon contains the reconstructed location of the source
        print(algo_name)
        print("\tRecovered azimuth:", np.sort(np.degrees(doa.azimuth_recon)), "degrees")
        print("\tReal azimuth:", np.sort(np.degrees(azimuth_ref)), "degrees")

    plt.show()
L0g4n commented 4 years ago

Ok, so I found out that the above WAVES error disappears when I increase the microphone array size from 4 to 6. Does that mean that the other algorithms impose some constraints on the (circular) microphone array size?

L0g4n commented 4 years ago

BTW: To use the DOA algorithms on real-world data that is sampled, say, every 50 ms from a microphone array, the number of sources (which you need to pass to the DOA algorithms) is of course not known in advance. Does that mean I either have to hardcode the number of sources in advance, or that a separate source-counting algorithm is needed before the whole process, @fakufaku (e.g. one that returns the number of sources, which is then passed to the algorithms as a parameter)?

fakufaku commented 4 years ago

If I am not mistaken, you are trying to recover 4 sources. I believe that WAVES, as well as MUSIC, have the limitation that they can only recover up to n_mics - 1 sources.

You mentioned you did not have that problem with MUSIC, that is a little bit surprising...

As to your other question regarding the number of sources, there are indeed a few ways to estimate it, however, I am not very familiar with them. You may try to look up Akaike Information Criterion for example. In practice, another possibility is to try to estimate as many sources as possible and then work out the ones that are real by comparing their power.
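A crude sketch of the eigenvalue-based idea (the function name and threshold are made up for illustration; this is a rough stand-in for proper AIC/MDL model-order selection, not a pyroomacoustics API): eigenvalues of the spatial covariance matrix that carry a non-negligible share of the total energy are counted as signal.

```python
import numpy as np

def estimate_num_sources(X, threshold=0.01):
    """Rough source count (illustrative only): eigenvalues of the spatial
    covariance that carry more than `threshold` of the total energy are
    counted as signal dimensions."""
    # X: (n_mics, n_frames, n_bins) STFT frames, as in the code above
    n_mics = X.shape[0]
    Xf = X.reshape(n_mics, -1)            # flatten over time and frequency
    R = Xf @ Xf.conj().T / Xf.shape[1]    # averaged spatial covariance
    w = np.linalg.eigvalsh(R)[::-1]       # eigenvalues, descending
    return int(np.sum(w / w.sum() > threshold))
```

In practice the threshold (or the information criterion replacing it) has to be tuned to the noise level; with strong noise the signal and noise eigenvalues are no longer well separated.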

L0g4n commented 4 years ago

Interesting, I did not know that WAVES & MUSIC have this limitation; I need to investigate this aspect further and see if I can reproduce it. Do only these two have this limitation, or are the other algorithms also affected?

Thanks, I will try to look into the information criterion approach. About the other method: is the idea to compute the (normalized) power of all estimated sources and assume that the ones above a certain threshold are real, or what else should the criterion be to detect which power values seem reasonable?

fakufaku commented 4 years ago

All algorithms that are based on subspace decomposition have it. These algorithms compute the spatial covariance matrix that is n_mics x n_mics in dimension. Then, they use an eigenvalue decomposition to extract signal and noise subspaces. The dimensions of the two subspaces need to add to n_mics, and at least one dimension is needed for the noise. In addition, the signal subspace is expected to have the same number of dimensions as there are sources. Putting all this together means that you can have at most n_mics - 1 sources. There should be a check in the DOA constructor. I will rename the issue and try to add this soon.
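The check proposed above might look something like this (a sketch; the function name and wording are illustrative, not the actual pyroomacoustics code):

```python
def check_num_sources(n_mics, num_src):
    """Subspace decomposition needs at least one noise dimension, so the
    signal subspace can have at most n_mics - 1 dimensions."""
    if num_src > n_mics - 1:
        raise ValueError(
            "Subspace-based DOA methods can recover at most n_mics - 1 "
            f"sources (got num_src={num_src} with n_mics={n_mics})"
        )
```

Called from the DOA constructor, this would turn the obscure IndexError above into an explicit error message.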

L0g4n commented 4 years ago

Ok, thanks for that information. BTW: is it intended behaviour that the FRIDA algorithm does not round the reconstructed DOA to the nearest integer (since all other algorithms do exactly that)? Here are the results for two ultrasound sources (both noise-free sine waves at 20 and 21 kHz, respectively), with a frequency range of interest from 20-24 kHz and an inter-microphone spacing of 3 cm:

CSSM   Recovered azimuth: [ 30. 120.] degrees                  Real azimuth: [ 30. 120.] degrees
FRIDA  Recovered azimuth: [ 29.56244176 120.00692398] degrees  Real azimuth: [ 30. 120.] degrees
MUSIC  Recovered azimuth: [ 30. 120.] degrees                  Real azimuth: [ 30. 120.] degrees
SRP    Recovered azimuth: [ 31. 100.] degrees                  Real azimuth: [ 30. 120.] degrees
TOPS   Recovered azimuth: [254. 347.] degrees                  Real azimuth: [ 30. 120.] degrees
WAVES  Recovered azimuth: [ 30. 120.] degrees                  Real azimuth: [ 30. 120.] degrees

It is also interesting to note that for naive ultrasonic sources all the subspace decomposition algorithms seem to do much better than, for example, SRP & TOPS; with speech samples, SRP-PHAT gave me the best results.

For FRIDA I also always get a lot of "ill-conditioned matrix, result may not be accurate" warnings:

C:\tools\Anaconda3\lib\site-packages\pyroomacoustics\doa\tools_fri_doa_plane.py:855: LinAlgWarning: Ill-conditioned matrix (rcond=8.85896e-23): result may not be accurate.
  c_ri_half = linalg.solve(mtx_loop, rhs, check_finite=False)[:sz_coef]
C:\tools\Anaconda3\lib\site-packages\pyroomacoustics\doa\tools_fri_doa_plane.py:855: LinAlgWarning: Ill-conditioned matrix (rcond=8.85898e-23): result may not be accurate.
  c_ri_half = linalg.solve(mtx_loop, rhs, check_finite=False)[:sz_coef]

fakufaku commented 4 years ago

Yes, this is the expected behavior. Except for FRIDA, all the algorithms rely on a search: you provide a list of candidate locations and they tell you which one is most likely. Since you provide only integer candidate locations, the result is an integer; there is no rounding. FRIDA, on the other hand, computes the locations directly without a search, which generally yields non-integer values.
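The distinction can be seen in a toy sketch (pure numpy, not pyroomacoustics code): a search-based estimator takes the argmax over its candidate grid, so its output precision is exactly the grid spacing.

```python
import numpy as np

# Toy illustration: a search-based method can only return values that lie
# on its candidate grid, so a 1-degree grid yields integer-degree answers.
true_azimuth = 29.56                            # degrees (like FRIDA's output)
grid = np.arange(0.0, 360.0, 1.0)               # 1-degree candidate grid
pseudo_spectrum = -np.abs(grid - true_azimuth)  # peaks at the true angle
est = grid[np.argmax(pseudo_spectrum)]
print(est)  # 30.0 -- snapped to the nearest grid point
```

A finer candidate grid narrows the quantization but never removes it, whereas a parametric method like FRIDA solves for the angle directly.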

The ill-conditioned matrix is a known warning for FRIDA.