LCAV / pyroomacoustics

Pyroomacoustics is a package for audio signal processing for indoor applications. It was developed as a fast prototyping platform for beamforming algorithms in indoor scenarios.
https://pyroomacoustics.readthedocs.io
MIT License

Advice on inter-microphone distance when the wavelength is too small #142

Closed L0g4n closed 4 years ago

L0g4n commented 4 years ago

Do you happen to know how to choose the distance between the microphones in the array when the signals of interest have a relatively small wavelength? For example, a 22 kHz signal has an approximate wavelength of 1.6 cm (343 m/s / 22 kHz), and since it is generally advised to space the microphones less than half the wavelength of the signal of interest apart, this would mean a distance between adjacent microphones of at most 0.8 cm.

However, this distance is realistically not feasible when, for example, the physical microphones alone have a diameter of 2.5 cm, which implies a minimum center-to-center distance of 2.5 cm (one radius per capsule) plus some offset so that the microphones do not influence each other when the membranes oscillate. So, is there a way to still get good results from the DOA algorithms when the inter-microphone distance is not ideal?

fakufaku commented 4 years ago

Hi, unfortunately, I don't think there are any good solutions to this problem. What happens when the distance between microphones is too large is spatial aliasing. Most array signal processing algorithms rely on sources at different locations having different phase differences between the microphones. When the inter-microphone distance is more than half a wavelength, the phase difference is no longer unique. For example, let's assume that the distance between the two microphones is exactly one wavelength. Then a source placed on the line formed by the two microphones will have zero phase difference (modulo 2 pi). However, a source placed on the line perpendicular to the line formed by the microphones will also have zero phase difference. Hence, the two angles cannot be resolved.
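A minimal numerical sketch of that ambiguity (the two-microphone geometry is the illustrative example above, not anything from pyroomacoustics):

import numpy as np

# Far-field phase difference between two microphones spaced d apart,
# for a plane wave of wavelength lam arriving at angle theta
# (theta measured from the line joining the two microphones).
def phase_diff(d, lam, theta):
    return 2 * np.pi * d * np.cos(theta) / lam

c = 343.0    # speed of sound [m/s]
f = 22e3     # signal frequency [Hz]
lam = c / f  # wavelength, roughly 1.6 cm
d = lam      # spacing of exactly one wavelength

endfire = phase_diff(d, lam, 0.0)          # source on the microphone axis
broadside = phase_diff(d, lam, np.pi / 2)  # source perpendicular to it

# both are 0 modulo 2*pi, so the two directions cannot be told apart
print(np.mod(endfire, 2 * np.pi), np.mod(broadside, 2 * np.pi))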

If your signal is wide-band, aliased frequencies are still helpful because they reduce the number of possible locations to a handful, and lower frequencies might be enough to disambiguate.

Another solution is to use MEMS microphones, which are only a few millimeters in size.

L0g4n commented 4 years ago

Yes, that is reasonable to assume. However, is it not possible to "exclude" false angles by observing the relevant information at more than 2 microphones?

fakufaku commented 4 years ago

Hmm, I don't have a ready answer for that; I'll try to think about it. But I don't believe it solves the problem. Why not just write a script to test your idea? Place the microphones and create a source with the desired frequency (a pure sine works), then run DOA and observe the polar plot produced, for example, by SRP or MUSIC. If aliasing is happening, you will see peaks with similar magnitudes at several locations.
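Something along these lines, roughly (a sketch following the usual pyroomacoustics DOA workflow; the room size, array radius, source position, and signal frequency are placeholder values to adapt):

import numpy as np
import matplotlib.pyplot as plt
import pyroomacoustics as pra

fs = 16000
f_signal = 4000  # raise this past c / (2 * mic spacing) to provoke spatial aliasing

# anechoic 2D room, so only the direct path matters
room = pra.ShoeBox([6, 6], fs=fs, max_order=0)

# 4-microphone circular array (2 cm radius) in the middle of the room
mic_locs = pra.circular_2D_array(center=[3, 3], M=4, phi0=0, radius=0.02)
room.add_microphone_array(pra.MicrophoneArray(mic_locs, fs))

# pure sine source somewhere in the room, one second long
t = np.arange(fs) / fs
room.add_source([4.5, 4.0], signal=np.sin(2 * np.pi * f_signal * t))
room.simulate()

# STFT, reshaped to (n_mics, n_freq, n_frames) as the DOA objects expect
nfft = 256
X = pra.transform.stft.analysis(room.mic_array.signals.T, nfft, nfft // 2)
X = X.transpose([2, 1, 0])

doa = pra.doa.algorithms["MUSIC"](mic_locs, fs, nfft, c=343.0, num_src=1)
doa.locate_sources(X, freq_range=[f_signal - 500, f_signal + 500])

print("estimated azimuth [deg]:", np.degrees(doa.azimuth_recon))
doa.polar_plt_dirac()  # aliasing shows up as several peaks of similar height
plt.show()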

L0g4n commented 4 years ago

Yes, that's what I plan to do. I already tried that with 20-23 kHz signals, but what I really need are 38-40 kHz signals, which I cannot artificially generate on my computer (the signals then always contain NaNs as individual samples; it seems my sound card is not capable of that).

fakufaku commented 4 years ago

When you say artificially generate, do you mean with pyroomacoustics? Where are the NaN coming from? Also, if you have a regular sound card, it is most likely not able to output sound at such frequencies, and regular speakers are likely not to output any sound there either. You would need to use ultrasound speakers.

L0g4n commented 4 years ago

When you say artificially generate, do you mean with pyroomacoustics? Where are the NaN coming from?

No, I simply computed the individual samples in Python with the sine formula, passing the aforementioned frequencies, and when I inspected the array of samples (in combination with pyroomacoustics) I saw only NaNs as sample values.

import numpy as np

FREQUENCY = 40e3
SAMPLING_RATE = FREQUENCY * 2


def create_sine_wave(frequency, num_samples, sampling_rate, amplitude):
    # y[i] = A * sin(2 * pi * f * i / fs): A is the amplitude, f the frequency,
    # and i / fs the time of the i-th sample
    sine_wave = np.array(
        [np.sin(2 * np.pi * frequency * i / sampling_rate) for i in range(num_samples)]
    )
    noise_freq = 0  # placeholder noise component (a 0 Hz sine is identically zero)
    noise_wave = np.array(
        [np.sin(2 * np.pi * noise_freq * i / sampling_rate) for i in range(num_samples)]
    )
    return amplitude * (sine_wave + noise_wave)


def main():
    # amplitude 16_000 is about half of 2^15 - 1 = 32_767; WAV stores 16-bit samples,
    # so scaling by this value gives half-scale audio, i.e. half as loud as full scale
    create_sine_wave(int(FREQUENCY), int(SAMPLING_RATE), SAMPLING_RATE, int(16e3))


if __name__ == "__main__":
    main()

Pardon my ignorance, but where exactly do the NaNs come from? The whole calculation runs purely on the CPU, so the constraints of a sound card should not apply. Is it because the generated sample values are too high (and Python does something weird with that)? By the way, I did not know that pyroomacoustics can generate signals of the kind that, for example, Audacity offers?

Of course I know that playing ultrasound over speakers does not make sense since humans cannot hear it, but in my case I am only interested in having such ultrasonic signals at hand.

fakufaku commented 4 years ago

I tried your code; for me it doesn't give NaN, but the output signal is identically zero everywhere (or some very small number, e.g. 1e-11).

The reason is that you pick sampling_rate = 2 * frequency, precisely. In this case, you can compute that 2 * pi * frequency * k / sampling_rate = k * pi, exactly, and sin(k * pi) = 0 for integer k. Note that if you use cos instead, you will get cos(k * pi) = +/- 1, which might also not be what you want.

In fact, the Nyquist-Shannon sampling theorem states that you should pick a frequency that is strictly less than half the sampling frequency. So pick, say, frequency = 0.9 * 0.5 * sampling_rate, or the like.
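A two-line check of both cases (a sketch; the numbers are arbitrary):

import numpy as np

fs = 80e3
n = np.arange(8)

# frequency exactly fs / 2: every sample falls on a zero crossing of the sine
print(np.sin(2 * np.pi * (fs / 2) * n / fs))   # ~[0, 0, 0, ...]

# frequency strictly below fs / 2: a usable discrete sine
print(np.sin(2 * np.pi * (0.45 * fs) * n / fs))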

L0g4n commented 4 years ago

To my knowledge, the sampling theorem just states how to pick a sampling rate depending on the highest frequency of interest, i.e. you need to pick a sampling rate of twice the highest frequency to be able to reconstruct the continuous sine wave. But I never heard that it constrains the signal frequency as well.

fakufaku commented 4 years ago

The sampling theorem states the condition for perfect reconstruction of continuous band-limited signals from their samples. This condition is that the sampling frequency be strictly larger than twice the largest frequency in the original signal.

If you sample at exactly twice the maximum frequency, there is in fact ambiguity in the signal reconstruction, that is, you can find two different signals with the same samples. I cooked up a quick example to try to convince you of that. The following script creates two sinusoids with frequency f = 1 Hz that, when sampled at fs = 2 Hz, produce identical samples. You can check this visually in the figure produced.

import numpy as np
import matplotlib.pyplot as plt

if __name__ == "__main__":

    # discretize the continuous time axis
    f_continuous = 1000  # resolution of the "continuous" axis [Hz]
    T = 3
    t = np.arange(0, T + 1.0 / f_continuous, 1.0 / f_continuous)  # time vector

    # pick a frequency for our signal
    f = 1

    # Create two cosine signals that differ only in phase and amplitude
    x1 = np.cos(2.0 * np.pi * f * t)
    x2 = np.cos(2.0 * np.pi * f * t + np.pi / 4.0) / np.cos(np.pi / 4.0)

    # Now pick the sampling frequency at exactly twice the signal frequency;
    # the values are chosen so that the sampling period is an integer number
    # of points on the fine axis
    fs = 2 * f
    Ts = int(f_continuous / fs)

    # Sample the "continuous" signals
    td = t[::Ts]
    d1 = x1[::Ts]
    d2 = x2[::Ts]

    # the two distinct sinusoids produce identical samples
    assert np.allclose(d1, d2)

    # plot the result
    plt.plot(t, x1, label=r"$\cos(2 \pi f t)$")
    plt.plot(t, x2, label=r"$\cos(2 \pi f t + \pi / 4) / \cos(\pi / 4)$")
    plt.stem(
        td, d1, label=f"Sampled at {fs} Hz", linefmt="g-", markerfmt="go", basefmt="g-"
    )
    plt.legend(loc="lower right")
    plt.title(f"Signal frequency {f} Hz, sampling frequency {fs} Hz")
    plt.xlabel("Time [s]")
    plt.savefig("sampling_theorem_ambiguity.png", dpi=300)
    plt.show()

L0g4n commented 4 years ago

You are right; somehow only 2 * f_max stuck in my head from my university courses, but not that the sampling rate has to be strictly larger (that also explains why 44.1 kHz is a common sampling rate for audio content up to 20 kHz). I will try to make these changes and report back.

L0g4n commented 4 years ago

What I did not disclose up to this point is that I am not dealing with baseband signals but with band-pass signals in the frequency range 35-40 kHz, so effectively my bandwidth is 5 kHz. That means that by carrying out subsampling with a sampling rate greater than 10 kHz (e.g. 2.2 * 5 kHz = 11 kHz) I should be able to reconstruct my signals without aliasing effects; see sampling of non-baseband signals. What remains to be seen is whether that relaxes the constraints on my array geometry (regarding the inter-microphone distance) or whether it does not matter at all. @fakufaku
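A quick way to sanity-check this in Python (a sketch; the 38 kHz tone and the 11.5 kHz rate are example values consistent with the band above, and the expected alias location follows from f_alias = |f - k * fs| for the integer k that lands it in [0, fs / 2]):

import numpy as np

f = 38e3     # a tone inside the 35-40 kHz band
fs = 11.5e3  # subsampling rate; the whole band fits inside one Nyquist zone

n = np.arange(4096)
x = np.sin(2 * np.pi * f * n / fs)  # the 38 kHz tone sampled directly at 11.5 kHz

# find the strongest spectral peak of the subsampled signal
spectrum = np.abs(np.fft.rfft(x))
f_peak = np.fft.rfftfreq(len(x), d=1 / fs)[np.argmax(spectrum)]

print(f_peak)  # ~3.5 kHz, i.e. |38 kHz - 3 * 11.5 kHz|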

fakufaku commented 4 years ago

I see. Indeed, to avoid time-domain aliasing, you would just need a bit more than 10 kHz. For spatial aliasing, I need to think a little more, but my feeling is that this sadly doesn't help. The problem is that, due to spatial aliasing, the signals coming from different directions are indistinguishable at the microphones, that is, they arrive with the same relative delays. This means that whatever post-processing you do on your signals, you won't be able to tell the two directions apart.

L0g4n commented 4 years ago

@fakufaku Well, I will see about that. This week I am performing an experiment that evaluates DOA performance using a physical microphone array, trying to localize such band-limited ultrasonic signals. My simulation using pyroomacoustics shows that I am able to accurately localize two active sources (ultrasonic, subsampled signals at 11.5 kHz) using the MUSIC algorithm with a microphone array of size 4 (with 3 sources, MUSIC also breaks down). All other algorithms seem to perform much worse; only MUSIC consistently delivers good results (max. 5 degrees error with a few outliers), and FRIDA seems to crash all the time. Additionally, CSSM frequently crashes with a "numpy.linalg.LinAlgError: Singular matrix" error.

L0g4n commented 4 years ago

@fakufaku One question: Is the origin automatically calculated from the microphone array, and is that the reference point to which the azimuth angles refer?

fakufaku commented 4 years ago

I have quickly checked, but couldn't find a definitive answer. I think for algorithms that rely on time differences of arrival (MUSIC, SRP-PHAT), the placement of the origin should not make a difference. In that case, the DOA will be computed with respect to the frame of the room.

L0g4n commented 4 years ago

Err, what I really meant was the center of my array, which in my case is equivalent to the origin of the coordinate system.

fakufaku commented 4 years ago

Yes, I understand. I could not find any code that modifies the coordinates that you provide to the DOA object. However, I think that for SRP and MUSIC (at least), the placement of the origin doesn't matter, because they only consider differences of microphone coordinates.
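A small check of that argument (a sketch with made-up coordinates): translating the whole array shifts every absolute delay by the same amount, so the inter-microphone delay differences are unchanged.

import numpy as np

mics = np.array([[0.00, 0.02, 0.04, 0.06],
                 [0.00, 0.00, 0.00, 0.00]])  # 2 x 4 microphone coordinates [m]
shift = np.array([[1.0], [2.0]])             # move the whole array somewhere else
u = np.array([np.cos(0.7), np.sin(0.7)])     # far-field source direction (unit vector)

delays = u @ mics / 343.0                    # absolute arrival delays [s]
delays_shifted = u @ (mics + shift) / 343.0

# relative delays (referenced to microphone 0) are identical for both placements
print(np.allclose(delays - delays[0], delays_shifted - delays_shifted[0]))  # True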

L0g4n commented 4 years ago

@fakufaku Looks promising (real DOA localization in a small room, i.e. with reflections, source ca. 1.5 m from the array, ultrasonic sender at 40 kHz, sampled at 11.5 kHz): [attached figure: DOA polar plot]