Closed: L0g4n closed this issue 4 years ago
Hi, unfortunately, I don't think there are any good solutions to this problem. What happens when the distance between the microphones is too large is spatial aliasing. Most array signal processing algorithms rely on sources at different locations having different phase differences between the microphones. When the inter-microphone distance is more than half a wavelength, the phase difference is no longer unique. For example, let's assume that the distance between the two microphones is exactly one wavelength. Then a source placed on the line formed by the two microphones will have a phase difference of 2π, which wraps to zero. However, a source placed on the line perpendicular to the line formed by the microphones will also have zero phase difference. Hence, the two angles cannot be resolved.
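The wrap-around described above can be checked numerically. This is a minimal sketch (not from the thread), assuming a far-field plane-wave model, `f = 40 kHz`, `c = 343 m/s`, and a spacing of exactly one wavelength:

```python
import numpy as np

c = 343.0          # speed of sound [m/s] (assumed)
f = 40e3           # signal frequency [Hz] (assumed)
lam = c / f        # wavelength, about 8.6 mm
d = lam            # inter-mic distance of exactly one wavelength

def phase_difference(theta):
    # far-field plane wave arriving at angle theta
    # (0 = along the mic axis, pi/2 = perpendicular), wrapped to (-pi, pi]
    dphi = 2 * np.pi * d * np.cos(theta) / lam
    return np.angle(np.exp(1j * dphi))

endfire = phase_difference(0.0)          # source on the line through the mics
broadside = phase_difference(np.pi / 2)  # source on the perpendicular line
# both wrapped phase differences are (numerically) zero:
# the two directions are indistinguishable from the phase alone
```

Both values come out as zero up to floating-point noise, which is exactly the ambiguity in the text.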
If your signal is wide-band, aliased frequencies are still helpful because they reduce the number of possible locations to a handful, and lower frequencies might be enough to disambiguate.
Another solution is to use MEMS microphones which are just a few millimeters in size.
Yes, that is reasonable to assume. However, is it not possible to "exclude" false angles by observing the relevant information at more than 2 microphones?
Hmm, I don't have a ready answer for that, I'll try to think about it. But I don't believe it solves the problem. Why not just write a script to test your idea? Place the microphones and create a source with the desired frequency (a pure sine works), then run DOA and observe the polar plot produced, for example by SRP or MUSIC. If aliasing is happening, you will see peaks with similar magnitudes at several locations.
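The "several peaks of similar magnitude" effect can be reproduced without a full room simulation. The following is a hedged numpy-only sketch (it does not use pyroomacoustics), assuming a narrowband far-field model, two microphones spaced two wavelengths apart, and a source at 30°:

```python
import numpy as np

c, f = 343.0, 40e3          # assumed speed of sound and frequency
lam = c / f                 # wavelength, about 8.6 mm
d = 2 * lam                 # deliberately too-large spacing to provoke aliasing
mics = np.array([[-d / 2, 0.0], [d / 2, 0.0]])  # two mics on the x-axis

true_az = np.deg2rad(30.0)  # assumed true direction of arrival
u = np.array([np.cos(true_az), np.sin(true_az)])
x = np.exp(1j * 2 * np.pi * f * (mics @ u) / c)  # narrowband snapshot at the mics

grid = np.deg2rad(np.arange(0.0, 180.0, 0.5))    # candidate azimuths
power = np.empty_like(grid)
for i, az in enumerate(grid):
    v = np.array([np.cos(az), np.sin(az)])
    steer = np.exp(1j * 2 * np.pi * f * (mics @ v) / c)
    power[i] = np.abs(np.vdot(steer, x)) ** 2    # delay-and-sum response

# with d > lambda / 2, several well-separated angles reach (almost) the maximum
peaks = grid[power > 0.99 * power.max()]
```

The near-maximal peaks appear at several angles spread over more than 90°, which is the polar-plot symptom of spatial aliasing described above.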
Yes, that's what I plan to do. I already tried that with 20-23 kHz signals, but what I really need are 38-40 kHz signals, which I cannot artificially generate on my computer (the signals then always contain NaNs as individual samples; it seems my sound card is not capable of that).
When you say artificially generate, do you mean with pyroomacoustics? Where are the NaNs coming from?
Also, if you have a regular sound card, it is most likely not able to output sound at such frequencies, and regular speakers are likely not to output any sound at such frequencies either. You would need to use ultrasound speakers.
> When you say artificially generate, do you mean with pyroomacoustics? Where are the NaNs coming from?
No, I simply computed the individual samples in Python with the sine formula, passing the aforementioned frequencies, and when I inspected the array of samples (in combination with pyroomacoustics) I saw only NaNs as sample values.
```python
import numpy as np

FREQUENCY = 40e3
SAMPLING_RATE = FREQUENCY * 2


def main():
    # amplitude is half of (2^15 - 1) = 32_767, since WAV assigns 16 bits to every sample;
    # we multiply each sample by this value to get HALF-SCALE audio, i.e. half as loud as full scale
    create_sine_wave(int(FREQUENCY), int(SAMPLING_RATE), SAMPLING_RATE, int(16e3))


def create_sine_wave(frequency, num_samples, sampling_rate, amplitude):
    # formula of a sine: y(t) = A * sin(2 * pi * f * t), where A is the amplitude,
    # f the frequency, and t the sample index (divided by the sampling rate to make it digital)
    sine_wave = np.array([np.sin(2 * np.pi * frequency * i / sampling_rate) for i in range(num_samples)])
    noise_freq = 0
    noise_wave = np.array([np.sin(2 * np.pi * noise_freq * i / sampling_rate) for i in range(num_samples)])
    noisy_wave = sine_wave + noise_wave
    return amplitude * noisy_wave
```
Pardon my ignorance, but: where exactly do the NaNs come from? The whole calculation runs purely on the CPU, so the constraints of a sound card should not apply. Is it because the generated sample values are too high (and Python does something weird with that)?
BTW: I did not know that pyroomacoustics can generate signals? The kind that, for example, Audacity offers?
Of course I know that playing ultrasound over speakers does not make sense since humans cannot hear it, but in my case I am only interested in having such ultrasonic signals at hand.
I tried your code; for me it doesn't give NaNs, but the output signal is identically zero everywhere (or some very small number, e.g. 1e-11).
The reason is that you pick `SAMPLING_RATE = 2 * FREQUENCY`, precisely. In this case, you can compute that `2 * pi * frequency * k / sampling_rate = k * pi`, exactly. And `sin(k * pi) = 0` for integer `k`. Note that if you use `cos` instead, you will get `cos(k * pi) = +/- 1`, which might also not be what you want.
In fact, the Nyquist-Shannon sampling theorem states that you should pick a frequency that is strictly less than half the sampling frequency. So pick maybe `frequency = 0.9 * 0.5 * sampling_rate`, or the like.
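The explanation above can be reproduced in a few lines. This sketch uses the same numbers as the script earlier in the thread (`fs = 80 kHz`, signal at exactly `fs / 2`):

```python
import numpy as np

fs = 80e3               # sampling rate, as in the script above
f = fs / 2              # 40 kHz, exactly the Nyquist frequency
n = np.arange(1000)
x = np.sin(2 * np.pi * f * n / fs)   # = sin(pi * n) = 0 for every integer n
# every sample is (numerically) zero: no NaNs, just a silent signal

f_ok = 0.9 * fs / 2     # a frequency strictly below Nyquist, here 36 kHz
y = np.sin(2 * np.pi * f_ok * n / fs)  # a proper, non-zero sinusoid
```

The first signal is zero up to floating-point rounding, while the second one reaches full amplitude, matching the suggested `0.9 * 0.5 * sampling_rate` fix.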
To my knowledge, the sampling theorem just states how to pick a sampling rate depending on the highest frequency of interest, i.e. you need to pick a sampling rate of twice the highest frequency to be able to recreate the continuous sine wave. But I never heard that it constrains the signal frequency as well.
The sampling theorem states the condition for perfect reconstruction of continuous band-limited signals from samples. This condition is that the sampling frequency be strictly larger than twice the largest frequency in the original signal.
If you sample at exactly twice the maximum frequency, there is in fact ambiguity in the signal reconstruction, that is, you can find two different signals with the same samples. I cooked up a quick example to try to convince you of that.
The following script creates two sinusoids with frequency `f = 1 Hz` that, when sampled at `fs = 2 Hz`, produce identical samples. You can check this visually in the figure produced.
```python
import numpy as np
import matplotlib.pyplot as plt

if __name__ == "__main__":
    # discretize the continuous axis
    f_continuous = 1000
    T = 3
    t = np.arange(0, T + 1.0 / f_continuous, 1.0 / f_continuous)  # time vector

    # pick a frequency for our signal
    f = 1

    # Create two cosine signals
    x1 = np.cos(2.0 * np.pi * f * t)
    x2 = np.cos(2.0 * np.pi * f * t + np.pi / 4.0) / np.cos(np.pi / 4.0)

    # Now pick the sampling frequency;
    # here we need to choose the value so that the sampling period is an integer
    fs = 2 * f
    Ts = int(f_continuous * f / fs)

    # Sample the "continuous" signal
    td = t[::Ts]
    d1 = x1[::Ts]
    d2 = x2[::Ts]

    # plot the result
    plt.plot(t, x1, label=r"$\cos(2 \pi f t)$")
    plt.plot(t, x2, label=r"$\cos(2 \pi f t + \pi / 4) / \cos(\pi / 4)$")
    plt.stem(
        td, d1, label=f"Sampled at {fs} Hz", linefmt="g-", markerfmt="go", basefmt="g-"
    )
    plt.legend(loc="lower right")
    plt.title(f"Signal frequency {f} Hz, sampling frequency {fs} Hz")
    plt.xlabel("Time [s]")
    plt.savefig("sampling_theorem_ambiguity.png", dpi=300)
    plt.show()
```
You are right, somehow from my university courses only `2 * f_max` stuck in my head, but not that the sampling rate has to be strictly larger (that also explains why a common sampling rate is 44.1 kHz for human hearing up to 20 kHz). I will try to make these changes and report back.
What I did not disclose up to this point is that I am not dealing with baseband signals but with band-pass filtered signals in the frequency range 35-40 kHz, so effectively my bandwidth is 5 kHz. This means that by carrying out sub-sampling with a sampling rate greater than 10 kHz (e.g. `2.2 * 5 kHz`) I should be able to reconstruct my signals without aliasing effects; see sampling of non-baseband signals. What remains to be seen is whether that imposes relaxed constraints on my array geometry (regarding the inter-microphone distance) or whether it does not matter at all. @fakufaku
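The bandpass-sampling claim above can be checked with a small frequency-folding calculation. This is a sketch, assuming a sampling rate of `fs = 11.5 kHz` (a value above `2 * 5 kHz` bandwidth, and the rate mentioned later in the thread):

```python
fs = 11.5e3   # assumed sub-sampling rate [Hz]

def aliased_frequency(f, fs):
    # frequency to which a real-valued tone at f folds after sampling at fs
    r = f % fs
    return r if r <= fs / 2 else fs - r

lo = aliased_frequency(35e3, fs)   # lower band edge after sub-sampling
hi = aliased_frequency(40e3, fs)   # upper band edge after sub-sampling
# the 35-40 kHz band maps one-to-one onto 0.5-5.5 kHz, which fits below
# fs / 2 = 5.75 kHz, so no two in-band frequencies collide
```

With these numbers, 35 kHz folds to 500 Hz and 40 kHz to 5.5 kHz, so the whole 5 kHz band survives sub-sampling without in-band aliasing, as the text argues.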
I see. Indeed, to avoid time-domain aliasing, you would just need >10 kHz. For spatial aliasing, I need to think a little more, but my feeling is that this sadly doesn't help. The problem is that, due to spatial aliasing, the signal coming from different directions is indistinguishable at the microphones, that is, it arrives with the same relative delays. This means that whatever post-processing you do on your signals, you won't be able to tell the two directions apart.
@fakufaku Well, I will see about that. This week I am performing an experiment that evaluates the DOA performance using a physical microphone array, trying to localize such band-limited ultrasonic signals.
My simulation using pyroomacoustics shows that I am able to accurately localize two active sources (ultrasonic, subsampled signals at 11.5 kHz) using the MUSIC algorithm with a microphone array of size 4 (with 3 sources, MUSIC also breaks down). All other algorithms seem to perform much worse; only MUSIC consistently delivers good results (max. 5 degrees of error, with a few outliers), while FRIDA seems to crash all the time. Additionally, CSSM frequently crashes with a "numpy.linalg.LinAlgError: Singular matrix" error.
@fakufaku One question: is the origin automatically calculated from the microphone array, and is this the reference point to which the azimuth angles refer?
I have quickly checked, but couldn't find a definitive answer. I think for algorithms that rely on time differences of arrival (MUSIC, SRP-PHAT), the placement of the origin should not make a difference. In that case, the DOA will be computed with respect to the frame of the room.
Err, what I really meant was the center of my array, which is equivalent to the origin of the coordinate system in my case.
Yes, I understand. I could not find code that modifies the coordinates that you provide to the `DOA` object. However, I think that for SRP and MUSIC (at least), the placement of the origin doesn't matter, because they only consider differences of microphone coordinates.
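The "only differences matter" argument can be verified numerically. This is a small sketch (not from pyroomacoustics), assuming a far-field plane-wave model and an arbitrary 4-mic square array:

```python
import numpy as np

c = 343.0  # speed of sound [m/s] (assumed)
mics = np.array([[0.0, 0.0], [0.03, 0.0], [0.0, 0.03], [0.03, 0.03]])  # 3 cm square
u = np.array([np.cos(0.7), np.sin(0.7)])  # some arrival direction (unit vector)

def relative_delays(positions):
    # delay of each mic relative to the first one, for a plane wave from direction u
    proj = positions @ u / c
    return proj - proj[0]

shift = np.array([1.234, -5.678])  # arbitrary translation of the whole array
d1 = relative_delays(mics)
d2 = relative_delays(mics + shift)
# d1 and d2 agree: algorithms built on relative delays (SRP, MUSIC) cannot
# depend on where the coordinate origin is placed
```

Translating every microphone by the same vector adds the same constant to every projection, which cancels when taking differences, hence the origin placement is irrelevant for these algorithms.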
@fakufaku Looks promising (real DOA localization in a small room, i.e. with reflections, ca. 1.5 m from the array, ultrasonic sender at 40 kHz, sampled at 11.5 kHz):
Do you happen to know how to choose the distance between the microphones in the array when the signals of interest have a relatively small wavelength? For example, a 22 kHz signal has an approximate wavelength of 1.6 cm (343 / 22 kHz), and since it is generally advised to space the microphones at less than half the wavelength of the signal of interest, this would mean a distance between adjacent microphones of 0.8 cm.
However, this distance is realistically not feasible when, for example, the physical microphones alone have a diameter of 2.5 cm, which implies that the minimum center-to-center distance between two such microphones is 2.5 cm (one diameter), plus some offset so that the microphones do not influence each other when the membrane oscillates. So, is there a way to still get good results from the DOA algorithms when the inter-microphone distance is not ideal?
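The half-wavelength spacing limit mentioned above is a one-line calculation. A small sketch, assuming `c = 343 m/s`:

```python
c = 343.0  # speed of sound [m/s] (assumed)

def max_spacing_cm(f_hz):
    # spacing must stay below lambda / 2 to avoid spatial aliasing
    return 100 * c / f_hz / 2

s22 = max_spacing_cm(22e3)  # about 0.78 cm, matching the 0.8 cm in the question
s40 = max_spacing_cm(40e3)  # about 0.43 cm, even tighter at 40 kHz
```

At 40 kHz the limit shrinks to roughly 4 mm, which is why the thread earlier points to millimeter-sized MEMS microphones for such frequencies.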