LCAV / pyroomacoustics

Pyroomacoustics is a package for audio signal processing for indoor applications. It was developed as a fast prototyping platform for beamforming algorithms in indoor scenarios.
https://pyroomacoustics.readthedocs.io
MIT License

Clarification regarding time delay and amplitude for receivers for simple mic setup #261

Closed drydenwiebe closed 1 year ago

drydenwiebe commented 2 years ago

Hello, and thank you for your response to my previous question!

I am trying to reconstruct the audio sample at the source with a simple mic array that surrounds the source. I am not quite sure that what I am doing is theoretically possible. The goal is to reconstruct the sound that is emitted from the source from the received amplitudes. This leads me to two questions about pyroomacoustics and audio simulation in general.

The setup I have is a simple square room with one sound source in the centre and microphones evenly spaced on a circle of radius R around the source. The room has no echo. An example of this setup is in the image below (R = 25):

image

I have two main questions about this setup and what I am trying to achieve with it.

1: how do I tell the time delay between the source and each receiver?

If the sampling rate for the audio from the source is say 16,000 samples/second, how would I determine the delay or offset between the source and (any of the) receivers at radius R, as a function of R? I assume it would be related to the speed of sound in air, but I am not sure in this simulation framework.

2: how can I make the sum of the amplitudes of all the mics at radius R equal to the sound source?

I would like to set up my mics such that if I sum the amplitudes of all the receivers (in the circle above), what I get is the signal at the source.

When I try this (without ray tracing, using the image source model), the summed amplitudes are significantly larger than the source. I know that with the image source model this was going to fail, as the number of receivers I chose was not principled, but I would like to know if there is a way to determine the number of mics evenly spaced at radius R such that the sum of the amplitudes received at all of the mics is equal to the source.

image

One thing I tried

This thesis (page 67, equation (3.10)) shows that the power (W_i) at a receiver is the sum of all received rays, divided by the total number of rays. So I set receiver_radius=1 for room.set_ray_tracing so that the receiver spheres of radius 1 do not overlap, like so:

image

I still have the same issue as before, where the sum of the amplitudes is not equal to the source. Further, the signals are not "aligned" and look slightly off, which is probably because of the randomness of ray tracing.

image

I have compiled examples in this colab notebook for illustration.

Is any of this even theoretically possible? Any help is appreciated.

Thank you for your time!

fakufaku commented 1 year ago
  1. The delay between source and receiver is given by the time of flight plus a small delay due to the filter used to construct the RIR. The time of flight depends on the speed of sound; in pyroomacoustics, c = pra.constants.get("c") will let you know the value used. The delay in samples is then distance(source, mic) / c * fs + 40, where the 40 is due to the filter used in the RIR generator. This is not very well documented, sorry about that!
  2. In your example, since you do not have any reverberation and the distance from the source to each microphone is the same, each microphone will record the exact same signal. Thus, when you add them up, you will have a gain exactly equal to the number of microphones. In addition, the signal is attenuated by 1 / distance(source, mic) when traveling from source to microphone. In textbooks you will find an extra division by 4 pi, but I have not included it here since usually only the relative amplitude matters.
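
The two points above can be sketched in plain Python. This is only an illustration under stated assumptions: the helper names (expected_delay_samples, summed_gain, mics_on_circle) are hypothetical and not part of pyroomacoustics, the speed of sound is assumed to be 343 m/s (check pra.constants.get("c") for the value actually in use), and the gain follows the 1 / distance rule quoted above, so any constant factor applied by the library is ignored.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed; pra.constants.get("c") gives the library value
RIR_FILTER_DELAY = 40   # samples, the filter offset quoted in the reply above


def expected_delay_samples(source, mic, fs, c=SPEED_OF_SOUND,
                           filter_delay=RIR_FILTER_DELAY):
    """Delay in samples: time of flight plus the RIR filter offset."""
    distance = math.dist(source, mic)
    return distance / c * fs + filter_delay


def summed_gain(n_mics, distance):
    """Gain of the sum of n_mics identical signals, each attenuated by 1/distance.

    Only valid without reverberation and with all mics equidistant
    from the source, as in the setup discussed in this thread.
    """
    return n_mics / distance


def mics_on_circle(center, radius, n_mics):
    """Positions of n_mics evenly spaced on a circle around center (2D)."""
    return [
        (center[0] + radius * math.cos(2 * math.pi * k / n_mics),
         center[1] + radius * math.sin(2 * math.pi * k / n_mics))
        for k in range(n_mics)
    ]


source = (50.0, 50.0)  # hypothetical source position in metres
mics = mics_on_circle(source, radius=25.0, n_mics=8)
delays = [expected_delay_samples(source, m, fs=16000) for m in mics]
gain = summed_gain(n_mics=len(mics), distance=25.0)
```

Under these assumptions every microphone sees the same delay, and the summed gain is n_mics / R, so it equals one only when the number of microphones matches the distance in metres. That is a consequence of the 1 / distance rule alone and may not carry over to ray tracing.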
drydenwiebe commented 1 year ago

I see, that makes sense. Thank you!

With regard to point number 2, I now understand how the amplitude is calculated in the image source model, but what about ray tracing? Does my idea of placing mics around the source at a spacing equal to the receiver radius make sense to ensure the sum of the amplitudes is equal to the source? Or does it work differently?

Another question I have is that with ray tracing, the signals are not the "same" (there are small variations between signals recorded at the same distance from the source), whereas non-ray-traced signals would be the same. Is there a reason for this?

drydenwiebe commented 1 year ago

Just a follow up: I am not getting the same delay as that formula.

I am using the same setup as in my original post, except with a sine wave instead of a speech signal, with the mics 25 m away and a sampling frequency of 16000 Hz.

delay = 25 / 343 * fs +40 = 40

However it is clear from the plots that this is not the case. I find that the original signal is of length 16000 (one second), the received signals are of length 17248, and the offset (the index where they correlate the most) is 1206.

image

Any ideas on how to figure this out?
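
The offset measurement described above (the index where the two signals correlate the most) can be sketched with NumPy as follows. This is a minimal illustration, not the code from the notebook: estimate_delay is a hypothetical helper, and a short white-noise signal stands in for the source because the correlation peak of a pure sine wave repeats every period and is therefore less sharply defined.

```python
import numpy as np


def estimate_delay(source, received):
    """Return the lag (in samples) at which received best matches source."""
    # Full cross-correlation; index i corresponds to lag i - (len(source) - 1).
    corr = np.correlate(received, source, mode="full")
    return int(np.argmax(corr)) - (len(source) - 1)


rng = np.random.default_rng(0)
source = rng.standard_normal(2000)  # stand-in test signal (white noise)
true_delay = 1206                   # delay in samples applied below

# Build a "received" signal: the source shifted by true_delay samples,
# followed by a few trailing zeros.
received = np.concatenate([np.zeros(true_delay), source, np.zeros(42)])

estimated = estimate_delay(source, received)
```

The estimated lag can then be compared against the value predicted by distance / c * fs + 40.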

drydenwiebe commented 1 year ago

Never mind, I calculated the formula wrong... very embarrassing.
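
For reference, evaluating the formula with the values from this thread looks like the following. The speed of sound of 343 m/s is taken from the calculation earlier in the thread and is an assumption about the simulation settings.

```python
fs = 16000         # sampling frequency in Hz
distance = 25.0    # source-to-mic distance in metres
c = 343.0          # speed of sound in m/s (assumed)
filter_delay = 40  # samples added by the RIR filter

time_of_flight = distance / c * fs     # about 1166.2 samples
delay = time_of_flight + filter_delay  # about 1206.2 samples
print(round(delay, 1))                 # prints 1206.2
```

This agrees with the measured offset of 1206 samples.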

Thank you for all your help @fakufaku