Differences with pyroomacoustics

mishalydev commented 5 months ago

Assume that I already calculated the RIR using gpuRIR.simulateRIR and I have a signal_batch. The RIR size is [1, 1, 512] And the signal_batch size is [1, 1, 160000]

I want to apply this RIR on the signal_batch. How should I do that? a) torch.nn.functional.conv1d(signal_batch, RIR) b) gpuRIR.simulateTrajectory(signal_batch, RIR)

Until know I used the first option but then I noticed that it results in a huge difference between the original signal and the filtered signal:

Does that seem normal?

I'd really appreciate any response.

Thanks

DavidDiazGuerra commented 5 months ago

Hello,

Both options should be equivalent. There might be some minor differences due to the implementations (gpuRIR does the convolution on the spectral domain) but no big changes. The main difference is that gpuRIR.simulateTrajectory supports time-varying RIRs to simulate moving sources using the overlap-add method, but that's not needed in your case.

About the figure, it's hard to compare the signals at that scale. There is a clear scale difference but that's something that could be expected since using a spherical propagation model the direct path amplitude attenuates with the distance. Depending on the reverberation of your room, the contribution of the reflections might help to keep the total energy of the signal closer to the original one or most of it will be absorbed by the walls.

Best, David

mishalydev commented 5 months ago

Thank you for your detailed response @DavidDiazGuerra

I still find it kind of confusing since I think that pyroomacoustics brings a completely different filtered signal. I tried it with these settings:

self.sr = 16000
self.t60=0.2
self.room_dim = np.array([3, 4, 2])
self.pos_src= np.array([1.5, 1, 1])
self.pos_rcv= np.array([[1.5, 3, 1]])

This is the pyroomacoustics simulation -

e_absorption, max_order = pra.inverse_sabine(self.t60, self.room_dim)
room = pra.ShoeBox(self.room_dim, fs=self.sr,  materials=pra.Material(e_absorption), max_order=max_order)
room.add_source(self.pos_src, signal=signal)
mic = pra.MicrophoneArray(self.pos_rcv.reshape((-1, 1)), self.sr)
room.add_microphone_array(mic)

room.simulate()
processed_signals = room.mic_array.signals[0]

And here it is with gpuRIR -

rir_length = 512 / self.sr # length of the RIR in seconds
nb_img = gpuRIR.t2n(T=rir_length, rooms_sz=self.room_dim)

beta = gpuRIR.beta_SabineEstimation(self.room_dim, self.t60)
rir = gpuRIR.simulateRIR(self.room_dim, beta, self.pos_src, self.pos_rcv, self.nb_img, Tmax=self.rir_length, fs=self.sr)
rir = torch.from_numpy(np.squeeze(rir)).to(self.device).view(1, 1, -1)
processed_signals = torch.nn.functional.conv1d(signal, rir, padding='same')

And here's the result:

I'm just trying to understand if there's any param that I can play with in gpuRIR who impacts the filtered signal without changing - t60 sr room_dim pos_src pos_rcv

My question is what makes this different?

DavidDiazGuerra commented 5 months ago

The value of rir_length that you're using is too short compared with t60, you're probably truncating the RIR before it finishes.

The RIRs generated by pyroomacoustics and gpuRIR because gpuRIR uses some modifications of the original ISM that pyroomacoustics doesn't. You have more details about this in issue #25 and on the paper. With the RIRs being different, I wouldn't expect the filtered signals to be identical, though I have to admit that the amplitude difference is surprising.

The perfectly symmetrical situation that you're simulating is prone to create artifacts that might explain the difference, but it's hard to say. In any case, I would compare RIRs rather than filtered signals first to make sure if the difference comes from the RIRs or from the filtering process.

mishalydev commented 5 months ago

Thanks again for your responses @DavidDiazGuerra

This is the RIRs when I set max_order=2 in pyroomacoustics (instead of 41 which was the result of e_absorption, max_order = pra.inverse_sabine(self.t60, self.room_dim)) :

Also, I did read this post you've mentioned but I'm not sure yet if I have something do with that. Is there any way to play with the RIR magnitude in gpuRIR?

DavidDiazGuerra commented 5 months ago

I don't think the RIRs should be so different, I would expect the peaks to be at least well synchronized and have similar magnitude, even if some of them might be negative in the case of gpuRIR and the ones from pyroomacoustics might have some artifacts around them.

I don't think playing with max_order inpyroomacoustics or nb_img in gpuRIR is a good idea since a too-low value might make some peaks disappear if they correspond to higher-order images.

The amplitudes of the peaks in the image source method (ISM) are determined by the distances and the absorption coefficients, but the direct path should be only affected by the distance between the source and the receiver. The amplitude of the direct path in gpuRIR is just the result of applying a spherical propagation model A_dp=1/(4*pi*d) as it's usually done in the ISM. I don't know if pyroomacoustics adds any kind of normalization that could affect the magnitude of the peaks.

mishalydev commented 5 months ago

I think that this is the fixed comparison

So just to verify, I'm not sure I understand. Is that make sense that both libraries give so different RIR scale? Isn't it neccesrally a mistake?

DavidDiazGuerra commented 5 months ago

I've been trying to replicate your code and, after fixing the shape of pos_src and ensuring that we were simulating all the image sources that contribute to the first 200 ms of the RIR, this is what I've got:

import numpy as np
import matplotlib.pyplot as plt
import gpuRIR
import pyroomacoustics as pra

sr = 16000
t60=0.2
room_dim = np.array([3, 4, 2])
pos_src= np.array([[1.5, 1, 1]])
pos_rcv= np.array([[1.5, 3, 1]])

e_absorption, max_order = pra.inverse_sabine(t60, room_dim)
room = pra.ShoeBox(room_dim, fs=sr,  materials=pra.Material(e_absorption), max_order=max_order)
room.add_source(pos_src.T)
mic = pra.MicrophoneArray(pos_rcv.T, sr)
room.add_microphone_array(mic)
room.compute_rir()

rir_length = t60  # length of the RIR in seconds
nb_img = gpuRIR.t2n(T=rir_length, rooms_sz=room_dim)
beta = gpuRIR.beta_SabineEstimation(room_dim, t60)
rir = gpuRIR.simulateRIR(room_dim, beta, pos_src, pos_rcv, nb_img, Tmax=rir_length, fs=sr)

Figure_1

They look indeed quite different but, after fixing the lag and scale, they become much more similar:

lag = np.argmax(np.abs(room.rir[0][0])) - np.argmax(np.abs(rir[0,0,:]))
scale = np.max(np.abs(rir[0,0,:])) / np.max(np.abs(room.rir[0][0]))

plt.plot(scale * room.rir[0][0][lag:512+lag])
plt.plot(rir[0,0,:512])
plt.legend(['pra', 'gpuRIR'])
plt.show()

Figure_2

With the lag and scale fixed all the peaks are perfectly synchronized and their magnitudes are the same in most cases (I would bet that the cases where they aren't are due to artifacts due to be simulating a perfectly symmetric scenario). Therefore, I would say that the results of both libraries are equivalent except for the lag and the scale:

About the lag, for a distance of d = 2 meters and a speed of sound c = 343 m/s, the direct-path peak of the RIR should be at the sample (d/c) * sr = 93.29446064139941 as in the result of gpuRIR. I don't know why inpyroomacoustics it has an extra 40 samples lag.
About the scale, if we follow a spherical propagation model, the amplitude of the peak should be 1/(4*np.pi*d) = 0.039788735772973836 as in gpuRIR. I don't know why in pyroomacoustics it's multiplied by a factor 12.47817159323428.

To sum up, the RIR generated by gpuRIR is what you could expect from the image source method with negative reflection coefficients and its peaks match the results of other libraries (such as Python's rirgen and rir-generator or Lehmann's Matlab library) in time and magnitude. I don't know where the differences with pyroomacoustics come from, but you should probably ask to their developpers.

fakufaku commented 5 months ago

@mishalydev @DavidDiazGuerra Hello! These kind of issues have come up multiple times. I should really make a section for this in the doc... 🦥 reference

Amplitude difference: pyroomacoustics does not apply the normalization by 4 * pi. It does not make a lot of sense in the context of audio signals only have a relative scale anyway. If necessary, this can be uniformly fixed by scaling the rir or convolved signal by 4 * np.pi.
Lag: the delay of 40 samples found by @DavidDiazGuerra is expected in pyroomacoustics. This is half the width of the fractional delay filter used. If the distance between the source and the receiver is longer than 40 / fs * c (80 cm at 16 kHz), the rir can be truncated to make it time exact. If the source and receiver are closer than that, the tail of the low pass filter will be truncated, which may lead to some artifacts.

Hope this helps clarify things

DavidDiazGuerra commented 5 months ago

Thanks for the explanations @fakufaku!

I agree that the scale factor doesn't have a big (or any) impact on this kind of acoustic simulation. I should have realized 12.47817 was 4pi. I guess that the main advantage of including that normalization factor is that it defines the amplitude of the source signal as the amplitude at 1 matter of the source, but I don't think nobody really cares about that.

I also understand the reasons for the delay of 40 samples. It might be misleading in some applications if the user is not aware of it, but it is also true that not including it might indeed generate artifacts for receivers too close to the source.

Best, David

mishalydev commented 5 months ago

@DavidDiazGuerra @fakufaku Thank you both for your time and tolerance. Just to sum up, if I got it right - the only thing I need to do for pyroomacoustics RIR to align the RIRs is:

rir = room.rir[0][0]
rir_ism = rir[40:40+self.rir_length] * (1/(torch.pi * 4))

It also does make sense with the visualization:

DavidDiazGuerra / gpuRIR

Differences with pyroomacoustics #61