Closed mishalydev closed 5 months ago
Hello,
Both options should be equivalent. There might be some minor differences due to the implementations (gpuRIR
does the convolution on the spectral domain) but no big changes. The main difference is that gpuRIR.simulateTrajectory
supports time-varying RIRs to simulate moving sources using the overlap-add method, but that's not needed in your case.
About the figure, it's hard to compare the signals at that scale. There is a clear scale difference but that's something that could be expected since using a spherical propagation model the direct path amplitude attenuates with the distance. Depending on the reverberation of your room, the contribution of the reflections might help to keep the total energy of the signal closer to the original one or most of it will be absorbed by the walls.
Best, David
Thank you for your detailed response @DavidDiazGuerra
I still find it kind of confusing since I think that pyroomacoustics brings a completely different filtered signal. I tried it with these settings:
self.sr = 16000
self.t60=0.2
self.room_dim = np.array([3, 4, 2])
self.pos_src= np.array([1.5, 1, 1])
self.pos_rcv= np.array([[1.5, 3, 1]])
This is the pyroomacoustics simulation -
e_absorption, max_order = pra.inverse_sabine(self.t60, self.room_dim)
room = pra.ShoeBox(self.room_dim, fs=self.sr, materials=pra.Material(e_absorption), max_order=max_order)
room.add_source(self.pos_src, signal=signal)
mic = pra.MicrophoneArray(self.pos_rcv.reshape((-1, 1)), self.sr)
room.add_microphone_array(mic)
room.simulate()
processed_signals = room.mic_array.signals[0]
And here it is with gpuRIR -
rir_length = 512 / self.sr # length of the RIR in seconds
nb_img = gpuRIR.t2n(T=rir_length, rooms_sz=self.room_dim)
beta = gpuRIR.beta_SabineEstimation(self.room_dim, self.t60)
rir = gpuRIR.simulateRIR(self.room_dim, beta, self.pos_src, self.pos_rcv, self.nb_img, Tmax=self.rir_length, fs=self.sr)
rir = torch.from_numpy(np.squeeze(rir)).to(self.device).view(1, 1, -1)
processed_signals = torch.nn.functional.conv1d(signal, rir, padding='same')
And here's the result:
I'm just trying to understand if there's any param that I can play with in gpuRIR who impacts the filtered signal without changing - t60 sr room_dim pos_src pos_rcv
My question is what makes this different?
The value of rir_length
that you're using is too short compared with t60
, you're probably truncating the RIR before it finishes.
The RIRs generated by pyroomacoustics
and gpuRIR
because gpuRIR
uses some modifications of the original ISM that pyroomacoustics
doesn't. You have more details about this in issue #25 and on the paper. With the RIRs being different, I wouldn't expect the filtered signals to be identical, though I have to admit that the amplitude difference is surprising.
The perfectly symmetrical situation that you're simulating is prone to create artifacts that might explain the difference, but it's hard to say. In any case, I would compare RIRs rather than filtered signals first to make sure if the difference comes from the RIRs or from the filtering process.
Thanks again for your responses @DavidDiazGuerra
This is the RIRs when I set max_order=2 in pyroomacoustics
(instead of 41 which was the result of
e_absorption, max_order = pra.inverse_sabine(self.t60, self.room_dim)
) :
Also, I did read this post you've mentioned but I'm not sure yet if I have something do with that. Is there any way to play with the RIR magnitude in gpuRIR?
I don't think the RIRs should be so different, I would expect the peaks to be at least well synchronized and have similar magnitude, even if some of them might be negative in the case of gpuRIR
and the ones from pyroomacoustics
might have some artifacts around them.
I don't think playing with max_order
inpyroomacoustics
or nb_img
in gpuRIR
is a good idea since a too-low value might make some peaks disappear if they correspond to higher-order images.
The amplitudes of the peaks in the image source method (ISM) are determined by the distances and the absorption coefficients, but the direct path should be only affected by the distance between the source and the receiver. The amplitude of the direct path in gpuRIR
is just the result of applying a spherical propagation model A_dp=1/(4*pi*d)
as it's usually done in the ISM. I don't know if pyroomacoustics
adds any kind of normalization that could affect the magnitude of the peaks.
I think that this is the fixed comparison
So just to verify, I'm not sure I understand. Is that make sense that both libraries give so different RIR scale? Isn't it neccesrally a mistake?
I've been trying to replicate your code and, after fixing the shape of pos_src
and ensuring that we were simulating all the image sources that contribute to the first 200 ms of the RIR, this is what I've got:
import numpy as np
import matplotlib.pyplot as plt
import gpuRIR
import pyroomacoustics as pra
sr = 16000
t60=0.2
room_dim = np.array([3, 4, 2])
pos_src= np.array([[1.5, 1, 1]])
pos_rcv= np.array([[1.5, 3, 1]])
e_absorption, max_order = pra.inverse_sabine(t60, room_dim)
room = pra.ShoeBox(room_dim, fs=sr, materials=pra.Material(e_absorption), max_order=max_order)
room.add_source(pos_src.T)
mic = pra.MicrophoneArray(pos_rcv.T, sr)
room.add_microphone_array(mic)
room.compute_rir()
rir_length = t60 # length of the RIR in seconds
nb_img = gpuRIR.t2n(T=rir_length, rooms_sz=room_dim)
beta = gpuRIR.beta_SabineEstimation(room_dim, t60)
rir = gpuRIR.simulateRIR(room_dim, beta, pos_src, pos_rcv, nb_img, Tmax=rir_length, fs=sr)
They look indeed quite different but, after fixing the lag and scale, they become much more similar:
lag = np.argmax(np.abs(room.rir[0][0])) - np.argmax(np.abs(rir[0,0,:]))
scale = np.max(np.abs(rir[0,0,:])) / np.max(np.abs(room.rir[0][0]))
plt.plot(scale * room.rir[0][0][lag:512+lag])
plt.plot(rir[0,0,:512])
plt.legend(['pra', 'gpuRIR'])
plt.show()
With the lag and scale fixed all the peaks are perfectly synchronized and their magnitudes are the same in most cases (I would bet that the cases where they aren't are due to artifacts due to be simulating a perfectly symmetric scenario). Therefore, I would say that the results of both libraries are equivalent except for the lag and the scale:
About the lag, for a distance of d = 2
meters and a speed of sound c = 343
m/s, the direct-path peak of the RIR should be at the sample (d/c) * sr = 93.29446064139941
as in the result of gpuRIR
. I don't know why inpyroomacoustics
it has an extra 40 samples lag.
About the scale, if we follow a spherical propagation model, the amplitude of the peak should be 1/(4*np.pi*d) = 0.039788735772973836
as in gpuRIR
. I don't know why in pyroomacoustics
it's multiplied by a factor 12.47817159323428.
To sum up, the RIR generated by gpuRIR is what you could expect from the image source method with negative reflection coefficients and its peaks match the results of other libraries (such as Python's rirgen
and rir-generator
or Lehmann's Matlab library) in time and magnitude. I don't know where the differences with pyroomacoustics
come from, but you should probably ask to their developpers.
@mishalydev @DavidDiazGuerra Hello! These kind of issues have come up multiple times. I should really make a section for this in the doc... 🦥 reference
4 * pi
. It does not make a lot of sense in the context of audio signals only have a relative scale anyway. If necessary, this can be uniformly fixed by scaling the rir or convolved signal by 4 * np.pi
.40 / fs * c
(80 cm at 16 kHz), the rir can be truncated to make it time exact. If the source and receiver are closer than that, the tail of the low pass filter will be truncated, which may lead to some artifacts.Hope this helps clarify things
Thanks for the explanations @fakufaku!
I agree that the scale factor doesn't have a big (or any) impact on this kind of acoustic simulation. I should have realized 12.47817 was 4pi. I guess that the main advantage of including that normalization factor is that it defines the amplitude of the source signal as the amplitude at 1 matter of the source, but I don't think nobody really cares about that.
I also understand the reasons for the delay of 40 samples. It might be misleading in some applications if the user is not aware of it, but it is also true that not including it might indeed generate artifacts for receivers too close to the source.
Best, David
@DavidDiazGuerra @fakufaku Thank you both for your time and tolerance.
Just to sum up, if I got it right - the only thing I need to do for pyroomacoustics
RIR to align the RIRs is:
rir = room.rir[0][0]
rir_ism = rir[40:40+self.rir_length] * (1/(torch.pi * 4))
It also does make sense with the visualization:
Assume that I already calculated the RIR using gpuRIR.simulateRIR and I have a signal_batch. The RIR size is [1, 1, 512] And the signal_batch size is [1, 1, 160000]
I want to apply this RIR on the signal_batch. How should I do that? a) torch.nn.functional.conv1d(signal_batch, RIR) b) gpuRIR.simulateTrajectory(signal_batch, RIR)
Until know I used the first option but then I noticed that it results in a huge difference between the original signal and the filtered signal:
Does that seem normal?
I'd really appreciate any response.
Thanks