GAMMA-UMD / pygsound

Impulse response generation based on a state-of-the-art geometric sound propagation engine.

Why use random noise to reconstruct the IR phase? #36

Open ndwuhuangwei opened 1 year ago

ndwuhuangwei commented 1 year ago

Hi,

When I ran pygsound on a very simple scene (one direct path and one reflection path), I found that the peak for the reflection path sometimes had a larger amplitude than the peak for the direct path. That should not happen: the reflected path is longer than the direct path and also loses energy at the surface.
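To make that expectation concrete, here is a minimal image-source sketch of the two-path scene. It assumes an omnidirectional point source, 1/r spherical spreading, a single flat floor, and a real, frequency-independent reflection coefficient; the function name, coordinates, and the coefficient 0.9 are hypothetical and not taken from pygsound. Because the image-source path can never be shorter than the direct path and |R| <= 1, the reflection cannot come out louder than the direct sound.

```python
import math

def path_amplitudes(src, lis, floor_z=0.0, reflection_coeff=0.9):
    """Return (direct, reflected) pressure amplitudes for a source and a
    listener above one flat floor at z = floor_z.

    Hypothetical helper: assumes 1/r spherical spreading and a real,
    frequency-independent reflection coefficient.
    """
    direct_len = math.dist(src, lis)
    # Mirror the source across the floor plane to get the image source.
    image = (src[0], src[1], 2.0 * floor_z - src[2])
    reflected_len = math.dist(image, lis)
    return 1.0 / direct_len, reflection_coeff / reflected_len

direct, reflected = path_amplitudes((0.0, 0.0, 1.5), (4.0, 0.0, 1.2))
print(f"direct={direct:.4f} reflected={reflected:.4f}")
```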

After looking at the source code, I found that ImpulseResponse::setIR() in gsImpulseResponse.cpp multiplies the IR (impulse response) by random noise in the range -1 to 1. The figure below compares the interleaved IR (irC in the source code), the noise (noise in the source code), and the final output IR (outputC in the source code), all plotted as absolute values.
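The effect of that multiplication can be reproduced with a simplified model. The sketch below is not pygsound's code: it assumes the pressure magnitude is sqrt(energy) and that every sample is multiplied by independent uniform noise in [-1, 1]; the sample indices and amplitudes are hypothetical. With a direct amplitude of 0.25 and a reflection of 0.18, the reflection ends up larger in magnitude in roughly 36% of draws, which is the probability that 0.18*|u2| > 0.25*|u1| for independent uniform |u1| and |u2|.

```python
import math
import random

def noisy_pressure_ir(energy, rng):
    """Simplified energy-to-pressure conversion with random 'phase':
    sqrt(energy) as magnitude, times uniform noise in [-1, 1] per sample.
    A sketch of the idea only, not pygsound's implementation."""
    return [math.sqrt(e) * rng.uniform(-1.0, 1.0) for e in energy]

# Hypothetical two-path energy response: direct at sample 10, reflection at 30.
energy = [0.0] * 64
energy[10] = 0.25 ** 2   # direct path, pressure amplitude 0.25
energy[30] = 0.18 ** 2   # reflection, pressure amplitude 0.18

rng = random.Random(0)
trials = 10_000
flipped = 0
for _ in range(trials):
    p = noisy_pressure_ir(energy, rng)
    if abs(p[30]) > abs(p[10]):   # reflection peak louder than direct peak
        flipped += 1
print(f"reflection louder than direct in {flipped / trials:.1%} of trials")
```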

[Figure: irC, noise, and outputC, all plotted as absolute values]

The comments in gsImpulseResponse.h say that the noise is used to reconstruct the phase of the pressure IR, which I don't understand. Could you please explain what the noise is for?

RoyJames commented 1 year ago

Your observation is correct: random noise is used to generate the phase. This is indeed far from ideal. The real underlying issue is that we don't have a good way to reconstruct the phase, so we used the simple (lazy) random solution you see here. My personal opinion is that phase is intractable in geometric acoustics because of the highly simplified boundary conditions (i.e., surfaces are described only by real-valued reflection coefficients). It's less of a problem for wave acoustic methods, but even there the phase will be inaccurate, since acoustic materials are difficult to measure accurately in the first place.
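To illustrate what a real-valued reflection coefficient leaves out, the sketch below uses the textbook normal-incidence relation R = (zeta - 1) / (zeta + 1), where zeta is the surface impedance normalized by the characteristic impedance of air. The impedance values are hypothetical, and real materials have frequency-dependent impedance; the point is only that a real R can encode a phase of 0 or pi, while a complex impedance produces an arbitrary phase shift on reflection.

```python
import cmath

def reflection_coefficient(zeta):
    """Normal-incidence plane-wave reflection coefficient from the
    normalized surface impedance zeta: R = (zeta - 1) / (zeta + 1)."""
    return (zeta - 1) / (zeta + 1)

# A purely real impedance gives a real R: its phase can only be 0 or pi.
r_real = reflection_coefficient(3.0)
# A complex impedance (hypothetical value) gives a complex R whose phase
# shift cannot be represented by a real-valued coefficient.
r_complex = reflection_coefficient(2.0 - 1.5j)

for name, r in [("real", r_real), ("complex", r_complex)]:
    print(f"{name}: |R|={abs(r):.3f}, phase={cmath.phase(r):+.3f} rad")
```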

ndwuhuangwei commented 1 year ago

Thanks for replying.

I'm still confused about the concept of phase here. As I understand it, the phase of a real-valued IR corresponds to the time delay along the x-axis. From what I observed, the delays of the peaks in the simulated IR are already accurate without the phase reconstruction introduced by the noise.
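The delay intuition is partly right: delaying a signal by d samples multiplies its DFT by exp(-2*pi*i*k*d/N), a phase term that is linear in frequency and leaves the magnitude unchanged. The sketch below checks this with a naive DFT (stdlib only; N = 16 and d = 3 are arbitrary choices). A general phase spectrum is not restricted to such a linear ramp, so knowing the arrival times alone does not pin it down.

```python
import cmath
import math

def dft(x):
    """Naive O(N^2) discrete Fourier transform, for illustration only."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

N = 16
delay = 3
impulse = [1.0 if t == 0 else 0.0 for t in range(N)]
delayed = [1.0 if t == delay else 0.0 for t in range(N)]

X, Y = dft(impulse), dft(delayed)
for k in range(N):
    # A pure delay leaves the magnitude unchanged...
    assert math.isclose(abs(X[k]), abs(Y[k]))
    # ...and adds a phase that is linear in frequency: -2*pi*k*delay/N.
    expected = cmath.exp(-2j * math.pi * k * delay / N)
    assert cmath.isclose(Y[k] / X[k], expected, abs_tol=1e-9)
print("delay only changes the phase, linearly in frequency")
```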

Does the phase here refer to the sign of the amplitude (positive or negative)? I found that the samples of the original IR, before phase reconstruction, are all positive, so it cannot be used for convolution directly.
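The observation that an all-positive IR cannot be used directly can be made precise: for non-negative taps, the triangle inequality gives |H(w)| <= H(0), so the DC gain is always the largest and convolving with such a sequence mostly acts as a low-pass smoother. The sketch below uses two hypothetical 5-tap sequences, one all-positive and one with the same magnitudes but alternating signs, and compares their gains at DC and at Nyquist. The sign pattern alone moves energy across frequency, which is one simple case of what phase controls.

```python
def gain_at(h, sign):
    """Filter gain at DC (sign=+1) or at Nyquist (sign=-1): sum of h[n] * sign**n."""
    return sum(hn * sign ** n for n, hn in enumerate(h))

positive_only = [0.5, 0.4, 0.3, 0.2, 0.1]   # hypothetical all-positive envelope
signed = [0.5, -0.4, 0.3, -0.2, 0.1]        # same magnitudes, alternating signs

gains = {name: (gain_at(h, 1), gain_at(h, -1))
         for name, h in [("positive-only", positive_only), ("signed", signed)]}
for name, (dc, nyquist) in gains.items():
    print(f"{name}: DC gain={dc:.2f}, Nyquist gain={nyquist:.2f}")
```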

RoyJames commented 1 year ago

It is not simply the sign, and it is not meant to model the time delay either. What you see in the first figure (irC) is the energy response. To reconstruct a time-domain signal with an inverse STFT, we need both the magnitude and the phase spectrum, but here we only have the energy response (all positive, of course) for the magnitude part. For mono audio, the phase barely matters to human ears. I suggest reading https://pubs.aip.org/asa/jasa/article/138/2/708/917382/Overview-of-geometrical-room-acoustic-modeling (see Section III.A, "Performing the conversion in reverse is not trivial because..."), which explains why this reconstruction isn't straightforward.
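For one STFT frame, a frequency-domain version of the random-phase idea looks roughly like this: take sqrt(energy) as the magnitude, draw a random phase per bin, enforce Hermitian symmetry so the inverse transform is real, and invert. This is a sketch under stated assumptions (a single frame, a naive inverse DFT, hypothetical per-bin energies), not the pygsound implementation, which applies the noise to the time-domain IR; a full reconstruction would also window and overlap-add successive frames.

```python
import cmath
import math
import random

def idft(X):
    """Naive inverse DFT, for illustration only."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def random_phase_frame(magnitude, rng):
    """Build a real time-domain frame from a one-sided magnitude spectrum
    (bins 0..N/2) by assigning random phases and enforcing Hermitian
    symmetry. The DC and Nyquist bins must stay real, so they get a random sign."""
    half = len(magnitude) - 1
    n = 2 * half
    X = [0j] * n
    for k, m in enumerate(magnitude):
        if k == 0 or k == half:
            X[k] = complex(m * rng.choice((-1.0, 1.0)))
        else:
            X[k] = m * cmath.exp(1j * rng.uniform(-math.pi, math.pi))
            X[n - k] = X[k].conjugate()   # mirror bin keeps the signal real
    return idft(X)

rng = random.Random(1)
# Hypothetical energies for one frame; magnitude = sqrt(energy).
energy = [1.0, 0.8, 0.5, 0.3, 0.2, 0.1, 0.05, 0.02, 0.01]
magnitude = [math.sqrt(e) for e in energy]
frame = random_phase_frame(magnitude, rng)

assert max(abs(s.imag) for s in frame) < 1e-9   # the result is real
print([round(s.real, 3) for s in frame])
```

Different seeds give different waveforms with the same magnitude spectrum, which is exactly why the phase is a free (and here arbitrary) choice.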

ndwuhuangwei commented 1 year ago

Thank you so much for the reference material.

I'd like to ask one more question about how the energy response is calculated. In SoundPropagator::outputIRCache(), the energy of diffuse paths is normalized by the number of diffuse rays, but I can't find a similar normalization for specular paths. It seems that gsound simply sums the energy of all the specular rays. If so, the final energy response would depend on the number of specular rays sampled during ray tracing, which seems wrong. Sorry for any misunderstanding due to my limited C++ knowledge.
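One common reason for treating the two differently, offered only as a hedged sketch and not as a description of gsound: diffuse energy is a Monte Carlo estimate, so each ray's contribution has to be divided by the number of rays, whereas specular paths are discrete paths, and if each unique path is counted once (deduplicated by some path ID) their summed energy does not grow with the ray count. The function names, hit probability, and path energies below are hypothetical; whether outputIRCache() deduplicates specular paths this way would need to be checked in the source.

```python
import random

def diffuse_energy(num_rays, rng, p_hit=0.05, normalise=True):
    """Hypothetical Monte Carlo estimate of diffuse energy at the listener.

    Each ray carries unit source energy and reaches the listener with
    probability p_hit. Dividing by num_rays makes the estimate converge
    to p_hit; the raw sum instead grows linearly with num_rays.
    """
    hits = sum(1 for _ in range(num_rays) if rng.random() < p_hit)
    return hits / num_rays if normalise else float(hits)

def specular_energy(found_path_ids, path_energy):
    """Sum energy over unique specular paths (assumption: paths are
    deduplicated by an ID, so rediscovering a path adds nothing)."""
    return sum(path_energy[pid] for pid in set(found_path_ids))

rng = random.Random(42)
path_energy = {"direct": 0.0625, "floor": 0.0324}   # hypothetical path energies
results = {}
for n in (1_000, 10_000):
    # More rays just rediscover the same two specular paths more often.
    found = [rng.choice(list(path_energy)) for _ in range(n)]
    results[n] = (diffuse_energy(n, rng),
                  diffuse_energy(n, rng, normalise=False),
                  specular_energy(found, path_energy))
    print(n, results[n])
```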