LCAV / pyroomacoustics

Pyroomacoustics is a package for audio signal processing for indoor applications. It was developed as a fast prototyping platform for beamforming algorithms in indoor scenarios.
https://pyroomacoustics.readthedocs.io
MIT License

How to simulate energy decrease? #343

Closed · coreeey closed this issue 3 weeks ago

coreeey commented 1 month ago

In theory, as a speech signal travels further into the far field, we expect to observe a significant decrease in energy, leading to a noticeable attenuation in the spectrum. This attenuation typically manifests as a shift from dense resonance peaks to gradually sparser ones. However, in my simulation experiments I noticed a curious anomaly: regardless of the simulated distance, the spectrum only exhibited aliasing effects, without any observable attenuation. Why does this happen? (The code I used is room_L_shape_3d_rt.py from the examples.) The last row in the figure is the original signal; the others are the reverberated signals.

[pic1: spectrograms of the reverberated signals and the original signal]

fakufaku commented 1 month ago

Hello @coreeey , this looks pretty good to me. I suppose the absence of attenuation may be due to a global rescaling of the signal before saving to file; please check that. Also, I don't see any aliasing occurring in these spectrograms (aliasing would show up as copies of high-frequency content folded into the low frequencies). The further the source is from the microphone, the longer the reverberation time will be, which causes the longer tail observed in your simulated signals.
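As a side illustration, here is a minimal sketch of how the distance attenuation shows up when no per-file rescaling is applied (assuming a recent pyroomacoustics version; the room size, absorption, and positions are arbitrary choices, not taken from this thread):

import numpy as np
import pyroomacoustics as pra

fs = 16000
signal = np.random.randn(fs)  # 1 s of noise as a stand-in for speech

for src_pos in ([2.0, 2.0, 1.5], [7.0, 4.0, 1.5]):  # near and far source
    room = pra.ShoeBox([8, 5, 3], fs=fs, materials=pra.Material(0.3), max_order=10)
    room.add_source(src_pos, signal=signal)
    room.add_microphone([1.0, 1.0, 1.5])
    room.simulate()
    mic_sig = room.mic_array.signals[0]
    # un-normalized received energy in dB: the far source should come out lower
    print(src_pos, 10 * np.log10(np.sum(mic_sig ** 2)))

If each output is then individually peak-normalized before saving, this level difference disappears, which is exactly the effect described above.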

coreeey commented 1 month ago

Thanks @fakufaku for your very prompt reply. I did indeed perform a global rescaling of the signal before saving it to file, which might explain the absence of attenuation; I will attempt another approach to address this issue. Additionally, I realize now that I misunderstood the 'aliasing occurring in these spectrograms': I initially thought it referred to aliasing of the spectra over time.

coreeey commented 1 month ago

Thanks @fakufaku, I managed to address this issue by multiplying the normalized signal by 1000, resulting in a spectrum that closely resembles the actual microphone audio.

import numpy as np
from scipy.io import wavfile
from scipy.signal import convolve  # assuming scipy's convolve here

# convolve the anechoic speech with the simulated RIR, then peak-normalize
s = convolve(audio_anechoic, rir)
s = np.squeeze(s, axis=0)
s_norm = s / np.max(np.abs(s))
# s_norm_ = np.int16(s_norm * 32767)
s_norm_ = np.int16(s_norm * 1000)
wavfile.write("tmp_out.wav", 16000, s_norm_)

However, I have a minor inquiry: regarding 's_norm_ = np.int16(s_norm * 32767)' versus 's_norm_ = np.int16(s_norm * 1000)', what is the relationship between the 1000 and the 32767? Will the volume increase when multiplied by a larger value? And if I want to adjust the simulated microphone's sound pressure to reach 65 dB, can I achieve this by modifying this constant?
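For reference, a minimal sketch of how the two constants relate as a digital gain, in decibels (standard dB arithmetic, not specific to pyroomacoustics):

import numpy as np

# gain difference between the two scale factors applied to the same peak-normalized signal
print(20 * np.log10(32767 / 1000))  # ~30.3 dB: 32767 is about 30 dB louder than 1000

Note that a digital scale factor by itself only sets the level relative to digital full scale; mapping the output to an absolute sound pressure level such as 65 dB would also require calibrating the playback or measurement chain.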

DanTremonti commented 1 month ago

Hi @coreeey, can you clarify whether the signals used to create the spectrogram plots in the initial comment were

  1. loaded from disk, or
  2. the direct room-simulation output, without writing to disk?

coreeey commented 3 weeks ago

Hi @DanTremonti, I processed the output with a max-based normalization and plotted the spectrogram in Audacity.

DanTremonti commented 3 weeks ago

@coreeey Thanks for the clarification :)

fakufaku commented 3 weeks ago

@coreeey The normalization of audio before saving to a format like WAV is one of the finer and more confusing points of audio processing. The problem is that WAV (when saved with integer-valued samples) has finite precision. Many files are saved in 16 bits, and you want to make the most of those 16 bits to represent the amplitude of the sound. If the maximum amplitude is too small, only a few bits will be used to encode all the values. Often, we rescale the maximum to a value close to 2^15, which is the maximum value allowed by 16 bits, to maximize the precision used. In practice this rescaling only changes the volume of the audio. This seemingly innocuous operation loses the relative amplitude differences between files, as you noticed in your original issue.
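As a small illustrative sketch of this precision loss (the 0.001 peak amplitude below is an arbitrary assumption, chosen only to make the effect visible):

import numpy as np

# a very quiet signal: peak amplitude 0.001 instead of close to 1.0
t = np.arange(16000) / 16000
x = 0.001 * np.sin(2 * np.pi * 440 * t)
q = np.int16(x * 32767)   # naive conversion without peak rescaling
print(np.unique(q).size)  # 65 distinct levels, i.e. only about 6 of the 16 bits are used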

The trick, if you want to conserve the relative differences, is to rescale all files by the same value in such a way that they do not go outside the range of 16 bits. This is usually done by rescaling so that the maximum absolute amplitude across all signals that we want to compare maps to the int16 maximum (2^15 - 1 = 32767).

Here is an example for two signals.

import numpy as np

# use one common scale so the relative level difference between the signals is preserved
scale = max(abs(signal1).max(), abs(signal2).max())
signal1 = (signal1 * 32767 / scale).astype(np.int16)  # 32767 (not 32768) avoids int16 overflow at the peak
signal2 = (signal2 * 32767 / scale).astype(np.int16)

coreeey commented 3 weeks ago

@fakufaku thank you for the detailed and kind reply, and for developing such a great project.