LCAV / pyroomacoustics

Pyroomacoustics is a package for audio signal processing for indoor applications. It was developed as a fast prototyping platform for beamforming algorithms in indoor scenarios.
https://pyroomacoustics.readthedocs.io
MIT License

Issue with generate_rirs() #178

Closed suhasbn closed 2 years ago

suhasbn commented 3 years ago

Hi, I'm having an issue with the generate_rirs() function. It throws the following error:

generate_rirs(room)
Traceback (most recent call last):
  File "", line 1, in
    room.generate_rirs()
  File "", line 6, in generate_rirs
    self.compute_rir()
  File "C:\Users\abc\Documents\wham_room.py", line 44, in compute_rir
    h.append(source.get_rir(mic, self.visibility[s][m], self.fs, self.t0)[:self.max_rir_len])
  File "C:\Users\abc\Anaconda3\lib\site-packages\pyroomacoustics\soundsource.py", line 254, in get_rir
    fast_rir_builder(ir, time, alpha, visibility.astype(np.int32), Fs, fdl)
  File "pyroomacoustics\build_rir.pyx", line 53, in pyroomacoustics.build_rir.fast_rir_builder
AssertionError


Any help would be appreciated!

fakufaku commented 3 years ago

Hi @suhasbn, could you please provide the code that you are trying to debug here?

suhasbn commented 3 years ago

Hi Robin, here's the snippet for your reference.

reverb_param_df = pd.read_csv('trial.csv',engine='python')
scaling_npz = pd.read_csv('trial.csv', engine='python')

utt_ids = scaling_npz['mixture_ID']

for i_utt, output_name in enumerate(utt_ids):
    utt_row = reverb_param_df[reverb_param_df['mixture_ID'] == output_name]
    room = WhamRoom([utt_row['room_x'].iloc[0], utt_row['room_y'].iloc[0], utt_row['room_z'].iloc[0]],
                            [[utt_row['micL_x'].iloc[0], utt_row['micL_y'].iloc[0], utt_row['mic_z'].iloc[0]],
                            [utt_row['micR_x'].iloc[0], utt_row['micR_y'].iloc[0], utt_row['mic_z'].iloc[0]]],
                            [utt_row['s1_x'].iloc[0], utt_row['s1_y'].iloc[0], utt_row['s1_z'].iloc[0]],
                            [utt_row['s2_x'].iloc[0], utt_row['s2_y'].iloc[0], utt_row['s2_z'].iloc[0]],
                            utt_row['T60'].iloc[0])

    room.generate_rirs()

fakufaku commented 3 years ago

Hi @suhasbn, the original error seems to be triggered by line 53 of build_rir.pyx. That line checks that the minimum time of arrival at the microphone fits in the array when building the room impulse response. If you still need help, please provide the code for WhamRoom, as well as any other code needed to reproduce the error.

Emrys365 commented 3 years ago

Hi, I also encountered the same error when trying to generate the WHAMR! data with pyroomacoustics 0.4.1. You can find the code for WhamRoom in wham_room.py downloaded from here: https://storage.googleapis.com/whisper-public/whamr_scripts.tar.gz


FYI, after I switched to pyroomacoustics 0.2.0, the error no longer appeared.

gwichern commented 3 years ago

@fakufaku I'm one of the co-authors of WHAMR. We've been distributing scripts that depend on pyroomacoustics for about a year, but it appears that something in v0.4.1 broke our simulation code. I've extracted a minimal example from our code that reproduces the error:

import numpy as np
import pyroomacoustics as pra
from pyroomacoustics.parameters import constants

room_dim = [8.590185968518231, 6.461586772520692, 3.198773872200343]
mics = [[4.355616339273031, 3.2370661465340023, 0.9741057742912328],
        [4.499660767093616, 3.278913521766396, 0.9741057742912328]]
s1_pos = [4.039940301349218, 2.4757959306413455, 1.0066250383911688]
s2_pos = [5.385401336245542, 2.444991127822015, 1.617540956995779]
T60 = 0.44078723695817584
fs = 16000
t0 = 0.0
sigma2_awgn = None

max_rir_len = np.ceil(T60*fs).astype(int)
volume = room_dim[0] * room_dim[1] * room_dim[2]
surface_area = 2*(room_dim[0] * room_dim[1] + room_dim[0] * room_dim[2] + room_dim[1] * room_dim[2])
absorption = 24 * volume * np.log(10.0) / (constants.get('c') * surface_area * T60)

# minimum max order to guarantee complete filter of length T60
max_order = np.ceil(T60 * constants.get('c') / min(room_dim)).astype(int)

room = pra.room.ShoeBox(room_dim, fs=fs, t0=t0, absorption=absorption,
                        max_order=max_order, sigma2_awgn=sigma2_awgn,
                        sources=None, mics=None)
room.add_source(s1_pos)
room.add_source(s2_pos)
room.add_microphone_array(pra.MicrophoneArray(np.array(mics).T, fs))

rir = []
room.visibility = None

room.image_source_model()

for m, mic in enumerate(room.mic_array.R.T):
    h = []
    for s, source in enumerate(room.sources):
        h.append(source.get_rir(mic, room.visibility[s][m], room.fs, room.t0)[:max_rir_len])
    rir.append(h)

It appears that before v0.4.1, room.t0 was set to a nonzero value when the room was constructed, but this no longer happens.
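A rough way to check this explanation against the numbers in the example above: the sketch below computes the sample index of the direct-path arrival from s1 to the first microphone and compares it with the half-length of the fractional-delay filter. The speed of sound (343 m/s), the 81-tap filter length, and the helper name `first_arrival_sample` are assumptions made for illustration, not values read from the library, and the actual check in build_rir.pyx may be written differently.

```python
import math

import numpy as np

# Positions copied from the minimal example above.
s1_pos = [4.039940301349218, 2.4757959306413455, 1.0066250383911688]
mic_l = [4.355616339273031, 3.2370661465340023, 0.9741057742912328]
fs = 16000

# Assumptions for illustration (not read from pyroomacoustics):
C = 343.0              # speed of sound in m/s
FDL = 81               # fractional-delay filter length in taps
FDL2 = (FDL - 1) // 2  # half-length of that filter


def first_arrival_sample(src, mic, fs, t0=0.0, c=C):
    """Sample index of the direct-path arrival, shifted by t0 seconds."""
    dist = float(np.linalg.norm(np.asarray(src) - np.asarray(mic)))
    return math.floor((dist / c + t0) * fs)


no_offset = first_arrival_sample(s1_pos, mic_l, fs)
with_offset = first_arrival_sample(s1_pos, mic_l, fs, t0=FDL2 / fs)

# With t0 = 0, a filter centred on the first arrival would start before index 0.
print(no_offset - FDL2 >= 0)    # False
print(with_offset - FDL2 >= 0)  # True
```

Under these assumptions the direct path lands at sample 38, which is less than the 40 samples needed in front of it, so a "minimum time of arrival fits in the array" check would fail with t0 = 0.0 and pass once t0 covers the filter half-length.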

fakufaku commented 3 years ago

@gwichern Thanks for the report! I only just realized that pyroomacoustics was used to generate WHAMR! I understand this issue better now!

A lot of things have changed in 0.4.0, sorry for the trouble! In particular:

Here is the code example fixed for 0.4.1

import numpy as np
import pyroomacoustics as pra
from pyroomacoustics.parameters import constants

room_dim = [8.590185968518231, 6.461586772520692, 3.198773872200343]
mics = [[4.355616339273031, 3.2370661465340023, 0.9741057742912328],
        [4.499660767093616, 3.278913521766396, 0.9741057742912328]]
s1_pos = [4.039940301349218, 2.4757959306413455, 1.0066250383911688]
s2_pos = [5.385401336245542, 2.444991127822015, 1.617540956995779]
T60 = 0.44078723695817584
fs = 16000
t0 = 0.0
sigma2_awgn = None

max_rir_len = np.ceil(T60*fs).astype(int)
volume = room_dim[0] * room_dim[1] * room_dim[2]
surface_area = 2*(room_dim[0] * room_dim[1] + room_dim[0] * room_dim[2] + room_dim[1] * room_dim[2])
absorption = 24 * volume * np.log(10.0) / (constants.get('c') * surface_area * T60)

# minimum max order to guarantee complete filter of length T60
max_order = np.ceil(T60 * constants.get('c') / min(room_dim)).astype(int)

room = pra.room.ShoeBox(room_dim, fs=fs, absorption=absorption, max_order=max_order)
room.add_source(s1_pos)
room.add_source(s2_pos)
room.add_microphone_array(pra.MicrophoneArray(np.array(mics).T, fs))

room.compute_rir()

rir = room.rir

For the dataset, I suppose that there are two solutions:

1. Add a requirements.txt with the exact version number used to generate the dataset.
2. Change the code following the example above (I would still add the requirements.txt file in that case).

I know this is not ideal and apologize for that.

gwichern commented 3 years ago

@fakufaku Thanks for the reply. I noticed different lengths for the RIRs returned by your code in v0.4.1 (~19,000 samples and different for each mic-source pair) and my code in v0.3.1 (7053 samples for each mic-source pair). That's quite a large difference. I guess any RIRs generated by recent versions (>0.4) of pyroomacoustics will be different than those from earlier versions?

I agree that we should use a requirements.txt. To keep the dataset consistent with what has already been published we will stick with v0.3.1 for now.
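In requirements.txt the pin would be a single line, `pyroomacoustics==0.3.1`. As an extra safeguard, a generation script could also verify the installed version at start-up. The sketch below is one possible way to do that using only the standard library (Python 3.8+); the helper name `installed_matches` is hypothetical and is not part of the WHAMR! scripts.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_matches(package, expected):
    """Return True only if `package` is installed at exactly version `expected`."""
    try:
        return version(package) == expected
    except PackageNotFoundError:
        # The package is not installed at all.
        return False


# Version taken from the comment above; warn rather than fail hard.
if not installed_matches("pyroomacoustics", "0.3.1"):
    print("Warning: expected pyroomacoustics==0.3.1; generated RIRs may differ.")
```

Printing a warning keeps the script usable for experimentation; raising an exception instead would be the stricter choice when exact reproduction of the published dataset matters.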

fakufaku commented 3 years ago

@gwichern I am guilty of not having checked the output consistency between the two versions after some large changes to the simulator (the addition of ray tracing). I did try to keep the output of the shoebox generator the same for most parameter choices. Obviously, the influence of t0 and sigma2_awgn, which have been removed, cannot be replicated, but in your setup this should not matter. The length (number of samples) of the RIR may be different, but the actual values should be the same (the algorithm is the same), provided ray tracing is not used and absorption and max_order are set to the same values. For the dataset, I agree that it is probably better to pin the requirements file to the version used when the dataset was generated.
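One way to test the claim that only the length differs is to truncate the newer RIRs to the length the older script used, ceil(T60 * fs), which is where the 7053 samples come from, and compare sample by sample. The sketch below does this for nested lists indexed as [mic][source]. The helper names are hypothetical, and the comparison assumes both versions use the same time origin; if one version adds a leading delay, the RIRs would need to be aligned first.

```python
import numpy as np

T60 = 0.44078723695817584
fs = 16000
max_rir_len = int(np.ceil(T60 * fs))  # truncation length used by the older script


def truncate_rirs(rirs, max_len):
    """Truncate each RIR in a nested [mic][source] list to max_len samples."""
    return [[np.asarray(h)[:max_len] for h in per_mic] for per_mic in rirs]


def rirs_match(rirs_a, rirs_b, max_len, atol=1e-8):
    """Compare two nested RIR lists over their first max_len samples."""
    a = truncate_rirs(rirs_a, max_len)
    b = truncate_rirs(rirs_b, max_len)
    # Both lists must have the same number of mics and sources per mic.
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        return False
    return all(
        ha.shape == hb.shape and np.allclose(ha, hb, atol=atol)
        for ra, rb in zip(a, b)
        for ha, hb in zip(ra, rb)
    )


# Synthetic stand-ins: "new" is "old" with 1000 extra trailing samples.
rng = np.random.default_rng(0)
old = [[rng.standard_normal(max_rir_len)]]
new = [[np.concatenate([old[0][0], rng.standard_normal(1000)])]]

print(max_rir_len)                         # 7053
print(rirs_match(old, new, max_rir_len))   # True
```

With real data, `old` would be the list built by the v0.3.1 script and `new` would be `room.rir` from the updated example.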

fakufaku commented 2 years ago

Closing this due to lack of activity.