LCAV / pyroomacoustics

Pyroomacoustics is a package for audio signal processing for indoor applications. It was developed as a fast prototyping platform for beamforming algorithms in indoor scenarios.
https://pyroomacoustics.readthedocs.io
MIT License

Cannot plot in 3-d #59

Closed akhilvasvani closed 5 years ago

akhilvasvani commented 5 years ago

Hi,

I have tried plotting the microphones and source in 3-d, but every time I do that I get the error "IndexError: tuple index out of range". I looked at the room.py source file and I know that the capability exists, but maybe I am doing something wrong? Any help would be greatly appreciated.

Here's my code:

# Build 3-D representation of the room
corners = np.array([[0,0],[15,0], [15,10], [0,10]]).T  # This is our block
room = pra.Room.from_corners(corners)
room.extrude(10.)  # For 3-d

# Add Source
fs, signal = wavfile.read("/Downloads/cmu_us_awb_arctic/wav/arctic_b0528.wav")
my_source = pra.SoundSource([1, 1, 0], signal=signal)
room.add_source(my_source)

# room = pra.Room.from_corners(corners, fs=fs)
# room.add_source([1.,1.,1.], signal=signal)

# Add 12-microphone array
R = np.c_[
    # [x, y, z]
    [2.85, 0.49, 0],  # mic 1
    [4.82, 0.49, 0],  # mic 2
    [5.80, 0.49, 0],  # mic 3
    [2.85, 2.65, 0],  # mic 4
    [4.82, 2.65, 0],  # mic 5
    [5.80, 2.65, 0]   # mic 6
]

room.add_microphone_array(pra.MicrophoneArray(R, room.fs))

fig, ax = room.plot()
ax.set_xlim([-1, 15])
ax.set_ylim([-1, 10])
ax.set_zlim([0, 15])  # for 3-d

fakufaku commented 5 years ago

Hi @akhilvasvani ,

Luckily this is fairly simple to fix. Currently the API for adding a sound source in a room takes the location of the source directly (rather than a SoundSource object). You should modify your code as follows.

#Add Source
fs, signal = wavfile.read("/Downloads/cmu_us_awb_arctic/wav/arctic_b0528.wav")
room.add_source([1,1,0], signal=signal)

Sorry that the API might be a little bit inconsistent in this case. Ideally both should work.

Unrelated to the error you got, I see that you are creating a shoebox room. The preferred API for this is the ShoeBox class. In this case it will use a much more efficient algorithm to compute the RIR. Rather than the corners, the argument is just a triplet of the room dimensions, [15, 10, 10] in your case.

akhilvasvani commented 5 years ago

Thank you for your help. That definitely worked!

I actually have a couple more questions related to using TDOA.

On a conceptual note: when I'm using any of the TDOA methods (SRP, MUSIC, FRIDA, WAVES, CSSM) for sound source localization, why must I specify where the source is? Isn't the point of the algorithms to tell me where the sound is coming from? If I have given my source a position, what is the vector in the azimuth direction really telling me?

On another note, I wish to perform TDOA in 3-d. Using 12 microphones in a rectangular arrangement, I have generated the RIR and run several TDOA methods. However, the results are not accurate when I change the position of the source. I even receive a warning:

Ill-conditioned matrix detected. Result is not guaranteed to be accurate.
Reciprocal condition number9.774071e-25
  c_ri_half = linalg.solve(mtx_loop, rhs, check_finite=False)[:sz_coef]
/usr/local/lib/python3.7/site-packages/pyroomacoustics/doa/tools_fri_doa_plane.py:855: LinAlgWarning: scipy.linalg.solve

I have noticed from the example code that they typically employ the microphones in a circular 2-d array. Would this have any effect on the result? Any help to solve my problem would be greatly appreciated. (NOTE, the code I use for TDOA is in 2-d).

CODE HERE (continued from above code)

room.simulate()

from pyroomacoustics.doa import circ_dist

azimuth = 61. / 180. * np.pi  # 60 degrees
c = 343.  # speed of sound
fs = 16000  # sampling frequency
nfft = 256  # FFT size
freq_range = [2500, 4500]

X = np.array([pra.stft(signal, nfft, nfft // 2, transform=np.fft.rfft).T for signal in room.mic_array.signals])

##############################################
# Now we can test all the algorithms available
algo_names = ['SRP', 'MUSIC', 'FRIDA', 'TOPS']
spatial_resp = dict()

for algo_name in algo_names:
    # Construct the new DOA object
    # the max_four parameter is necessary for FRIDA only
    doa = pra.doa.algorithms[algo_name](R, fs, nfft, c=c)

    # this call here perform localization on the frames in X
    doa.locate_sources(X, freq_range=freq_range)

    # store spatial response
    spatial_resp[algo_name] = doa.grid.values

    # normalize
    min_val = spatial_resp[algo_name].min()
    max_val = spatial_resp[algo_name].max()
    spatial_resp[algo_name] = (spatial_resp[algo_name] - min_val) / (max_val - min_val)

    doa.polar_plt_dirac()
    plt.title(algo_name)

    # doa.azimuth_recon contains the reconstructed location of the source
    print(algo_name)
    print('  Recovered azimuth:', doa.azimuth_recon / np.pi * 180., 'degrees')
    print('  Error:', circ_dist(azimuth, doa.azimuth_recon) / np.pi * 180., 'degrees')

plt.show()
print((doa.azimuth_recon)/ np.pi * 180.)

fakufaku commented 5 years ago

@akhilvasvani I am not sure I understand the first question. You do not need to give the source location to the algorithm. In the example script, I use the true location only to evaluate the output of the algorithm. Is this what you are referring to?
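To illustrate that evaluation step, here is a small sketch in plain Python. `circular_error` is a hypothetical stand-in for the `circ_dist` helper imported in the script, and the estimated azimuth is a made-up value used only to show the wrap-around handling.

```python
import math

def circular_error(a, b):
    """Smallest absolute difference between two angles, in radians."""
    d = (a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

true_azimuth = math.radians(61.0)        # known only because we placed the source ourselves
estimated_azimuth = math.radians(355.0)  # hypothetical output of a DOA algorithm

error_deg = math.degrees(circular_error(true_azimuth, estimated_azimuth))
print(f"Error: {error_deg:.1f} degrees")
```

The true azimuth never enters the localization itself; it is only compared against the estimate afterwards.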

The results not being accurate might depend on a lot of things. Doing accurate DOA estimation is tricky business. It depends on the content of the source signal, the geometry of the microphone array, and the frequency bands you chose (among other things). I see you use a frequency range between 2500 and 4500 Hz. You might need to reduce this depending on the spacing of your microphones. The spacing should be somewhere around half or a quarter of the wavelength of the frequency bands. The warning about numerical accuracy usually comes from the FRIDA algorithm and generally does not affect the results.
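The spacing rule of thumb can be checked with a few lines of plain Python. This is only a sketch: it uses the (x, y) microphone positions from the question and c = 343 m/s from the script, and `max_unaliased_freq` is a hypothetical helper name.

```python
import itertools
import math

c = 343.0  # speed of sound in m/s, as in the script

def wavelength(freq_hz, c=c):
    """Wavelength in metres of a tone at freq_hz."""
    return c / freq_hz

def max_unaliased_freq(spacing_m, c=c):
    """Highest frequency for which spacing_m is at most half a wavelength."""
    return c / (2.0 * spacing_m)

# Microphone (x, y) positions from the question
mics = [(2.85, 0.49), (4.82, 0.49), (5.80, 0.49),
        (2.85, 2.65), (4.82, 2.65), (5.80, 2.65)]

# Smallest distance between any two microphones
min_spacing = min(
    math.hypot(p[0] - q[0], p[1] - q[1])
    for p, q in itertools.combinations(mics, 2)
)

for f in (2500, 4500):
    print(f"{f} Hz: half wavelength = {100 * wavelength(f) / 2:.1f} cm")
print(f"smallest microphone spacing = {min_spacing:.2f} m")
print(f"half-wavelength limit for that spacing = {max_unaliased_freq(min_spacing):.0f} Hz")
```

For this array the closest pair is 0.98 m apart, which by this rule corresponds to a limit of 175 Hz, far below the 2500 to 4500 Hz band used in the script.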

Finally, just a side note. In the future it would be great if you could close the resolved issue and open a new issue for a new problem. This keeps things organized and lets other people search through the issues more easily. Thanks! 😄

akhilvasvani commented 5 years ago

Ok, now I see why you gave the true location—you want to compare it with what the algorithm outputs and see the difference in error. That makes sense now.

So here's my issue. I am trying to find the exact (x, y, z) coordinates of a sound source given 12 microphones. I am using a rectangular array with fixed microphone positions and have lowered the frequency band range, but my results are still not accurate. Are there any suggestions I should consider?

Also, on a side note: does the doa object output the time shift delay between a pair of microphones?

After this, I'll close the issue. Thanks again

fakufaku commented 5 years ago

If you'd like me to take a look at your code, feel free to open a new issue. Make sure to copy/paste the whole script. I'll try to run it and see if I can spot the problem.

The DOA object does not give the time shift between microphones because this is not what most methods use to find the DOA. However, once the DOA has been detected, it is trivial to compute the time delay between the microphones.
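As a sketch of that last step, assuming a far-field (plane-wave) source and a 2-D geometry: the delay between two microphones is the difference of their projections onto the direction of arrival, divided by the speed of sound. `far_field_tdoa` is a hypothetical helper, not part of pyroomacoustics, and the microphone positions are taken from the question.

```python
import math

def far_field_tdoa(mic_a, mic_b, azimuth, c=343.0):
    """Arrival time at mic_a minus arrival time at mic_b, in seconds,
    for a plane wave coming from direction `azimuth` (radians)."""
    # Unit vector pointing from the array toward the source
    ux, uy = math.cos(azimuth), math.sin(azimuth)
    # A microphone further along u is closer to the source and hears the wave earlier
    proj_a = mic_a[0] * ux + mic_a[1] * uy
    proj_b = mic_b[0] * ux + mic_b[1] * uy
    return (proj_b - proj_a) / c

fs = 16000                          # sampling frequency from the script
azimuth = math.radians(61.0)        # e.g. the value of doa.azimuth_recon
mic_1, mic_2 = (2.85, 0.49), (4.82, 0.49)

delay = far_field_tdoa(mic_1, mic_2, azimuth)
print(f"delay: {delay * 1e3:.3f} ms ({delay * fs:.1f} samples)")
```

A positive value means the wave reaches the second microphone first. For a source close to the array the plane-wave assumption no longer holds, and the delay would have to be computed from the actual source-to-microphone distances.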