LCAV / pyroomacoustics

Pyroomacoustics is a package for audio signal processing for indoor applications. It was developed as a fast prototyping platform for beamforming algorithms in indoor scenarios.
https://pyroomacoustics.readthedocs.io
MIT License

Cannot Get Multiple DoA Using Grid Search #268

Open johntcuellar opened 2 years ago

johntcuellar commented 2 years ago

Hi FakuFaku,

I cannot seem to get grid search to work for multiple signals. I used issue #166 as a starting point, but I am unsure how to proceed from there. Please see my code below, and thank you in advance. Apologies for the messy code; it is a work in progress.

In this example, I place two sources in the room. The input azimuths are 19.5 and 90 degrees and the colatitudes are 38 and 57 degrees. The output I get is below; as you can see, the colatitudes are way off:

  Recovered azimuth: [89.88361159 20.28753509] degrees
  Error: [70.38361159  0.78753509] degrees
  Error: [ 0.11638841 69.71246491] degrees
  Recovered colatitude: [121.61776733  37.83784526] degrees
  Error: [83.61776733  0.16215474] degrees
  Error: [64.61776733 19.16215474] degrees
import numpy as np
from scipy.signal import fftconvolve
import matplotlib.pyplot as plt
import time
import pyroomacoustics as pra
from pyroomacoustics.doa import circ_dist
from scipy.io import wavfile

######
# circ_dist (imported above) provides a meaningful distance measure on the circle

# Locations of the original sources
azimuth_1 = 19.5 / 180.0 * np.pi  # with respect to the x axis
colatitude_1 = 38.0 / 180.0 * np.pi
azimuth_2 = 90.0 / 180.0 * np.pi  # with respect to the x axis
colatitude_2 = 57.0 / 180.0 * np.pi
distance = 3.0  # 3 meters

# Number of nearly equidistant points on the spherical grid used to search for multiple sources
#n_grid = pra.doa.grid.GridSphere(n_points=1000, spherical_points=None)
n_grid = 4000
#######################
# algorithms parameters
SNR = 0.0  # signal-to-noise ratio
c = 343.0  # speed of sound
fs = 16000  # sampling frequency - Comment out when importing .wav file
nfft = 256  # FFT size
fmin = 500 # Min Frequency Hz
fmax = 3500 # Max Frequency Hz
freq_bins = np.arange(5, 60)  # FFT bins to use for estimation
num_src = 2 # number of sources
dim = 3 # Dimension of room - '2' for 2D or '3' for 3D

# compute the noise variance
sigma2 = 10 ** (-SNR / 10) / (4.0 * np.pi * distance) ** 2
mic_to_mic_dist = 0.1  # array radius: distance from the array center to each mic
z_offset_factor = 0.9  # mic z offset as a factor of z = room_dim[2]

# Define Room Materials

m = pra.make_materials(
    ceiling="hard_surface",
    floor="linoleum_on_concrete",
    east="brickwork",
    west="brickwork",
    north="brickwork",
    south="brickwork",
)

# If importing a .wav file as the source signal, use the template below. The .wav
# file must be in the same folder as the program.
# ** NOTE: fs is extracted from the .wav file here

#fs, source_signal = wavfile.read('guitar_44k.wav')
#aroom.add_source([1.5, 1.7, 1.6], signal=source_signal)

# Create an anechoic room
room_dim = np.r_[10.0, 10.0, 10.0]
#aroom = pra.ShoeBox(room_dim, fs=fs, materials=m, max_order=0, sigma2_awgn=sigma2, air_absorption=True, ray_tracing=True)
aroom = pra.ShoeBox(room_dim, fs=fs, materials=m, max_order=0, sigma2_awgn=sigma2, air_absorption=False, ray_tracing=False)

# Activate Ray Tracing
#aroom.set_ray_tracing(receiver_radius=0.5)

# add the source(s)
# Source 1 (spherical to Cartesian coordinates, relative to the room center)
source_location = room_dim / 2 + distance * np.r_[np.cos(azimuth_1) * np.sin(colatitude_1), np.sin(azimuth_1) * np.sin(colatitude_1), np.cos(colatitude_1)]
source_signal = np.random.randn((nfft // 2 + 1) * nfft)
aroom.add_source(source_location, signal=source_signal)

# Source 2
source_location_2 = room_dim / 2 + distance * np.r_[np.cos(azimuth_2) * np.sin(colatitude_2), np.sin(azimuth_2) * np.sin(colatitude_2), np.cos(colatitude_2)]
source_signal_2 = np.random.randn((nfft // 2 + 1) * nfft)
aroom.add_source(source_location_2, signal=source_signal_2)

# Circular array of 12 mics (radius mic_to_mic_dist), centered in the room
mic_angles = np.arange(12) * np.pi / 6
R = np.array(
    [
        room_dim[0] / 2 + mic_to_mic_dist * np.cos(mic_angles),
        room_dim[1] / 2 + mic_to_mic_dist * np.sin(mic_angles),
        np.full(12, room_dim[2] / 2),
    ]
)
aroom.add_microphone_array(pra.MicrophoneArray(R, fs=aroom.fs))

# Cubic Mic Array
#R = np.c_[    
#    [room_dim[0]/2 + mic_to_mic_dist*np.cos(1 * np.pi / 4), room_dim[1]/2 + mic_to_mic_dist*np.sin(1 * np.pi / 4), z_offset_factor*room_dim[2]/2],  # Mic 1
#    [room_dim[0]/2 + mic_to_mic_dist*np.cos(2 * np.pi / 4), room_dim[1]/2 + mic_to_mic_dist*np.sin(2 * np.pi / 4), z_offset_factor*room_dim[2]/2],  # Mic 2
#    [room_dim[0]/2 + mic_to_mic_dist*np.cos(3 * np.pi / 4), room_dim[1]/2 + mic_to_mic_dist*np.sin(3 * np.pi / 4), z_offset_factor*room_dim[2]/2],  # Mic 3
#    [room_dim[0]/2 + mic_to_mic_dist*np.cos(4 * np.pi / 4), room_dim[1]/2 + mic_to_mic_dist*np.sin(4 * np.pi / 4), z_offset_factor*room_dim[2]/2],  # Mic 4
#    [room_dim[0]/2 + mic_to_mic_dist*np.cos(1 * np.pi / 4), room_dim[1]/2 + mic_to_mic_dist*np.sin(1 * np.pi / 4), room_dim[2]/2],  # Mic 5
#    [room_dim[0]/2 + mic_to_mic_dist*np.cos(2 * np.pi / 4), room_dim[1]/2 + mic_to_mic_dist*np.sin(2 * np.pi / 4), room_dim[2]/2],  # Mic 6
#    [room_dim[0]/2 + mic_to_mic_dist*np.cos(3 * np.pi / 4), room_dim[1]/2 + mic_to_mic_dist*np.sin(3 * np.pi / 4), room_dim[2]/2],  # Mic 7
#    [room_dim[0]/2 + mic_to_mic_dist*np.cos(4 * np.pi / 4), room_dim[1]/2 + mic_to_mic_dist*np.sin(4 * np.pi / 4), room_dim[2]/2]]  # Mic 8
#aroom.add_microphone_array(pra.MicrophoneArray(R, fs=aroom.fs))

# Compute the RIR; the method used ('ism', 'rt', or 'hybrid') follows from the room settings

#chrono = time.time()
aroom.compute_rir()
#print("Done in", time.time() - chrono, "seconds.")
#print("RT60:", aroom.measure_rt60()[0, 0, 0])
#aroom.plot_rir()
#plt.show() # Plot RiR

# run the simulation
aroom.simulate()

# Record the audio to a wav file at the microphone
#audio_reverb = aroom.mic_array.to_wav("aaa.wav", norm=True, bitdepth=np.int16)

################################
# Compute the STFT frames needed
X = np.array(
    [
        pra.transform.stft.analysis(signal, nfft, nfft // 2).T
        for signal in aroom.mic_array.signals
    ]
)

##############################################
# Now we can test all the algorithms available
algo_names = sorted(pra.doa.algorithms.keys())

#for algo_name in algo_names:
# Construct the DOA object; the max_four parameter is needed by FRIDA only
doa = pra.doa.algorithms['NormMUSIC'](R, fs, nfft, num_src=num_src, c=c, max_four=4, n_grid=n_grid, dim=dim)

# This call performs the localization on the STFT frames in X
doa.locate_sources(X, num_src=num_src, freq_range=[fmin, fmax], freq_bins=freq_bins)

#plot_individual_spectrum()  # plot the steered spectrum for each frequency

#doa.polar_plt_dirac()
#plt.title(algo_name)
# doa.azimuth_recon contains the reconstructed locations of the sources
#print(algo_name)
print("  Recovered azimuth:", doa.azimuth_recon / np.pi * 180.0, "degrees")
print("  Error:", circ_dist(azimuth_1, doa.azimuth_recon) / np.pi * 180.0, "degrees")
print("  Error:", circ_dist(azimuth_2, doa.azimuth_recon) / np.pi * 180.0, "degrees")

print("  Recovered coaltitude:", doa.colatitude_recon / np.pi * 180.0, "degrees")
print("  Error:", circ_dist(colatitude_1, doa.colatitude_recon) / np.pi * 180.0, "degrees")
print("  Error:", circ_dist(colatitude_2, doa.colatitude_recon) / np.pi * 180.0, "degrees")

plt.show()
fakufaku commented 2 years ago

It looks like everything is working correctly. There are two things you should pay attention to.

  1. The order in which the sources are recovered is not guaranteed. This means that when you measure the distance, you should consider all permutations of the sources and pick the one minimizing the total error. This can be done with the Hungarian algorithm, which is implemented in scipy as the linear_sum_assignment function (a minimal sketch follows this list).
  2. It seems that you are using a planar array, i.e., all the microphones lie in a single plane. In that case, sources placed symmetrically with respect to that plane cannot be told apart by the DOA algorithm. You need to resolve the ambiguity, for example by forcing all recovered sources to lie in the upper half-space.
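
For concreteness, here is a minimal sketch of the matching step, using the azimuth values recovered in this thread (the same idea extends to a cost that combines azimuth and colatitude errors):

import numpy as np
from scipy.optimize import linear_sum_assignment
from pyroomacoustics.doa import circ_dist

# Ground-truth and recovered azimuths from the run above, in radians
az_true = np.radians([19.5, 90.0])
az_est = np.radians([89.88361159, 20.28753509])

# cost[i, j] = angular distance between true source i and estimate j
# (circ_dist is elementwise numpy, so it broadcasts over the 2x2 grid)
cost = circ_dist(az_true[:, None], az_est[None, :])

# Hungarian algorithm: pick the pairing that minimizes the total error
row, col = linear_sum_assignment(cost)
print(np.degrees(cost[row, col]))  # per-source error after matching, about [0.79, 0.12] degrees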

In your example, the recovered sources are

| true az | true co | est az      | est co       |
| ------- | ------- | ----------- | ------------ |
| 19.5    | 38.0    | 20.28753509 | 37.83784526  |
| 90.0    | 57.0    | 89.88361159 | 121.61776733 |

I have aligned the groundtruth and estimated values to reduce the error. You can see that all values are pretty good, except for the source 2 colatitude: 121.6 != 57.0. However, 121.6 is actually close to the point symmetric to 57.0 with respect to the xy-plane, i.e. 121.6 - 2 * (121.6 - 90) = 58.4 ≈ 57.

johntcuellar commented 1 year ago

Hi FakuFaku,

Apologies for the delayed response. Thank you very much for the explanation. This was very helpful. Regarding your response, I have a couple more questions.

  1. Can you please explain what you mean by aligning to ground truth?

  2. Where did you get the reflection equation about the x - y plane?

Thank you.

Best,

John

fakufaku commented 1 year ago

Hello @johntcuellar ,

  1. Sure. The order of the source locations given by the algorithm is not guaranteed to match that of the groundtruth. A common practice is to measure the error of all (estimated, reference) pairs of sources and choose the pairing that minimizes the total error.
  2. This is just standard geometry. The colatitude goes from 0 degrees for a source right at the north pole to 180 degrees for one at the south pole. The colatitude of the source reflected by the x-y plane is given by co_refl = co - 2 * (co - 90), i.e., co_refl = 180 - co (see the sketch below).
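
To make the reflection concrete, here is a minimal sketch that folds any estimate lying below the array plane back into the upper half-space, using the colatitude estimates from the run above:

import numpy as np

# Colatitude estimates from the run above, in degrees
co_est = np.array([121.61776733, 37.83784526])

# Fold anything below the array plane (colatitude > 90) onto its mirror image
co_folded = np.where(co_est > 90.0, 180.0 - co_est, co_est)
print(co_folded)  # [58.38223267 37.83784526], close to the true 57 and 38 degrees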

Hope this clarifies. If not, feel free to reply to this!

johntcuellar commented 1 year ago

Hi Robin,

This helps a lot!

Thank you!

Unrelated question: I recall reading in another thread on the PRA GitHub that one must choose between importing an STL file for the room model and defining a shoebox room with the ability to specify wall material properties.

Is it possible to import an STL of objects within a room (chairs, tables, desks, etc.) and use the shoebox method to define the walls so that one can define a more realistic sound simulation model? If not, do you have any suggestions on any resources to do this?

Also, thank you so much for your contribution to PRA! It has been immediately helpful to my thesis. As I’ve read through the source code, I can tell that you must have put a lot of time and effort into it and it is much appreciated.

Best,

John Cuellar
