LCAV / pyroomacoustics

Pyroomacoustics is a package for audio signal processing for indoor applications. It was developed as a fast prototyping platform for beamforming algorithms in indoor scenarios.
https://pyroomacoustics.readthedocs.io
MIT License

extract the 3D pseudo-spectrum data from a DOA object #270

Open fakufaku opened 2 years ago

fakufaku commented 2 years ago

Hi FakuFaku,

Is there a way to extract the 3D pseudo-spectrum data from a DOA object in the case of a 3D DOA simulation as mentioned above? Or, stated another way, is there a way to find the pseudo-spectrum along two axes (in order to find the peaks for both azimuth and colatitude), similar to what was done in your example with NormMUSIC below:

https://github.com/LCAV/pyroomacoustics/blob/master/notebooks/norm_music_demo.ipynb

Thanks.

Originally posted by @johntcuellar in https://github.com/LCAV/pyroomacoustics/issues/166#issuecomment-1174163142

fakufaku commented 2 years ago

@johntcuellar yes, the DOA objects can be used for 3D localization. Internally, a 3D DOA object will have a GridSphere object, which is a grid of cartesian unit vectors pseudo-uniformly distributed on the sphere. Compared to the two-axis spherical coordinate grid that you describe, it has the advantage of not giving preference to some directions: a spherical coordinate grid has many more points around the north and south poles than around the equator.

DOA objects will automatically be 3D if you give the microphone coordinates as an array of shape (3, n_mics). The number of grid points can then be specified with the n_grid argument; see the DOA documentation for details. After localization, the DOA object will have the estimated directions in the azimuth_recon and colatitude_recon attributes.
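For illustration, here is a minimal sketch of that 3D setup; the microphone coordinates and STFT frames below are random placeholders (not a working acoustic simulation), and the keyword names should be checked against the installed version of pyroomacoustics:

```python
import numpy as np
import pyroomacoustics as pra

fs, nfft, c = 16000, 256, 343.0

# Placeholder data: 8 microphones in 3D and random STFT frames of shape
# (n_mics, nfft // 2 + 1, n_frames). In practice X comes from
# pra.transform.stft.analysis applied to the microphone signals.
mic_locs = np.random.rand(3, 8)
X = np.random.randn(8, nfft // 2 + 1, 20) + 1j * np.random.randn(8, nfft // 2 + 1, 20)

# A (3, n_mics) array of microphone coordinates makes the DOA object 3D;
# n_grid controls the number of points on the spherical grid.
doa = pra.doa.algorithms["NormMUSIC"](mic_locs, fs, nfft, c=c, num_src=1, dim=3, n_grid=500)
doa.locate_sources(X, freq_range=[500.0, 3500.0])

# Estimated directions (in radians) after localization.
print(doa.azimuth_recon, doa.colatitude_recon)
```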

johntcuellar commented 2 years ago

Hi FakuFaku,

Thanks for clarifying. I've come across GridSphere before in the pyroomacoustics documentation.

So if I understand you correctly, I can use GridSphere to extract the spatial spectrum data and produce a plot like the one below?

[image: image.png] ref: https://www.mathworks.com/help/phased/ug/direction-of-arrival-estimation-with-beamscan-mvdr-and-music.html

Also, under the DOA object, what is Pssl?

Thank you.

Best,

John


fakufaku commented 2 years ago

The Grid objects actually have methods to visualize the spatial spectrum. For example: https://github.com/LCAV/pyroomacoustics/blob/master/pyroomacoustics/doa/grid.py#L333

If you want a plot in azimuth/colatitude, you usually need to resample the spherical grid onto a regular grid in the azimuth/colatitude domain.
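For reference, a minimal sketch of plotting the spatial spectrum directly on the spherical grid; it assumes a `doa` object on which `locate_sources()` has already been run, and that the grid exposes `cartesian` (3 x n_points) and `values` (n_points,) attributes (check these names against your installed version):

```python
import matplotlib.pyplot as plt

# Scatter plot of the spatial spectrum on the spherical grid points.
x, y, z = doa.grid.cartesian
fig = plt.figure()
ax = fig.add_subplot(111, projection="3d")
sc = ax.scatter(x, y, z, c=doa.grid.values, cmap="viridis", s=10)
fig.colorbar(sc, label="spatial spectrum")
plt.show()
```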

johntcuellar commented 2 years ago

I see

Thank you very much!

John

johntcuellar commented 1 year ago

Hi Robin,

A few questions:

  1. When you say “resample”, what do you mean by that?

  2. How do I sample on a regular grid and by “regular grid”, do you mean a rectangular one?

  3. I see that the spatial spectrum data is stored in different frequency bins (in Pssl, if I recall correctly). Why is this necessary? I assume it has to do with the subspace DOA methods, but no papers I have found go into detail about why they split the data into frequency bins.

  4. Since Pssl stores the spatial spectrum data in different frequency bins, how would I conglomerate it so that I can look for peaks across the entire range of frequencies that I define? I would assume it is not as simple as summing the columns across all frequency bins (or is it)?

Thank you.

Best,

John Cuellar


fakufaku commented 1 year ago
  1. When you say “resample”, what do you mean by that?

Many visualization methods require the input to be sampled on a uniform grid in the azimuth/elevation domain (grid1). However, the pyroomacoustics GridSphere uses a grid that is (roughly) uniform on the sphere in 3D (grid2). Resampling means interpolating the values of the spatial spectrum at the points of grid1 from their values on grid2.

  2. How do I sample on a regular grid and by “regular grid”, do you mean a rectangular one?

There are some scipy functions to do that I think, e.g. interp2d (https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.interp2d.html).

By regular I mean uniformly spaced, etc. Usually this is rectangular indeed.
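For example, a hedged sketch of such a resampling step using scipy.interpolate.griddata; the azimuth, colatitude, and values arrays are assumed to be flat arrays taken from the DOA object's grid (e.g. doa.grid.azimuth, doa.grid.colatitude, doa.grid.values, names to be checked against the installed version):

```python
import numpy as np
from scipy.interpolate import griddata

def to_regular_grid(azimuth, colatitude, values, n_az=360, n_co=90):
    """Interpolate a spatial spectrum sampled on a spherical grid onto a
    regular azimuth/colatitude grid (azimuth wrap-around is not handled)."""
    az_reg = np.linspace(0.0, 2.0 * np.pi, n_az)
    co_reg = np.linspace(0.0, np.pi / 2, n_co)  # hemisphere; use np.pi for a full sphere
    az_mesh, co_mesh = np.meshgrid(az_reg, co_reg)

    spectrum = griddata(
        points=np.c_[azimuth, colatitude],  # scattered sample locations
        values=values,                      # spatial spectrum at those points
        xi=(az_mesh, co_mesh),
        method="linear",
        fill_value=0.0,
    )
    return az_reg, co_reg, spectrum         # spectrum has shape (n_co, n_az)
```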

  3. I see that the spatial spectrum data is stored in different frequency bins (in Pssl, if I recall correctly). Why is this necessary? I assume it has to do with the subspace DOA methods, but no papers I have found go into detail about why they split the data into frequency bins.

Many DOA estimation methods work in the STFT domain because, for a narrow-band signal (i.e., a single frequency bin of the STFT), the delay operation can be represented by multiplication with a complex exponential (thanks to the convolution property of the Fourier transform). These explanations are usually omitted in papers. You might find them in textbooks, such as Sound Capture and Processing (https://www.amazon.co.jp/-/en/Ivan-Jelev-Tashev/dp/0470319836) by Ivan Tashev.
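As a toy illustration of the narrow-band property (plain numpy, unrelated to the pyroomacoustics API): for a single frequency, delaying the signal is the same as multiplying it by a complex exponential.

```python
import numpy as np

fs = 16000      # sampling frequency (Hz)
f = 1000.0      # tone frequency (Hz)
tau = 2.5e-4    # delay in seconds (e.g. an inter-microphone propagation delay)
t = np.arange(0, 0.01, 1.0 / fs)

x = np.exp(2j * np.pi * f * t)                   # complex tone
x_delayed = np.exp(2j * np.pi * f * (t - tau))   # physically delayed tone
x_phase = x * np.exp(-2j * np.pi * f * tau)      # delay modeled as a phase shift

print(np.allclose(x_delayed, x_phase))           # True
```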

  4. Since Pssl stores the spatial spectrum data in different frequency bins, how would I conglomerate it so that I can look for peaks across the entire range of frequencies that I define? I would assume it is not as simple as summing the columns across all frequency bins (or is it)?

There are several methods, the simplest being indeed to average all the frequencies together. This is what is done in pyroomacoustics. Some methods, NormMUSIC for example, apply a different weighting before averaging.
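A minimal sketch of the simple averaging approach, assuming Pssl has shape (n_freq_bins, n_grid_points); the axis order should be verified against the installed version before relying on it:

```python
import numpy as np

def broadband_spectrum(Pssl):
    """Average the per-frequency spatial spectra into one broadband spectrum."""
    return np.mean(Pssl, axis=0)

# Dummy data standing in for doa.Pssl: 55 frequency bins, 4000 grid points.
Pssl = np.random.rand(55, 4000)
spectrum = broadband_spectrum(Pssl)
print(spectrum.shape, spectrum.argmax())  # (4000,) and the index of the peak grid point
```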

johntcuellar commented 1 year ago

Hi Robin,

Thanks for the clarifications. That definitely helps! If I have any other questions, I will be sure to ask.

Best,

John Cuellar


johntcuellar commented 1 year ago

Hi Robin,

I am having trouble re-gridding the spherical spatial spectrum in the azimuth/colatitude domain. Can you please assist? I was able to get regrid to run, but not in the way I had hoped. The variable 'Pssl_Grid' is the re-gridded Pssl, but it is not a tuple of three 180x90 arrays. The regrid call is towards the bottom, after the DOA simulation.

Thank you.

Best,

John

```python
from __future__ import print_function

import time

import matplotlib.pyplot as plt
import numpy as np
import pyroomacoustics as pra
from pyroomacoustics.doa import circ_dist
from scipy.io import wavfile
from scipy.signal import argrelextrema, fftconvolve
from sympy import Plane, Point3D

# We define a meaningful distance measure on the circle

# Location of original source
azimuth_1 = 19.5 / 180.0 * np.pi    # with respect to the x axis
colatitude_1 = 38.0 / 180.0 * np.pi
distance = 3.0                      # 3 meters

#######################
# algorithm parameters
SNR = 0.0                      # signal-to-noise ratio
c = 343.0                      # speed of sound
fs = 16000                     # sampling frequency - comment out when importing a .wav file
nfft = 256                     # FFT size
fmin = 500                     # min frequency (Hz)
fmax = 3500                    # max frequency (Hz)
freq_bins = np.arange(5, 60)   # FFT bins to use for estimation
num_src = 3                    # number of sources
max_order = 3                  # number of reflections (0 = anechoic room)
dim = 3                        # dimension of the room - '2' for 2D or '3' for 3D

# Nearly equidistant points on a spherical grid, for searching more than one
# sound source
n_grid = 4000

# compute the noise variance
sigma2 = 10 ** (-SNR / 10) / (4.0 * np.pi * distance) ** 2
mic_to_mic_dist = 0.1     # distance from the center of the mic array to each mic
z_offset_factor = 0.9     # mic z offset as a factor of z = room_dim[2]

# Define room materials
m = pra.make_materials(
    ceiling="hard_surface",
    floor="linoleum_on_concrete",
    east="brickwork",
    west="brickwork",
    north="brickwork",
    south="brickwork",
)

# If importing a .wav file as a signal source, use the template below. The
# .wav file must be in the same folder as the program.
# NOTE: fs is extracted from the .wav file here
# fs, source_signal = wavfile.read('guitar_44k.wav')
# aroom.add_source([1.5, 1.7, 1.6], signal=audio_anechoic)

# Create the room
room_dim = np.r_[10.0, 10.0, 10.0]
aroom = pra.ShoeBox(
    room_dim,
    fs=fs,
    materials=m,
    max_order=max_order,
    sigma2_awgn=sigma2,
    air_absorption=True,
    ray_tracing=False,
)

north_wall = aroom.get_wall_by_name('north')  # get an instance of a wall by its name
aroom.set_ray_tracing(receiver_radius=0.5)    # activate ray tracing

# add the source(s)
# Source 1
source_location = room_dim / 2 + distance * np.r_[
    np.cos(azimuth_1) * np.sin(colatitude_1),
    np.sin(azimuth_1) * np.sin(colatitude_1),
    np.cos(colatitude_1),
]
source_signal = np.random.randn((nfft // 2 + 1) * nfft)
aroom.add_source(source_location, signal=source_signal)

# Circular array of 12 mics at the center of the room (mic k at angle k * pi / 6)
mic_angles = np.arange(12) * np.pi / 6
R = np.array(
    [
        room_dim[0] / 2 + mic_to_mic_dist * np.cos(mic_angles),
        room_dim[1] / 2 + mic_to_mic_dist * np.sin(mic_angles),
        np.full(12, room_dim[2] / 2),
    ]
)  # shape (3, 12)
aroom.add_microphone_array(pra.MicrophoneArray(R, fs=aroom.fs))

# Compute the RIR using either the 'ism' method, the 'rt' method, or the
# 'hybrid' method
chrono = time.time()
aroom.compute_rir()
print("Done in", time.time() - chrono, "seconds.")
print("RT60:", aroom.measure_rt60()[0, 0])
aroom.plot_rir()
plt.show()  # plot the RIRs

# run the simulation
aroom.simulate()

# Record the audio at the microphones to a wav file
audio_reverb = aroom.mic_array.to_wav("aaa.wav", norm=True, bitdepth=np.int16)

################################
# Compute the STFT frames needed
X = np.array(
    [
        pra.transform.stft.analysis(signal, nfft, nfft // 2).T
        for signal in aroom.mic_array.signals
    ]
)

##############################################
# MUSIC or NormMUSIC algorithm
# doa = pra.doa.algorithms['NormMUSIC'](R, fs, nfft, num_src=num_src, c=c,
#                                       max_four=4, dim=dim)
doa = pra.doa.algorithms['NormMUSIC'](
    R, fs, nfft, num_src=num_src, c=c, max_four=4, n_grid=n_grid, dim=dim
)

# this call performs localization on the frames in X
doa.locate_sources(X, num_src=num_src, freq_range=[fmin, fmax], freq_bins=freq_bins)

# Regrid the spherical grid to a rectangular one and plot it
Pssl_Grid = doa.grid.regrid()
# Pssl_Grid = doa.Pssl.regrid(n_grid)

fig = plt.figure(figsize=(14, 10))
plt.plot()  # eventually plot the spatial spectrum here
plt.xlabel("angle [°]")
plt.title("NormMUSIC spatial pseudo spectra in one plot", fontsize=15)
```


fakufaku commented 1 year ago

Hi John,

The implementation of the regrid method is here: https://github.com/LCAV/pyroomacoustics/blob/master/pyroomacoustics/doa/grid.py#L306. The output is three linear arrays corresponding to the azimuth values, colatitude values, and Pssl values. The arrays are linearized versions of 2N x N arrays, where N = int(np.sqrt(n_points / 2)) and n_points is the number of points in the original grid (to keep approximately the same number of points).

If the function does not do exactly what you want, I would suggest just copying the code from the original method and tweaking it to your needs in a new function.
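A hedged sketch of how that output might be reshaped and plotted, assuming doa.grid.regrid() returns three flat arrays as described above and that doa.grid.n_points is the original number of grid points; the exact return shapes should be checked against the installed version:

```python
import numpy as np
import matplotlib.pyplot as plt

# `doa` is assumed to be a DOA object on which locate_sources() has been run.
az, co, vals = doa.grid.regrid()           # three arrays as described above
N = int(np.sqrt(doa.grid.n_points / 2))    # resampled grid is 2N x N

AZ = np.reshape(az, (2 * N, N))
CO = np.reshape(co, (2 * N, N))
P = np.reshape(vals, (2 * N, N))

plt.pcolormesh(np.degrees(AZ), np.degrees(CO), P, shading="auto")
plt.xlabel("azimuth [deg]")
plt.ylabel("colatitude [deg]")
plt.colorbar(label="spatial spectrum")
plt.show()
```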

johntcuellar commented 1 year ago

Hi Robin,

Thank you. Based on previous emails, I was under the impression that there was a way to resample or regrid in the colatitude/azimuth domain directly using regrid().

I've spent a few hours trying to do it on my own, to no avail. Since I was having trouble, I was hoping you might be able to help (admittedly, I have limited experience in Python, but I find throwing myself into this project to be a good learning experience. As a result, I find myself asking a lot of questions).

Best,

John Cuellar


fakufaku commented 1 year ago

The regrid method above does resample onto a uniform grid in the azimuth/colatitude domain. Can you please clarify what you want to do?

johntcuellar commented 1 year ago

Hi Robin,

Absolutely. I want to create a plot of the spatial spectrum data like the one below.

[image: image.png]

The plot is really just a validation that I have extracted the data I want to extract. I then want to feed this data into a neural network for sound source localization as part of a project I am working on.

Going back to your previous email, you say that the regrid() output arrays are linear arrays corresponding to the colatitude, azimuth, and Pssl values. Does each column correspond to a sound source? I was curious why there are specifically three columns in each array.

Thank you.

Best,

John


johntcuellar commented 1 year ago

Hi, Robin,

Disregard that last paragraph, please.

I was able to get the plot I wanted, but I still have some questions:

[image: image.png]

  1. When I specify three sources, two of the recovered angles almost overlap (see below). Why would that happen?

[image: image.png]

  2. I understand that the three output arrays from regrid() correspond to azimuth, colatitude, and the Pssl values, but why would the first two arrays not just be linear arrays covering the allowable ranges of the azimuth (360 deg) and colatitude (90 deg)?

  3. Since the azimuth and colatitude arrays are 180 x 90, what azimuth angle would, for example, the value in the 100th row and the 10th column correspond to?

  4. If the original spherical plot had an azimuth and colatitude range of 360 degrees and 90 degrees, respectively (a hemisphere), why does the regrid effectively halve that to a quarter of a sphere (180 x 90)?

  5. Is there a way to correlate a point on the original sphere to a particular azimuth/colatitude value? What I mean is: the original Pssl was a linear array. How would I correlate any value in that array to an azimuth and colatitude angle without regridding to a rectangular mesh?

Thank you.

Best,

John
