Open fakufaku opened 2 years ago
@johntcuellar yes, the DOA objects can be used for 3D localization. Internally, a 3D DOA object will have a GridSphere
object, which is a grid of cartesian unit vectors pseudo-uniformly distributed on the sphere. Compared to the two axis grid spherical coordinate system that you describe, it has the advantage of not giving preference to some directions. The spherical coordinates grid has many more points around the south and north pole than around the equator.
DOA objects will be automatically 3D if you give microphone coordinates as a (3, n_mics)
shape array. Then, the number of grid points can be specified by the n_grid
argument. See the DOA doc for details.
After localization, the DOA objects will have the target coordinates in azimuth_recon
and colatitude_recon
variables.
Hi FakuFakue,
Thanks for clarifying. I've come across the GridSphere before in the pyroom documentation.
So if I understand what you are saying correctly, I can use GridSphere to extract the spatial spectrum data so that I can plot a similar plot as below?:
[image: image.png] ref: https://www.mathworks.com/help/phased/ug/direction-of-arrival-estimation-with-beamscan-mvdr-and-music.html
Also, under the DOA object, what is Pssl?
Thank you.
Best,
John
On Tue, Jul 5, 2022 at 4:29 AM Robin Scheibler @.***> wrote:
@johntcuellar https://github.com/johntcuellar yes, the DOA objects can be used for 3D localization. Internally, a 3D DOA object will have a GridSphere https://pyroomacoustics.readthedocs.io/en/pypi-release/pyroomacoustics.doa.grid.html?highlight=gridsphere#pyroomacoustics.doa.grid.GridSphere object, which is a grid of cartesian unit vectors pseudo-uniformly distributed on the sphere. Compared to the two axis grid spherical coordinate system that you describe, it has the advantage of not giving preference to some directions. The spherical coordinates grid has many more points around the south and north pole than around the equator.
DOA objects will be automatically 3D if you give microphone coordinates as a (3, n_mics) shape array. Then, the number of grid points can be specified by the n_grid argument. See the DOA doc https://pyroomacoustics.readthedocs.io/en/pypi-release/pyroomacoustics.doa.doa.html for details. After localization, the DOA objects will have the target coordinates in azimuth_recon and colatitude_recon variables.
— Reply to this email directly, view it on GitHub https://github.com/LCAV/pyroomacoustics/issues/270#issuecomment-1174951759, or unsubscribe https://github.com/notifications/unsubscribe-auth/AP5W7TIJHG75GHXOPUDVDC3VSQMBZANCNFSM52UQIECA . You are receiving this because you were mentioned.Message ID: @.***>
-- John Cuellar
The Grid objects have actually some method to visualize the spatial spectrum. For example, https://github.com/LCAV/pyroomacoustics/blob/master/pyroomacoustics/doa/grid.py#L333
If you want to have a plot in azimuth/colatitude, you usually need to resample the spherical grid on a regular grid in the azimuth/colatitude domain.
I see
Thank you very much!
John On Thu, Jul 7, 2022 at 4:55 AM Robin Scheibler @.***> wrote:
The Grid objects have actually some method to visualize the spatial spectrum. For example, https://github.com/LCAV/pyroomacoustics/blob/master/pyroomacoustics/doa/grid.py#L333
If you want to have a plot in azimuth/colatitude, you usually need to resample the spherical grid on a regular grid in the azimuth/colatitude domain.
— Reply to this email directly, view it on GitHub https://github.com/LCAV/pyroomacoustics/issues/270#issuecomment-1177490639, or unsubscribe https://github.com/notifications/unsubscribe-auth/AP5W7TNQT7WLCWFDX2IZYFLVS3ASVANCNFSM52UQIECA . You are receiving this because you were mentioned.Message ID: @.***>
-- John Cuellar
Hi Robin,
A few questions:
When you say “resample”, what do you mean by that?
How do I sample on a regular grid and by “regular grid”, do you mean a rectangular one?
I see that the spatial spectrum data is stored in different frequency bins (in Pssl, if I recall correctly). Why is this necessary? I assume it has to do with the subspace DOA methods, but no papers I have found go into detail about why they split the data into frequency bins.
Since Pssl stores the spatial spectrum data in different frequency bins, how would I conglomerate it so that I can look for peaks across the entire range of frequencies that I define? I would assume it is not as simple as summing the columns across all frequency bins (or is it)?
Thank you.
Best,
John Cuellar
On Thu, Jul 7, 2022 at 11:23 AM John Cuellar @.***> wrote:
I see
Thank you very much!
John
On Thu, Jul 7, 2022 at 4:55 AM Robin Scheibler @.***> wrote:
The Grid objects have actually some method to visualize the spatial spectrum. For example, https://github.com/LCAV/pyroomacoustics/blob/master/pyroomacoustics/doa/grid.py#L333
If you want to have a plot in azimuth/colatitude, you usually need to resample the spherical grid on a regular grid in the azimuth/colatitude domain.
— Reply to this email directly, view it on GitHub https://github.com/LCAV/pyroomacoustics/issues/270#issuecomment-1177490639, or unsubscribe https://github.com/notifications/unsubscribe-auth/AP5W7TNQT7WLCWFDX2IZYFLVS3ASVANCNFSM52UQIECA . You are receiving this because you were mentioned.Message ID: @.***>
-- John Cuellar
-- John Cuellar
- When you say “resample”, what do you mean by that?
Many visualization method requires the input to be sampled on a uniform grid in the azimuth/elevation domain (grid1). However, the pyroomacoustics GridSphere uses a grid that is (kindof) uniform on a sphere in 3D (grid2). Resample means interpolate the value of the spatial spectrum on the points of grid1 from their values on grid2.
- How do I sample on a regular grid and by “regular grid”, do you mean a rectangular one?
There are some scipy functions to do that I think, e.g. interp2d.
By regular I mean uniformly spaced, etc. Usually this is rectangular indeed.
- I see that the spatial spectrum data is stored in different frequency bins (in Pssl, if I recall correctly). Why is this necessary? I assume it has to do with the subspace DOA methods, but no papers I have found go into detail about why they split the data into frequency bins.
Many DOA estimation methods work in the STFT domain because for narrow-band signal (i.e., a single frequency bin of the STFT), the delay operation can be represented by multiplication with a complex exponential (thanks to the convolution property of the Fourier transform). These explanations are usually omitted in papers. You might find it in textbooks, such as the Sound Capture and Processing by Ivan Tashev.
- Since Pssl stores the spatial spectrum data in different frequency bins, how would I conglomerate it so that I can look for peaks across the entire range of frequencies that I define? I would assume it is not as simple as summing the columns across all frequency bins (or is it)?
There are several methods, the simplest being indeed to average all the frequencies together. This is what is done in pyroomacoustics. Some methods, for example in NormMUSIC
apply a different weighting before averaging.
Hi Robin,
Thanks for the clarifications. That definitely helps! If I have any other questions, I will be sure to ask.
Best,
John Cuellar
On Tue, Oct 11, 2022 at 6:30 PM Robin Scheibler @.***> wrote:
- When you say “resample”, what do you mean by that?
Many visualization method requires the input to be sampled on a uniform grid in the azimuth/elevation domain (grid1). However, the pyroomacoustics GridSphere uses a grid that is (kindof) uniform on a sphere in 3D (grid2). Resample means interpolate the value of the spatial spectrum on the points of grid1 from their values on grid2.
- How do I sample on a regular grid and by “regular grid”, do you mean a rectangular one?
There are some scipy functions to do that I think, e.g. interp2d https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.interp2d.html .
By regular I mean uniformly spaced, etc. Usually this is rectangular indeed.
- I see that the spatial spectrum data is stored in different frequency bins (in Pssl, if I recall correctly). Why is this necessary? I assume it has to do with the subspace DOA methods, but no papers I have found go into detail about why they split the data into frequency bins.
Many DOA estimation methods work in the STFT domain because for narrow-band signal (i.e., a single frequency bin of the STFT), the delay operation can be represented by multiplication with a complex exponential (thanks to the convolution property of the Fourier transform). These explanations are usually omitted in papers. You might find it in textbooks, such as the Sound Capture and Processing https://www.amazon.co.jp/-/en/Ivan-Jelev-Tashev/dp/0470319836 by Ivan Tashev.
- Since Pssl stores the spatial spectrum data in different frequency bins, how would I conglomerate it so that I can look for peaks across the entire range of frequencies that I define? I would assume it is not as simple as summing the columns across all frequency bins (or is it)?
There are several methods, the simplest being indeed to average all the frequencies together. This is what is done in pyroomacoustics. Some methods, for example in NormMUSIC apply a different weighting before averaging.
— Reply to this email directly, view it on GitHub https://github.com/LCAV/pyroomacoustics/issues/270#issuecomment-1275468731, or unsubscribe https://github.com/notifications/unsubscribe-auth/AP5W7TNNMFB6PWVSU35WKSDWCYIDJANCNFSM52UQIECA . You are receiving this because you were mentioned.Message ID: @.***>
-- John Cuellar
Hi Robin,
I am having trouble re-gridding the spherical spatial spectrum in the azimuth/Colatitude domain. Can you please assist? I was able to successfully implement regrid, but not in the way I had hoped. The Variable 'Pssl_Grid' is the re-gridded pssl, but it is not a tuple of 3 180x90 arrays. Regrid is towards the bottom after the DoA simulation.
Thank you.
Best,
John
import numpy as np from scipy.signal import fftconvolve import matplotlib.pyplot as plt import time import pyroomacoustics as pra from pyroomacoustics.doa import circ_dist from scipy.io import wavfile from sympy import Plane, Point3D from scipy.signal import argrelextrema
azimuth_1 = 19.5 / 180.0 np.pi # With Respect to x axis colatitude_1 = 38.0 / 180.0 np.pi distance = 3.0 # 3 meters
#######################
SNR = 0.0 # signal-to-noise ratio c = 343.0 # speed of sound fs = 16000 # sampling frequency - Comment out when importing .wav file nfft = 256 # FFT size fmin = 500 # Min Frequency Hz fmax = 3500 # Max Frequency Hz freq_bins = np.arange(5, 60) # FFT bins to use for estimation num_src = 3 # number of sources max_order = 3; # Number of reflections (0 = anechoic room) dim = 3 # Dimension of room - '2' for 2D or '3' for 3D
Sound Source searching n_grid = 4000
sigma2 = 10 (-SNR / 10) / (4.0 np.pi distance) 2 mic_to_mic_dist = 0.1 # distance from center of mic array to mic z_offset_factor = 0.9 # Mic Z offset as a factor of z = room_dim[2]
m = pra.make_materials( ceiling="hard_surface", floor="linoleum_on_concrete", east="brickwork", west="brickwork", north="brickwork", south="brickwork",)
file must
roomdim = np.r[10.0, 10.0, 10.0] # Original Line aroom = pra.ShoeBox(room_dim, fs=fs, materials = m, max_order = max_order, sigma2_awgn=sigma2, air_absorption = True, ray_tracing = False)
wall by calling its name
source_location = roomdim / 2 + distance * np.r[np.cos(azimuth_1)np.sin(colatitude_1), np.sin(azimuth_1)np.sin(colatitude_1), np.cos(colatitude_1)] source_signal = np.random.randn((nfft // 2 + 1) * nfft) aroom.add_source(source_location, signal = source_signal)
R = np.c_[ [room_dim[0]/2 + mic_to_mic_distnp.cos(0 np.pi / 6), room_dim[1]/2 + mic_to_mic_distnp.sin(0 np.pi / 6), room_dim[2]/2], # Mic 1 [room_dim[0]/2 + mic_to_mic_distnp.cos(1 np.pi / 6), room_dim[1]/2 + mic_to_mic_distnp.sin(1 np.pi / 6), room_dim[2]/2], # Mic 2 [room_dim[0]/2 + mic_to_mic_distnp.cos(2 np.pi / 6), room_dim[1]/2 + mic_to_mic_distnp.sin(2 np.pi / 6), room_dim[2]/2], # Mic 3 [room_dim[0]/2 + mic_to_mic_distnp.cos(3 np.pi / 6), room_dim[1]/2 + mic_to_mic_distnp.sin(3 np.pi / 6), room_dim[2]/2], # Mic 4 [room_dim[0]/2 + mic_to_mic_distnp.cos(4 np.pi / 6), room_dim[1]/2 + mic_to_mic_distnp.sin(4 np.pi / 6), room_dim[2]/2], # Mic 5 [room_dim[0]/2 + mic_to_mic_distnp.cos(5 np.pi / 6), room_dim[1]/2 + mic_to_mic_distnp.sin(5 np.pi / 6), room_dim[2]/2], # Mic 6 [room_dim[0]/2 + mic_to_mic_distnp.cos(6 np.pi / 6), room_dim[1]/2 + mic_to_mic_distnp.sin(6 np.pi / 6), room_dim[2]/2], # Mic 7 [room_dim[0]/2 + mic_to_mic_distnp.cos(7 np.pi / 6), room_dim[1]/2 + mic_to_mic_distnp.sin(7 np.pi / 6), room_dim[2]/2], # Mic 8 [room_dim[0]/2 + mic_to_mic_distnp.cos(8 np.pi / 6), room_dim[1]/2 + mic_to_mic_distnp.sin(8 np.pi / 6), room_dim[2]/2], # Mic 9 [room_dim[0]/2 + mic_to_mic_distnp.cos(9 np.pi / 6), room_dim[1]/2 + mic_to_mic_distnp.sin(9 np.pi / 6), room_dim[2]/2], # Mic 10 [room_dim[0]/2 + mic_to_mic_distnp.cos(10 np.pi / 6), room_dim[1]/2
'rt' method, or 'hybrid' method
aroom.compute_rir()
aroom.simulate()
bitdepth=np.int16)
################################
X = np.array( [ pra.transform.stft.analysis(signal, nfft, nfft // 2).T for signal in aroom.mic_array.signals ] )
##############################################
doa = pra.doa.algorithms['NormMUSIC'](R, fs, nfft, num_src = num_src, c=c, max_four=4, dim = dim)
max_four=4, n_grid = n_grid, dim = dim)
# this call here perform localization on the frames in
doa.locate_sources(X, num_src = num_src, freq_range = [fmin, fmax], freq_bins=freq_bins)
# Regrid the SPherical Grid to a Rectangular one and plot it
Pssl_Grid = doa.grid.regrid()
fig = plt.figure(figsize=(14,10)) plt.plot() # Eventually plot Spatial Spectrum plt.xlabel("angle [°]") plt.title("NormMUSIC Spatial pseudo spectra in one plot", fontsize=15)
On Wed, Oct 12, 2022 at 10:09 PM John Cuellar @.***> wrote:
Hi Robin,
Thanks for the clarifications. That definitely helps! If I have any other questions, I will be sure to ask.
Best,
John Cuellar
On Tue, Oct 11, 2022 at 6:30 PM Robin Scheibler @.***> wrote:
- When you say “resample”, what do you mean by that?
Many visualization method requires the input to be sampled on a uniform grid in the azimuth/elevation domain (grid1). However, the pyroomacoustics GridSphere uses a grid that is (kindof) uniform on a sphere in 3D (grid2). Resample means interpolate the value of the spatial spectrum on the points of grid1 from their values on grid2.
- How do I sample on a regular grid and by “regular grid”, do you mean a rectangular one?
There are some scipy functions to do that I think, e.g. interp2d https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.interp2d.html .
By regular I mean uniformly spaced, etc. Usually this is rectangular indeed.
- I see that the spatial spectrum data is stored in different frequency bins (in Pssl, if I recall correctly). Why is this necessary? I assume it has to do with the subspace DOA methods, but no papers I have found go into detail about why they split the data into frequency bins.
Many DOA estimation methods work in the STFT domain because for narrow-band signal (i.e., a single frequency bin of the STFT), the delay operation can be represented by multiplication with a complex exponential (thanks to the convolution property of the Fourier transform). These explanations are usually omitted in papers. You might find it in textbooks, such as the Sound Capture and Processing https://www.amazon.co.jp/-/en/Ivan-Jelev-Tashev/dp/0470319836 by Ivan Tashev.
- Since Pssl stores the spatial spectrum data in different frequency bins, how would I conglomerate it so that I can look for peaks across the entire range of frequencies that I define? I would assume it is not as simple as summing the columns across all frequency bins (or is it)?
There are several methods, the simplest being indeed to average all the frequencies together. This is what is done in pyroomacoustics. Some methods, for example in NormMUSIC apply a different weighting before averaging.
— Reply to this email directly, view it on GitHub https://github.com/LCAV/pyroomacoustics/issues/270#issuecomment-1275468731, or unsubscribe https://github.com/notifications/unsubscribe-auth/AP5W7TNNMFB6PWVSU35WKSDWCYIDJANCNFSM52UQIECA . You are receiving this because you were mentioned.Message ID: @.***>
-- John Cuellar
-- John Cuellar
Hi John,
The implementation of the regrid
method is here. The output is three linear arrays corresponding to azimuth value, colatitude value, and Pssl value. The arrays are linearized versions of 2N x N
arrays where N = int(np.sqrt(n_points / 2))
and n_points
is the number of points in the original grid (to keep approximately the same number of points).
If the function does not exactly do what you want, I would suggest to just scoop up the code from the original method and tweak it to your convenience in a new function.
Hi Robin,
Thank you. Based on previous emails, I was under the impression that there was a way to resample or regrid in the colatitude/azimuth domain directly using regrid().
I’ve spent a few hours trying to do it in my own to no avail. Since I was having trouble, I was hoping you might be able to help (admittedly, I have limited experience in Python, yet I find throwing myself in this project to be a good teaching experience. As a result, I find myself asking a lot of questions).
Best,
John Cuellar
On Thu, Nov 17, 2022 at 7:25 AM Robin Scheibler @.***> wrote:
Hi John,
The implementation of the regrid method is here https://github.com/LCAV/pyroomacoustics/blob/master/pyroomacoustics/doa/grid.py#L306. The output is three linear arrays corresponding to azimuth value, colatitude value, and Pssl value. The arrays are linearized versions of 2N x N arrays where N = int(np.sqrt(n_points / 2)) and n_points is the number of points in the original grid (to keep approximately the same number of points).
If the function does not exactly do what you want, I would suggest to just scoop up the code from the original method and tweak it to your convenience in a new function.
— Reply to this email directly, view it on GitHub https://github.com/LCAV/pyroomacoustics/issues/270#issuecomment-1318800562, or unsubscribe https://github.com/notifications/unsubscribe-auth/AP5W7TJFZ7LAJSSRHRLRIFLWIZFANANCNFSM52UQIECA . You are receiving this because you were mentioned.Message ID: @.***>
-- John Cuellar
The resample
method above does resample onto a uniform grid in the azimuth/colatitude domain. Can you please clarify what you want to do ?
Hi Robin,
Absolutely. I want to create a plot of the spatial spectrum data like the one below.
[image: image.png]
The plot is really just a validation that I have extracted the data I want to extract. I then want to feed this data into a neural network for sound source localization as part of a project I am working on.
Going back to your previous email, you say that the regrid() output arrays are linear arrays corresponding to the colatitude, azimuth, and Pssl value. Does each column correspond to each sound source? I was curious as to why there are specifically three columns in each array.
Thank you.
Best,
John
On Thu, Nov 17, 2022 at 7:17 PM Robin Scheibler @.***> wrote:
The resample method above does resample onto a uniform grid in the azimuth/colatitude domain. Can you please clarify what you want to do ?
— Reply to this email directly, view it on GitHub https://github.com/LCAV/pyroomacoustics/issues/270#issuecomment-1319492418, or unsubscribe https://github.com/notifications/unsubscribe-auth/AP5W7TMZZH6OIUL5QON4FI3WI3YL5ANCNFSM52UQIECA . You are receiving this because you were mentioned.Message ID: @.***>
Hi, Robin,
Disregard that last paragraph, please.
I was able to get the plot I wanted, but I still have some questions:
[image: image.png]
[image: image.png]
I understand that the three output arrays from regrid() correspond to azimuth, colatitude, and the pssl values, but why would the first two arrays not just be linear arrays of the allowable ranges of the azithm (360 deg) and colatitude (90 deg)?
Since the azimuth and colatitude arrays are 180 x 90, what azimuth angle would, for example, the value in the 100th row and the 10th column correspond to?
If the original spherical plot had a azimuth and colatitude range of 360 degrees and 90 degrees, respectively, (a hemi-sphere), why does the regrid effectively half that to a quarter of a sphere (180 x 90)?
Is there a way to correlate a point on the original sphere to a particular azimuth/colatitude value? What I mean is, the original Pssl was a linear array. How would I correlate any value in that array to azimuth and colatitude angle without regridding to a rectangular mesh?
Thank you.
Best,
John
On Thu, Nov 17, 2022 at 9:00 PM John Cuellar @.***> wrote:
Hi Robin,
Absolutely. I want to create a plot of the spatial spectrum data like the one below.
[image: image.png]
The plot is really just a validation that I have extracted the data I want to extract. I then want to feed this data into a neural network for sound source localization as part of a project I am working on.
Going back to your previous email, you say that the regrid() output arrays are linear arrays corresponding to the colatitude, azimuth, and Pssl value. Does each column correspond to each sound source? I was curious as to why there are specifically three columns in each array.
Thank you.
Best,
John
On Thu, Nov 17, 2022 at 7:17 PM Robin Scheibler @.***> wrote:
The resample method above does resample onto a uniform grid in the azimuth/colatitude domain. Can you please clarify what you want to do ?
— Reply to this email directly, view it on GitHub https://github.com/LCAV/pyroomacoustics/issues/270#issuecomment-1319492418, or unsubscribe https://github.com/notifications/unsubscribe-auth/AP5W7TMZZH6OIUL5QON4FI3WI3YL5ANCNFSM52UQIECA . You are receiving this because you were mentioned.Message ID: @.***>
Hi FakuFaku,
Is there a way to extract the 3D pseudo-spectrum data from a DOA object in the case of a 3D DOA simulation as mentioned above? Or, stated another way, is there a way to find the pseudo-spectrum along two axis (in order to find the peaks for both azimuth and colatitude), similar to what was done in your example with NormMusic below?:
https://github.com/LCAV/pyroomacoustics/blob/master/notebooks/norm_music_demo.ipynb
Thanks.
Originally posted by @johntcuellar in https://github.com/LCAV/pyroomacoustics/issues/166#issuecomment-1174163142