LCAV / pyroomacoustics

Pyroomacoustics is a package for audio signal processing for indoor applications. It was developed as a fast prototyping platform for beamforming algorithms in indoor scenarios.
https://pyroomacoustics.readthedocs.io
MIT License

Help Using Pyroomacoustics for DOA with Matrix Voice #145

Closed ShawnHymel closed 3 years ago

ShawnHymel commented 4 years ago

I'm wondering if someone might be able to help me. I'm trying to use the SRP algorithm in Pyroomacoustics to determine the direction of arrival of an incoming sound in real time (or as close to real time as possible) on a Matrix Voice microphone array connected to a Raspberry Pi.

I've taken a recording of all 8 channels from the Matrix Voice and set up a simulated room in Pyroomacoustics using only channel 0 as a point source (at about the same angle/distance away from the mic array as where I actually made the sound). As you can see, Pyroomacoustics seems to do pretty well finding the DOA for that point source (simulation):

[Image: doa-sim]

However, if I feed the original source with all 8 channels through the SRP algorithm (or any other DOA algorithm), Pyroomacoustics cannot seem to find the location of the source (real audio from all 8 channels):

[Image: doa-real]

I created the mic array in code using the locations provided for the Matrix Voice, which can be found here.

I can get the ODAS software to run and perform DOA on the same setup, so I know that DOA is possible. However, digging through their code is really difficult, which is why I'm hoping to use Pyroomacoustics as a starting place instead. I believe ODAS uses a modified version of SRP-PHAT, based on their paper.

I've created a test Jupyter Notebook here where you can see my simulation vs. reality tests. The only lead I have right now is that the spectrograms look different between the simulated and real acoustic data, which makes me wonder if there's some kind of filtering or pre-processing that needs to occur in real-time prior to sending FFT outputs to the DOA algorithm.

Any ideas on where I might need to make additions/corrections would be most appreciated!

fakufaku commented 4 years ago

I've taken a look at the notebook and can't pinpoint exactly where things are going wrong. But here are a few things to try.

If you want to share the sound file, I could take a look directly too. Also, if you find out what went wrong, I'd be thankful if you share it here so that I can improve the documentation or correct a bug if needed.

ShawnHymel commented 4 years ago

Thank you for the response! I'll start looking into some of these. I was under the impression that colatitude started at 0 deg from the south pole, so I'll play with that (even then, the azimuth still seems off).
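
For reference, a minimal sketch of the angle convention in question. It assumes the usual spherical-coordinate definition, in which colatitude is measured from the +z axis (the north pole), so 0 deg points straight up and 90 deg lies in the x-y plane. The helper names `elevation_to_colatitude` and `unit_vector` are hypothetical and not part of Pyroomacoustics.

```python
import math

def elevation_to_colatitude(elevation_deg):
    """Elevation is measured up from the x-y plane, colatitude down from +z."""
    return 90.0 - elevation_deg

def unit_vector(azimuth_deg, colatitude_deg):
    """Cartesian direction for an (azimuth, colatitude) pair given in degrees."""
    az = math.radians(azimuth_deg)
    co = math.radians(colatitude_deg)
    return (math.sin(co) * math.cos(az),
            math.sin(co) * math.sin(az),
            math.cos(co))

# The recording was made at azimuth 0 deg, elevation 30 deg.
colat = elevation_to_colatitude(30.0)
direction = unit_vector(0.0, colat)
```

Under this assumption, a source at 30 deg elevation sits at 60 deg colatitude, and its direction vector has a positive z component.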

I used sounddevice to record the audio and then store the resulting arrays in a .npy file. From what I can tell, it's stored as floating point. The sound file is stored as 8ch_hello_az0_el30_1m_48k.npy in the repo: https://github.com/ShawnHymel/pyroomacoustics-matrix-voice-test.
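
A quick way to sanity-check a recording stored like this before passing it on is sketched below. It assumes the array has the (frames, channels) layout that sounddevice returns, and it uses a synthetic array in place of the actual .npy file. The helper name `to_channels_first` is hypothetical.

```python
import numpy as np

def to_channels_first(recording, n_channels=8):
    """Return a float32 array of shape (n_channels, n_samples)."""
    x = np.asarray(recording)
    if x.ndim != 2:
        raise ValueError("expected a 2-D array of samples and channels")
    # Transpose when the channels are on the last axis (frames, channels).
    if x.shape[0] != n_channels and x.shape[1] == n_channels:
        x = x.T
    if x.shape[0] != n_channels:
        raise ValueError("could not find an axis with %d channels" % n_channels)
    # Integer PCM is scaled to roughly [-1, 1]; float data is left as it is.
    if np.issubdtype(x.dtype, np.integer):
        scale = float(np.iinfo(x.dtype).max)
        x = x.astype(np.float32) / scale
    return x.astype(np.float32)

# Synthetic stand-in for np.load("8ch_hello_az0_el30_1m_48k.npy").
demo = np.zeros((48000, 8), dtype=np.int16)
demo[0, 3] = 32767
channels_first = to_channels_first(demo)
```

Printing `channels_first.shape` and `channels_first.dtype` for the real file confirms whether the data is floating point and which axis holds the channels.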

I will definitely share anything I might find to help others take this library from simulation to real hardware. It's a great package--thank you for providing it!

fakufaku commented 4 years ago

Hi, I tried running the notebook. I can't say I solved the problem exactly, but I have a few suggestions.

  1. You are simulating a shoebox room. It is more efficient to do that using the ShoeBox class. I have also added some reverberation to better match the recording conditions. Also, for now I think you can just set the noise to 0 (don't specify sigma2_awgn).
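    import numpy as np
    import pyroomacoustics as pra

    # Create a reverberant room (absorption = 1 would be anechoic)
    # sample_rate is defined earlier in the notebook
    absorption = 0.2
    max_order = 17
    room_dim = np.array([10., 10., 10.])
    room = pra.ShoeBox(room_dim, absorption=absorption, fs=sample_rate, max_order=max_order)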
  2. For the DOA candidate locations, rather than specifying azimuth/colatitude, it is preferable to specify the grid size via the n_grid argument to the DOA object. If you specify azimuth/colatitude, you end up with lots of points at the poles and few on the equator, whereas the grid will try to cover the sphere uniformly. The drawback is that the current implementation of the grid is not so good for 2.5D (i.e. a flat array). So for now I would suggest just doing full 3D and ignoring the symmetry. This means that a source at 30 deg. elevation could also be detected at -30 deg. You can manually disambiguate.
    # Locate sources using DOA submodule
    doa = pra.doa.SRP(mic_array, 
                  sample_rate, 
                  nfft, 
                  c=sound_speed, 
                  num_src=1, 
                  dim=3, 
                  n_grid=1000)
  3. Finally, after using the grid, you'll see that the "sky" image is much better and that the image from simulation is not so different from the one from the recorded data. However, the recorded data seems rotated somehow. Is it possible that the microphone on the X axis is not microphone zero? Or that the ordering of the channels is reversed?
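
To illustrate why channel ordering would show up as a rotation, here is a minimal sketch. It assumes a hypothetical uniform circular array of 8 microphones (not the actual Matrix Voice geometry), a far-field plane wave, and an assumed speed of sound of 343 m/s. It does not use Pyroomacoustics; the azimuth is recovered with a plain least-squares fit to the arrival delays.

```python
import numpy as np

SOUND_SPEED = 343.0  # m/s, assumed value
N_MICS = 8           # hypothetical array, not the Matrix Voice layout
RADIUS = 0.04        # m, hypothetical

# Uniform circular array in the x-y plane, with mic 0 on the +x axis.
mic_angles = 2.0 * np.pi * np.arange(N_MICS) / N_MICS
mic_xy = RADIUS * np.column_stack([np.cos(mic_angles), np.sin(mic_angles)])

def plane_wave_delays(azimuth_deg):
    """Arrival time at each mic relative to the array centre (far field)."""
    az = np.radians(azimuth_deg)
    u = np.array([np.cos(az), np.sin(az)])
    # Mics closer to the source receive the wavefront earlier.
    return -(mic_xy @ u) / SOUND_SPEED

def estimate_azimuth(delays):
    """Least-squares plane-wave fit: solve mic_xy @ u = -c * delays for u."""
    u, *_ = np.linalg.lstsq(mic_xy, -SOUND_SPEED * delays, rcond=None)
    return np.degrees(np.arctan2(u[1], u[0]))

true_delays = plane_wave_delays(30.0)
az_correct = estimate_azimuth(true_delays)                 # channels in order
az_shifted = estimate_azimuth(np.roll(true_delays, -1))    # off by one channel
az_reversed = estimate_azimuth(true_delays[::-1])          # reversed order
```

In this toy setup, an off-by-one channel assignment rotates the estimate by one microphone spacing (45 degrees), and a reversed ordering mirrors it, so a consistently rotated or flipped sky image is the kind of symptom a channel mix-up would produce.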

ShawnHymel commented 4 years ago

Thank you for the suggestions! My client wants me to look at something else for right now, which is why I haven't had a chance to dig into this issue more.

As for the microphone placement, I'm following the coordinates that Matrix Voice has given here: https://matrix-io.github.io/matrix-documentation/matrix-voice/resources/microphone/

I was able to use arecord on the Raspberry Pi to record myself scratching my finger over each mic, one at a time. I verified that the channels do line up with the mic numbers as reported in that link.

fakufaku commented 4 years ago

No worries! Let me know if you have more questions at a later time.

fakufaku commented 3 years ago

Closing this due to lack of activity.