BrownsugarZeer / Multi_SSL

Combine sound source separation with SRP-PHAT to achieve multi-source localization.

azi ele #6

Closed Stirve587 closed 3 months ago

Stirve587 commented 5 months ago

Checked other resources

Issue with current documentation

Can you tell me how you calculated azi and ele? Thank you very much.

BrownsugarZeer commented 5 months ago

Sorry for the late reply. It's hard to give a detailed explanation because the issue doesn't contain enough information.

As mentioned in the README, I used DUET to separate each source and then passed it to the SRP-PHAT algorithm for multi-source tracking.
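
For readers skimming the thread, the pipeline roughly looks like the sketch below; duet_separate and srp_phat are hypothetical stubs standing in for the project's actual routines, not its real API:

import numpy as np

def duet_separate(wav):
    # Stub: real DUET returns one (channels, chunk) array per detected source
    return [wav]

def srp_phat(source):
    # Stub: real SRP-PHAT scans candidate directions and returns the
    # (azimuth, elevation) with the maximum steered response power
    return 0.0, 0.0

def localize_all(wav):
    """wav: multi-channel frame of shape (channels, chunk)."""
    return [srp_phat(src) for src in duet_separate(wav)]

print(localize_all(np.zeros((4, 1600))))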

Stirve587 commented 5 months ago

If the microphones are arranged according to your device layout, and azi is 0 degrees, is that in the Y-axis direction?

Stirve587 commented 5 months ago

During testing, I found that the angles in my output were very confusing, and I don't understand what the problem is.

BrownsugarZeer commented 5 months ago

If the microphones are arranged according to your device layout, and azi is 0 degrees, is that in the Y-axis direction?

Yes, you're correct. Maybe the angle of the microphone does not match the settings? Here is the microphone module spec

image

I don't have this microphone array on hand for now, I will find one as soon as possible and retest it.

Stirve587 commented 5 months ago

image

This is the layout of the microphone array as designed in your code, and I have reproduced it. So where should azi = 0 be?

Stirve587 commented 5 months ago

image

Is there a problem with my design?

Stirve587 commented 5 months ago

image

This is the azimuth result for audio collected in the opposite direction of the X-axis.

image

This is the result for audio collected in the Y-axis direction. I am a little at a loss and don't know what the problem is; could you please give me some guidance?

Stirve587 commented 5 months ago

In addition, your code detects a location even for audio collected under quiet conditions. Would it be better to set a threshold to filter out faint sounds?

BrownsugarZeer commented 5 months ago

The hardware configuration you provided is very helpful. It made me notice that the signal input may not be processed correctly. If the data between channels is mixed together, there will be no inter-channel delay (and SRP-PHAT cannot be calculated). Below is the audio file received through the ReSpeaker. This microphone array has six channels, and only the middle four, channels 1 to 4, are needed for the calculation.

image

and the numpy array we will receive is

# c1 means the value of channel 1, and so on
wav = [c1, c2, c3, c4, c1, c2, c3, c4, c1, c2, c3, c4, ...]

Here is how the streaming data from the microphone array is processed:

https://github.com/BrownsugarZeer/Multi_SSL/blob/6406031decaa4ddec7b407da3dfa704d748e06c4/src/mic/microphone_stream.py#L131-L135

In short, you need to make sure the shape is (channels, chunk) before passing it to the function. In our case it should be (4, 1600): 4 channels, each with 1600 samples.
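
For example, a minimal sketch (my own, not taken from the repo) of turning one interleaved chunk into that shape; raw_bytes stands in for the bytes returned by a stream read:

import numpy as np

CHANNELS, CHUNK = 4, 1600
raw_bytes = bytes(2 * CHANNELS * CHUNK)  # placeholder for one int16 chunk from the stream

# Interleaved [c1, c2, c3, c4, c1, c2, c3, c4, ...] -> (chunk, channels)
frames = np.frombuffer(raw_bytes, dtype=np.int16).reshape(-1, CHANNELS)

wav = frames.T  # -> (channels, chunk), i.e. (4, 1600)
assert wav.shape == (CHANNELS, CHUNK)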

BTW:

would it be better to set a threshold to remove some subtle sounds?

Absolutely, but that means introducing a loss. The loss also distorts the attenuation and delay information, which affects the results of your calculations.

Stirve587 commented 5 months ago

I use my mobile phone playing music as the sound source. Why is the localization not very good? Is it a problem with my microphone equipment?

Stirve587 commented 5 months ago

Sorry, the chunk I just tested with was 1024. I changed it to 1600 and found that the general direction is now correct, but there are still some differences in accuracy.

BrownsugarZeer commented 5 months ago

It depends on your experimental setup, whether the microphone is in the middle or corner of the room (effect of reverb), whether the phone is stationary or moving, and how close the music is to the microphone, etc. There are many factors that can cause the calculation results to be different from those presented in the README.

Stirve587 commented 5 months ago

Excuse me, I still have a few questions for you.

image

My sound source is in the top left corner of the microphone array, which is the first quadrant. Why is x negative?

BrownsugarZeer commented 5 months ago

I think there are some things we need to clarify first.

The first thing: is your hardware configuration the same as that of the ReSpeaker v2 (the figure below)? "The same" refers to the directions of 0, 90, 180, and 270 degrees and the microphone numbering.

image

The second thing is which direction "the top left corner" refers to in the hardware configuration you provided (because I still don't know where your 0 degrees is). By definition, though, the top left corner should be the second quadrant, which falls between 90 and 180 degrees. In that case the picture you provided is as expected.
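
For reference, the usual spherical-to-Cartesian convention makes the quadrant/sign relationship concrete (assuming azimuth is measured counterclockwise from the +X axis; the repo's convention may differ):

import numpy as np

def doa_to_xyz(azi_deg, ele_deg):
    # Unit vector for a direction of arrival (azimuth/elevation in degrees)
    azi, ele = np.deg2rad(azi_deg), np.deg2rad(ele_deg)
    return (np.cos(ele) * np.cos(azi),  # x
            np.cos(ele) * np.sin(azi),  # y
            np.sin(ele))                # z

print(doa_to_xyz(135, 20))  # second quadrant: x < 0, y > 0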

To make sure we're on the same page, please check whether the picture below is consistent with your understanding.

image

Stirve587 commented 5 months ago

I'm really sorry, I should have said the top right corner, the first quadrant, yesterday, which caused the misunderstanding. I just did a set of experiments and found that the signs of the source position (x, y, z) are (-, +, -) in the first quadrant, (-, -, -) in the second quadrant, (+, +, -) in the third quadrant, and (-, -, -) in the fourth quadrant. What could be the reason for the unsatisfactory results in all four quadrants?

Stirve587 commented 5 months ago

Based on the angles provided, I did not obtain the expected results. The detected source angles are: first quadrant azi = 101.7, ele = 10.1; second quadrant azi = 198, ele = 42.4; third quadrant azi = 2.0, ele = 54.2; fourth quadrant azi = 176.4, ele = 57.5. I would like to ask whether this is because my microphone equipment differs from yours or for some other reason, and whether I need to switch to the same microphone as yours.

Stirve587 commented 5 months ago

Is the visual coordinate display correct? I put the sound source in the first quadrant and measured azi = 245.8, ele = 19.7. According to the angle ranges you gave, that does belong to the first quadrant, but when I visualized it, the source position appeared in the third quadrant, with x, y, z all negative.

BrownsugarZeer commented 5 months ago

I'm glad you are taking my project seriously, so please don't apologize; I welcome any discussion and suggestions. Let me go through your questions one by one:

Is this visual coordinate display correct?

Here is a simple test for that: create a CSV file with 50 samples, all of which are identical.

# samples.csv
s1_azi,s1_ele,s2_azi,s2_ele,s3_azi,s3_ele
0,25,45,50,245.8,19.7
0,25,45,50,245.8,19.7
...

and run the following command to generate the visualization

python srp_visualizer.py -s=3 --wav=samples.csv

I found that the X and Y axis labels should be swapped!!

image

I would like to ask whether the microphone equipment is wrong with you or for other reasons

The biggest difference is that the microphone array I use is a mature product. I don't have to worry about data flow, as mentioned in the previous conversation:

# c1 means the value of channel 1, and so on
wav = [c1, c2, c3, c4, c1, c2, c3, c4, c1, c2, c3, c4, ...]

But I'm not sure what your data flow looks like, so I ran an experiment:

# A sound source recorded from respeaker v2
(venv) > python srp_phat_offline.py -s=1 -c=4 -i=None --wave=a0e19_3_1b6ede00.wav
Find 1 available sources.
azi:    359.7, ele:   22.7

# Assume that the channel data is scrambled
# wav = [c1, c3, c2, c4, c2, c3, c1, c4, c4, c2, c1, c3, ...]
(venv) > python srp_phat_offline.py -s=1 -c=4 -i=None --wave=a0e19_3_1b6ede00_shuffled.wav
Find 1 available sources.
azi:    306.0, ele:   86.0

Why does this happen? Because during processing the data is reshaped into (channels, chunk) so that we can calculate the delay and attenuation between channels. If the channels are scrambled, we cannot determine the relative delays, and the result will naturally be wrong.
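
A toy illustration (mine, not from the repo) of why channel order matters: the inter-channel lag, which SRP-PHAT's steered response depends on, is only meaningful when each row holds exactly one channel.

import numpy as np

fs = 16000
t = np.arange(1600) / fs
c1 = np.sin(2 * np.pi * 440 * t)
c2 = np.roll(c1, 3)  # channel 2 lags channel 1 by 3 samples

def estimate_lag(a, b, max_lag=10):
    # Brute-force cross-correlation over a small lag window
    lags = np.arange(-max_lag, max_lag + 1)
    corr = [np.dot(a, np.roll(b, -lag)) for lag in lags]
    return lags[int(np.argmax(corr))]

print(estimate_lag(c1, c2))  # 3, the true delay
# If samples of c1 and c2 were interleaved into the same row, this
# lag estimate would be meaningless, just like the shuffled WAV above.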

Stirve587 commented 5 months ago

Thank you very much for your patient reply. I found that my microphone equipment did not collect audio very well, so I bought the same device as yours online and plan to redo the experiment. Is there a problem with your visualizer's X/Y axes? I am very glad to have helped you find the problem. I will continue the experiments and get back to you after the equipment arrives. Thank you for your reply.

BrownsugarZeer commented 5 months ago

Hi, I got the microphone array this morning, and after testing it I found that the coordinates of the microphones were in the wrong order; this has nothing to do with the algorithms or the hardware. In fact, it depends on the channel order defined by the user and where the 0-degree angle is. The following is a comparison of the original code and the changed code.

Before:

https://github.com/BrownsugarZeer/Multi_SSL/blob/6406031decaa4ddec7b407da3dfa704d748e06c4/src/utils/ssl.py#L24-L27

After:

mics[0, :] = torch.Tensor([+0.02285, +0.02285, +0.005])
mics[1, :] = torch.Tensor([-0.02285, +0.02285, +0.005])
mics[2, :] = torch.Tensor([-0.02285, -0.02285, +0.005])
mics[3, :] = torch.Tensor([+0.02285, -0.02285, +0.005])

Before:

https://github.com/BrownsugarZeer/Multi_SSL/blob/6406031decaa4ddec7b407da3dfa704d748e06c4/srp_visualizer.py#L16-L21

After:

MICS = np.array(
    [
        [+0.02285, +0.02285, +0.005],
        [-0.02285, +0.02285, +0.005],
        [-0.02285, -0.02285, +0.005],
        [+0.02285, -0.02285, +0.005],
    ]
)

The output may now be what you expect. BTW, I re-checked srp_visualizer.py and found that the X and Y axis labels were not wrong :(

Stirve587 commented 5 months ago

Hello. In your system, you use the SRP-PHAT algorithm to locate the sound source. From what I have learned, SRP-PHAT itself does not directly measure the distance from the source to the array; it usually has to be combined with other methods, such as TDOA or a signal-strength attenuation model, to estimate the distance. What method did you use to determine the distance from the sound source to the array?

BrownsugarZeer commented 5 months ago

Yes, you're right, we need another way to estimate the distance of the sound source (I didn't study this part further 😔). SRP-PHAT only looks for the maximum energy and treats it as the direction of the sound source, which has nothing to do with distance.

Also note that if the microphone array is placed in a corner, sound reflections may cause wrong direction estimates.

Stirve587 commented 4 months ago

Hello author, the equipment I bought is the same as the one you used in the experiments. Why is the sound waveform I collect very small, much worse than yours? Could you please share the code you used to record the audio data? Thank you very much.

BrownsugarZeer commented 4 months ago

Hi, as you can see in srp_phat_online.py, we use the pyaudio package for sound streaming. We didn't write code to record the sound (if we had, it would also be part of the open source). At most, we recorded offline sound through Audacity.
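
If it helps, here is a minimal recording sketch (my own, not from the repo) using pyaudio plus the standard wave module; it assumes the array exposes 6 int16 channels at 16 kHz, and DEVICE_INDEX is a placeholder you would set for your input device:

import wave
import pyaudio

RATE, CHANNELS, CHUNK, SECONDS = 16000, 6, 1600, 5
DEVICE_INDEX = None  # None lets PyAudio pick the default input device

p = pyaudio.PyAudio()
stream = p.open(
    format=pyaudio.paInt16,
    channels=CHANNELS,
    rate=RATE,
    input=True,
    input_device_index=DEVICE_INDEX,
    frames_per_buffer=CHUNK,
)

# Read interleaved frames for SECONDS seconds
frames = [stream.read(CHUNK) for _ in range(RATE * SECONDS // CHUNK)]
stream.stop_stream()
stream.close()
p.terminate()

# Save as a multi-channel WAV file
with wave.open("recording.wav", "wb") as wf:
    wf.setnchannels(CHANNELS)
    wf.setsampwidth(pyaudio.get_sample_size(pyaudio.paInt16))
    wf.setframerate(RATE)
    wf.writeframes(b"".join(frames))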

As for why the results are poor, maybe we can start thinking along a few directions:

  1. How high is the microphone from the ground?
  2. How many sound sources are there?
  3. If there are multiple sound sources, what is the angle between them?
  4. How far is the sound source from the microphone?
  5. Room size

Maybe we can run an experiment with a single sound source over 50 (or 100) trials, count how many hits fall within 5 degrees (or 10 degrees) of the true angle, and calculate the accuracy. Any ideas~?
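
The scoring step could be as simple as this sketch (mine, with made-up numbers):

import numpy as np

true_azi = 90.0
measured = np.array([88.1, 92.3, 90.7, 101.5])  # stand-in for 50/100 trials

# Angular error wrapped to [-180, 180), then a 5-degree tolerance
err = (measured - true_azi + 180.0) % 360.0 - 180.0
hits = np.abs(err) <= 5.0
print(f"accuracy: {hits.mean():.0%}")  # 75% for these four samples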

letnnn commented 4 months ago

Hello, I have a piece of audio with a sample rate of 44100. Do I need to make changes to the code to use it with yours? I see that you are using a sample rate of 16000, which is inconsistent with mine. Hope you can answer.

BrownsugarZeer commented 4 months ago

@letnnn Hi, I haven't tried a waveform with a 44100 sample rate, but changing the parameter values should work around it (as long as the microphone array supports a 44100 sample rate).

https://github.com/BrownsugarZeer/Multi_SSL/blob/6406031decaa4ddec7b407da3dfa704d748e06c4/srp_phat_online.py#L19-L21
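
The change would look something like this (constant names here are illustrative; the real ones are at the linked lines):

SAMPLE_RATE = 44100        # was 16000
CHUNK = SAMPLE_RATE // 10  # keeps the same 100 ms window: 4410 samples instead of 1600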

letnnn commented 4 months ago

Thank you for your explanation. If I use other microphone array boards and change the relevant values, can I still perform localization?

BrownsugarZeer commented 4 months ago

Thank you for your explanation. If I use other microphone array boards and change the relevant values, can I still perform localization?

SRP-PHAT and DUET themselves are not tied to any particular microphone array, as long as it produces multi-channel output. But my project is based on the ReSpeaker Mic Array v2.0, so some code might not work as expected (such as srp_visualizer.py).

If you have any further question, feel free to open an issue~

letnnn commented 4 months ago

Hello, I would like to ask: why do I get results greater than 0 when experimenting with an elevation angle less than 0?

chimamaxianghahahahahaha commented 4 months ago

Hello author, why is the range of the pitch angle 0-90 degrees? If I want to measure a pitch range of -90 to 90, should I drop the absolute value in doas[:, 2] = doas[:, 2].abs()?

chimamaxianghahahahahaha commented 3 months ago

Hello author, if we remove the .abs() from doas[:, 2] = doas[:, 2].abs(), can we expand the range of the pitch angle to [-90°, 90°]?

BrownsugarZeer commented 3 months ago

Hello author, if we remove the .abs() from doas[:, 2] = doas[:, 2].abs(), can we expand the range of the pitch angle to [-90°, 90°]?

@chimamaxianghahahahahaha Sure, SRP-PHAT itself has no such restriction; I only needed it for my use case. I will also close this issue, since the current conversation is no longer relevant to the original issue.