Sorry for the late reply. It's hard to give a detailed explanation because your issue doesn't contain enough information.
As mentioned in the README, I used DUET to separate each source and then passed it to the SRP-PHAT algorithm for multi-source tracking.
I arranged the microphones according to your device. In this case, if azi is 0 degrees, is that the Y-axis direction?
During testing I found the output angles very confusing, so I don't understand where the problem is.
I arranged the microphones according to your device. In this case, if azi is 0 degrees, is that the Y-axis direction?
Yes, you're correct. Maybe the orientation of the microphone array does not match the settings? Here is the microphone module spec:
I don't have this microphone array on hand right now; I will find one as soon as possible and retest.
This is the microphone array layout defined in your code, which I have reproduced. Where should azi = 0 point?
Is there a problem with my design?
This is the azimuth reported for audio captured from the negative X-axis direction. This is the location reported for audio captured from the Y-axis direction. I'm a little at a loss and don't know what the problem is. Could you please give me some guidance?
In addition, your code still detects a location for audio captured under quiet conditions. Would it be better to set a threshold to filter out faint sounds?
The hardware configuration you provided is very helpful. It made me notice that the signal input may not be processed correctly. If the data between channels is mixed together, there will be no inter-channel delay (and SRP-PHAT cannot be computed). Below is the audio file received through the ReSpeaker. This microphone array has six channels, and only the middle four, channel 1 to channel 4, are needed for the calculation.
and the numpy array we will receive is
# c1 means the value of channel 1, and so on
wav = [c1, c2, c3, c4, c1, c2, c3, c4, c1, c2, c3, c4, ...]
This is what the streaming data looks like as we receive it from the microphone array.
In short, you need to make sure the shape is (channels, chunk) before passing it to the function. In our case, it should be (4, 1600): 4 channels, each with 1600 samples.
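If it helps, here is a rough sketch of that deinterleaving step (the function and variable names are just for illustration and are not code from this repo), assuming the stream delivers 16-bit PCM in the interleaved order shown above:

import numpy as np

CHANNELS = 4   # channels 1 to 4 of the ReSpeaker
CHUNK = 1600   # samples per channel

def to_channels_first(raw_chunk: bytes) -> np.ndarray:
    """Deinterleave one 16-bit PCM chunk into shape (CHANNELS, CHUNK)."""
    # interleaved layout, as shown above: [c1, c2, c3, c4, c1, c2, c3, c4, ...]
    interleaved = np.frombuffer(raw_chunk, dtype=np.int16)
    return interleaved.reshape(-1, CHANNELS).T

# e.g. with a pyaudio stream opened with 4 channels:
# wav = to_channels_first(stream.read(CHUNK))   # wav.shape == (4, 1600)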
BTW:
Would it be better to set a threshold to filter out faint sounds?
Absolutely, but that means discarding information. The discarded samples also distort the attenuation and delay between channels, which in turn affects the results of your calculation.
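If you still want to try it, here is a rough sketch of the kind of energy gate we are talking about (not part of this repo; the threshold value is a placeholder you would have to tune against recordings of your quiet room):

import numpy as np

RMS_THRESHOLD = 100.0  # placeholder for 16-bit PCM; tune it for your setup

def is_loud_enough(wav: np.ndarray) -> bool:
    """Return True if the chunk's RMS energy is high enough to be worth passing to SRP-PHAT."""
    rms = np.sqrt(np.mean(wav.astype(np.float64) ** 2))
    return rms >= RMS_THRESHOLD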
I used my mobile phone playing music as the sound source. Why are the localization results not very good? Is it a problem with my microphone hardware?
Sorry, the chunk I just tested with was 1024. I changed it to 1600 and found that the general direction is correct, but the accuracy is still somewhat off.
It depends on your experimental setup: whether the microphone is in the middle or a corner of the room (the effect of reverb), whether the phone is stationary or moving, how close the music is to the microphone, and so on. Many factors can cause the calculated results to differ from those presented in the README.
Excuse me, I still have a few questions for you. My sound source is in the top left corner of the microphone array, which is the first quadrant. Why is x negative?
I think there are some things we need to clarify first.
The first thing: is your hardware configuration the same as that of the ReSpeaker v2 (the figure below)? By "the same" I mean the directions of 0, 90, 180, and 270 degrees and the microphone numbering.
The second thing: in the hardware configuration you provided, which direction does the top left corner refer to (because I still don't know where 0 degrees is)? By definition, the top left corner should be the second quadrant, which falls between 90 and 180 degrees; in that case the picture you provided is as expected.
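To make the quadrant reasoning concrete, here is a small sketch (assuming azi is measured counterclockwise from the +X axis and ele upward from the XY plane; please check that against the figure below) that converts an (azi, ele) pair into Cartesian coordinates so you can read off the quadrant:

import numpy as np

def doa_to_xyz(azi_deg: float, ele_deg: float) -> np.ndarray:
    """Unit direction vector for an azimuth/elevation pair given in degrees."""
    azi, ele = np.radians(azi_deg), np.radians(ele_deg)
    return np.array([np.cos(ele) * np.cos(azi),
                     np.cos(ele) * np.sin(azi),
                     np.sin(ele)])

# e.g. azi = 135 degrees (between 90 and 180) gives x < 0, y > 0: the second quadrant
print(doa_to_xyz(135, 20))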
To make sure we stay on the same page, please check whether the picture below is consistent with your understanding.
I'm really sorry, I should have said the top right corner (the first quadrant) yesterday; sorry for the misunderstanding. I just ran a set of experiments and found that the signs of the sound source position (x, y, z) are -+- in the first quadrant, --- in the second quadrant, ++- in the third quadrant, and --- in the fourth quadrant. What could be the reason for the unsatisfactory results in all four quadrants?
Based on the angles provided, I did not obtain the expected results. The detected source angles are azi = 101.7, ele = 10.1 in the first quadrant; azi = 198, ele = 42.4 in the second quadrant; azi = 2.0, ele = 54.2 in the third quadrant; and azi = 176.4, ele = 57.5 in the fourth quadrant. I would like to ask whether this is because my microphone hardware differs from yours, or due to some other reason, and whether I need to switch to the same microphone hardware as yours.
Is the visual coordinate display correct? I put the sound source in the first quadrant and measured azi = 245.8, ele = 19.7. According to the angle ranges you gave, that does belong to the first quadrant, but when I visualized it, the sound source appeared in the third quadrant, with x, y, and z all negative.
I'm glad you are taking my project seriously, so please don't apologize, and I welcome any discussion and suggestions. Let me verify your questions one by one:
Is the visual coordinate display correct?
Here is a simple test for that: create a csv file with 50 samples, all identical.
# samples.csv
s1_azi,s1_ele,s2_azi,s2_ele,s3_azi,s3_ele
0,25,45,50,245.8,19.7
0,25,45,50,245.8,19.7
...
and run the following command to generate the visualization
python srp_visualizer.py -s=3 --wav=samples.csv
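In case it is useful, a small sketch for writing that file (the column names follow the header above; the repetition count is arbitrary):

import csv

header = ["s1_azi", "s1_ele", "s2_azi", "s2_ele", "s3_azi", "s3_ele"]
row = [0, 25, 45, 50, 245.8, 19.7]

with open("samples.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(header)
    writer.writerows([row] * 50)   # 50 identical samples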
I found that the X and Y axis labels should be swapped!!
I would like to ask whether this is because my microphone hardware differs from yours, or due to some other reason
The biggest difference is that the microphone array I use is a mature product, so I don't have to worry about the data flow, as mentioned earlier in the conversation:
# c1 means the value of channel 1, and so on
wav = [c1, c2, c3, c4, c1, c2, c3, c4, c1, c2, c3, c4, ...]
But I'm not sure what your data flow looks like, so I did an experiment.
# A sound source recorded from respeaker v2
(venv) > python srp_phat_offline.py -s=1 -c=4 -i=None --wave=a0e19_3_1b6ede00.wav
Find 1 available sources.
azi: 359.7, ele: 22.7
# Assume that the channel data is scrambled
# wav = [c1, c3, c2, c4, c2, c3, c1, c4, c4, c2, c1, c3, ...]
(venv) > python srp_phat_offline.py -s=1 -c=4 -i=None --wave=a0e19_3_1b6ede00_shuffled.wav
Find 1 available sources.
azi: 306.0, ele: 86.0
Why does this happen? Because during processing the data is reshaped into (channels, chunk); that is how we calculate the delay and attenuation between channels. If the channel order is scrambled, we cannot determine the relative delays, and the result will naturally be wrong.
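A tiny, self-contained illustration of that point (synthetic data, not code from this repo): channel 2 is channel 1 delayed by 5 samples, and a simple cross-correlation recovers that delay only if the two channels are passed in the right order:

import numpy as np

rng = np.random.default_rng(0)
chunk, true_delay = 1600, 5

c1 = rng.standard_normal(chunk)
c2 = np.roll(c1, true_delay)   # channel 2 lags channel 1 by 5 samples

def estimate_delay(a: np.ndarray, b: np.ndarray) -> int:
    """Lag (in samples) at which the cross-correlation between a and b peaks."""
    corr = np.correlate(b, a, mode="full")
    return int(np.argmax(corr)) - (len(a) - 1)

print(estimate_delay(c1, c2))   # 5  -> consistent with the true geometry
print(estimate_delay(c2, c1))   # -5 -> swapped channels imply the opposite direction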
Thank you very much for your patient reply. I found that my microphone hardware was not collecting audio very well, so I bought the same device as yours online and plan to redo the experiment. Is there a problem with your visualizer's x/y axes? I'm very glad to help you find the problem. I will continue the experiment and get back to you after the device arrives. Thank you for your reply.
Hi, I got the microphone array this morning, and after testing it I found that the microphone coordinates were in the wrong order. This has nothing to do with the algorithms or the hardware; it simply depends on the channel order defined by the user and where the 0-degree direction is. Below is a comparison of the original code and the changed code.
Before:
After:
mics[0, :] = torch.Tensor([+0.02285, +0.02285, +0.005])
mics[1, :] = torch.Tensor([-0.02285, +0.02285, +0.005])
mics[2, :] = torch.Tensor([-0.02285, -0.02285, +0.005])
mics[3, :] = torch.Tensor([+0.02285, -0.02285, +0.005])
Before:
After:
MICS = np.array(
[
[+0.02285, +0.02285, +0.005],
[-0.02285, +0.02285, +0.005],
[-0.02285, -0.02285, +0.005],
[+0.02285, -0.02285, +0.005],
]
)
With this change, the output should be what you expect. BTW, I re-checked srp_visualizer.py and found that the X and Y axis labels were not wrong :(
Hello, in your system you use the SRP-PHAT algorithm to locate the sound source. From what I've learned, SRP-PHAT itself does not directly measure the distance from the sound source to the array; it usually needs to be combined with other methods, such as TDOA or a signal-strength attenuation model, to estimate distance. What method did you use to determine the distance from the sound source to the array?
Yes, you're right, we need another way to estimate the distance of the sound source (I didn't study this part further 😔). SRP-PHAT only looks for the direction of maximum energy and treats it as the direction of the sound source; it says nothing about distance.
Also note that if the microphone array is placed in a corner, sound reflections may cause it to report the wrong direction.
Hello author, the equipment I bought is the same as the one you used in the experiment. Why is the sound waveform I collected very small and much worse than yours? Could you please share the code you used to collect the audio data? Thank you very much.
Hi, as you can see in srp_phat_online.py, we use the pyaudio package for sound streaming. We didn't write code to record the sound (if we had, it would also be part of the open source). At most, we recorded offline sound through Audacity.
As for why the results are poor, maybe we can approach it from a few directions.
Maybe we can run an experiment with a single sound source, repeat it 50 (or 100) times, count how many estimates fall within 5 degrees (or 10 degrees) of the true angle, and compute the accuracy. Any ideas~?
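To make that concrete, here is one possible way to score such an experiment (just a sketch; the function name and tolerance are made up for illustration):

import numpy as np

def hit_rate(estimates_deg, truth_deg, tolerance_deg=5.0):
    """Fraction of azimuth estimates within tolerance_deg of the true angle."""
    est = np.asarray(estimates_deg, dtype=float)
    # wrap differences into [-180, 180) so that 359 vs 1 counts as a 2-degree error
    err = (est - truth_deg + 180.0) % 360.0 - 180.0
    return float(np.mean(np.abs(err) <= tolerance_deg))

# e.g. 50 repeated measurements of a source placed at 45 degrees
measurements = [44.1, 46.3, 43.8, 51.0] + [45.0] * 46
print(hit_rate(measurements, truth_deg=45.0))   # 0.98 -> 49 of 50 within 5 degrees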
Hello, I have a piece of audio with a sample rate of 44100. Do I need to change the code to use it with yours? I see that you are using a sample rate of 16000, which is inconsistent with mine. Hope you can answer.
@letnnn Hi, I haven't tried a waveform with a 44100 sample rate, but changing the relevant parameter values should work around it (as long as your microphone array supports a 44100 sample rate).
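If you would rather not touch the parameters, another option is to resample the recording to 16 kHz first. A rough sketch using soundfile and scipy (neither is a dependency of this repo, and the file names are hypothetical):

import soundfile as sf
from scipy.signal import resample_poly

data, sr = sf.read("recording_44100.wav")                 # (samples,) or (samples, channels)
assert sr == 44100
data_16k = resample_poly(data, up=160, down=441, axis=0)  # 44100 * 160 / 441 = 16000
sf.write("recording_16000.wav", data_16k, 16000)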
Thank you for your explanation. If I use a different microphone array board and change the relevant values, can I still perform localization?
Thank you for your explanation. If I use a different microphone array board and change the relevant values, can I still perform localization?
SRP-PHAT and DUET themselves are not tied to any particular microphone array, as long as it provides multi-channel output. But my project is based on the ReSpeaker Mic Array v2.0, so some code might not work as expected (such as srp_visualizer.py).
If you have any further questions, feel free to open an issue~
Hello, I would like to ask: why do I get elevation results greater than 0 when the actual elevation angle is less than 0?
Hello author, why is the elevation (pitch) range 0-90 degrees? If I want to measure elevation in the range -90 to 90 degrees, should I remove the absolute value in doas[:, 2] = doas[:, 2].abs()?
Hello author, if we remove the .abs() from doas[:, 2] = doas[:, 2].abs(), can we expand the elevation range to [-90°, 90°]?
Hello author, if we remove the .abs() from doas[:, 2] = doas[:, 2].abs(), can we expand the elevation range to [-90°, 90°]?
@chimamaxianghahahahahaha Sure. SRP-PHAT itself is not tied to a specific use case; I only added that restriction for my own use case (see the sketch below). I will also close this issue, since the current conversation is no longer related to the original topic.
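To make the effect of that line concrete, a small sketch (assuming doas holds unit direction vectors as (x, y, z) rows, which is how I read the code; this is only an illustration, not the repo's exact implementation):

import torch

doas = torch.tensor([[0.70, 0.50, -0.51]])   # a direction pointing below the XY plane

# with the restriction: the sign of z is dropped, so ele is folded into [0, 90]
ele_restricted = torch.rad2deg(torch.asin(doas[:, 2].abs()))

# without the restriction: the sign of z is kept, so ele covers [-90, 90]
ele_full = torch.rad2deg(torch.asin(doas[:, 2]))

azi = torch.rad2deg(torch.atan2(doas[:, 1], doas[:, 0])) % 360
print(azi, ele_restricted, ele_full)   # ~35.5 deg azi, +30.7 vs -30.7 deg ele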
Can you tell me how you calculated azi and ele? Thank you very much.