BrownsugarZeer / Multi_SSL

Combine sound source separation with SRP-PHAT to achieve multi-source localization.
MIT License

[DOCS] The results of my audio localization are not satisfactory. #7

Open letnnn opened 3 months ago

letnnn commented 3 months ago

Checked other resources

Issue with current documentation

The results of my audio localization are not satisfactory.

Idea or request for content

Positioning was performed using two audio recordings, but the result from one of them deviates significantly from the actual location. I'm not sure what the reason is and hope you can help me figure it out. (screenshot of the results attached)

Further Information

A problem encountered when performing the localization myself.
letnnn commented 3 months ago

Hello, I used two audio recordings for positioning, both played from the same location. For the first 20 sets of data, I used the first audio, and for the last 10 sets, I used the second audio. However, the positioning results are not the same, and the results from the second audio are much worse. Is this an issue with the audio, or could there be other possible reasons? Please help me with this issue. Thank you!

BrownsugarZeer commented 3 months ago

Were the two recordings made in the same location and environment? For example, does the first recording have lower environmental noise than the second? It would be best if you could provide the audio files; at the moment the experimental configuration and environment are not clearly described.

letnnn commented 3 months ago

Both experiments were conducted in the same environment and at the same location. The only difference is the type of audio used: the first audio is relatively clean, but the second contains some noise. I suspect the noise in the second audio might be the cause of the issue. What do you think?

letnnn commented 3 months ago

I can't share the source audio files, but below are images of the two recordings: the first image is from the first audio, the second from the second audio. (two images attached)

BrownsugarZeer commented 3 months ago

I use DUET to separate each source and then pass the separated signals to the SRP-PHAT algorithm for multi-source tracking. I would expect the attenuation and delay estimated by DUET to be better for the first audio than for the second, because it has a stable sound output. The second audio can also be separated, but its attenuation and delay estimates will be slightly degraded, because DUET has no built-in robustness to noise. Errors in attenuation and delay directly affect the precision of SRP-PHAT.
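
Roughly, the pipeline looks like the sketch below. It is only a minimal illustration: `duet_separate` is a hypothetical placeholder for the separation step, the file name and microphone geometry are made up, and the SRP-PHAT call uses pyroomacoustics rather than the exact code in this repo.

```python
# Minimal sketch of the DUET -> SRP-PHAT pipeline (not the exact API of this
# repo). `duet_separate` is a hypothetical placeholder for the separation step;
# the SRP-PHAT part uses pyroomacoustics.
import numpy as np
import soundfile as sf
import pyroomacoustics as pra

nfft = 512
# Illustrative 4-mic square array, 5 cm spacing (x/y coordinates in meters).
mic_positions = np.array([
    [0.00, 0.05, 0.05, 0.00],
    [0.00, 0.00, 0.05, 0.05],
])

mixture, fs = sf.read("recording.wav")   # shape (n_samples, n_mics)
# Placeholder separation step: returns one (n_mics, n_samples) array per source.
sources = duet_separate(mixture.T, fs)

for k, src in enumerate(sources):
    # Per-channel STFT, stacked to shape (n_mics, n_freq, n_frames).
    X = np.array([pra.transform.stft.analysis(ch, nfft, nfft // 2).T for ch in src])
    doa = pra.doa.SRP(mic_positions, fs, nfft, c=343.0, num_src=1)
    doa.locate_sources(X, freq_range=[300, 3500])
    print(f"source {k}: azimuth (deg) =", np.degrees(doa.azimuth_recon))
```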

letnnn commented 3 months ago

Would it improve the results if we processed the audio collected by the microphone array, for example by applying noise reduction, before performing the localization?

BrownsugarZeer commented 3 months ago

Absolutely. DUET estimates the attenuation and delay at each time-frequency (TF) point and accumulates them into a histogram to compute the mask for each sound source. The premise of the algorithm is that each TF point is dominated by a single source (in other words, environmental noise is ignored).
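
In code terms, the statistic DUET builds looks roughly like the sketch below. This is a generic scipy-based illustration, not this repo's implementation, and the histogram ranges are illustrative assumptions that depend on the mic spacing.

```python
# Generic sketch of the DUET statistics described above (scipy-based, not this
# repo's implementation). x1/x2 are the two channels of a closely spaced mic pair.
import numpy as np
from scipy.signal import stft

def duet_histogram(x1, x2, fs, nfft=1024):
    f, _, X1 = stft(x1, fs, nperseg=nfft)
    _, _, X2 = stft(x2, fs, nperseg=nfft)
    # Skip the DC bin: the delay is undefined at zero frequency.
    f, X1, X2 = f[1:], X1[1:], X2[1:]
    ratio = (X2 + 1e-10) / (X1 + 1e-10)
    # Per-TF-point estimates; these are only meaningful if a single source
    # dominates the TF point, which is exactly the assumption noise violates.
    a = np.abs(ratio)
    alpha = a - 1.0 / a                                   # symmetric attenuation
    delta = -np.angle(ratio) / (2 * np.pi * f[:, None])   # relative delay (s)
    # Weighted 2D histogram; with clean sources it shows one sharp peak per
    # source, and masks are built by assigning TF points to the nearest peak.
    # The ranges are illustrative and depend on the mic spacing.
    weights = np.abs(X1 * X2)
    hist, a_edges, d_edges = np.histogram2d(
        alpha.ravel(), delta.ravel(), bins=64,
        range=[[-3.0, 3.0], [-1e-3, 1e-3]], weights=weights.ravel()
    )
    return hist, a_edges, d_edges
```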


If the SNR is too low, the histogram peaks may be misestimated, causing the attenuation and delay masks to deviate and ultimately degrading the SRP-PHAT result.
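
As a starting point for the pre-processing you mention, even per-channel spectral subtraction can help. The sketch below is generic (not part of this repo) and assumes the first 0.5 s of the recording contains only noise; adjust that for your data.

```python
# Rough per-channel spectral-subtraction denoiser to run before DUET/SRP-PHAT
# (generic sketch, not part of this repo). Assumes the first 0.5 s of the
# recording is noise only.
import numpy as np
from scipy.signal import stft, istft

def spectral_subtract(x, fs, noise_seconds=0.5, nfft=512, floor=0.05):
    f, t, X = stft(x, fs, nperseg=nfft)
    hop = nfft // 2
    noise_frames = max(1, int(noise_seconds * fs / hop))
    # Estimate the noise magnitude from the assumed noise-only leading frames.
    noise_mag = np.abs(X[:, :noise_frames]).mean(axis=1, keepdims=True)
    # Subtract the noise magnitude, keep the noisy phase, apply a spectral floor.
    mag = np.maximum(np.abs(X) - noise_mag, floor * np.abs(X))
    _, x_clean = istft(mag * np.exp(1j * np.angle(X)), fs, nperseg=nfft)
    return x_clean[: len(x)]

# mixture: (n_samples, n_mics) array from the microphone array
# cleaned = np.stack([spectral_subtract(ch, fs) for ch in mixture.T], axis=1)
```

Because the original (noisy) phase is kept, the inter-channel delays that SRP-PHAT relies on are largely preserved; more aggressive denoisers that modify the phase of each channel independently can actually hurt the localization.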