benfmiller / audalign

Package for aligning audio files through audio fingerprinting
MIT License
94 stars 2 forks

Resolving slight offset #33

Closed skinkie closed 2 years ago

skinkie commented 2 years ago

When recording with two microphones, where one microphone picks up the sound of the other, a slight offset causes a nasty echo. Would you be interested in such examples? If yes, how should I deliver them?

benfmiller commented 2 years ago

Sure! A link to a Google Drive folder? Fingerprinting has a maximum accuracy of about 0.04 seconds because of the FFT windows. Correlation is much more accurate time-wise, with a maximum accuracy of about 0.001 seconds, though it is less robust. There's a description in the wiki.

My preferred way of getting the best of both techniques is to do a regular alignment with the fingerprinting technique, then do a fine alignment with the correlation technique. If you're using the run_align.py script, you can pass in the --fine-align flag.
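To illustrate the two-pass idea, here is a generic sketch in plain Python (not audalign's actual implementation; the signal, sample rate, and offset are all made up): a fingerprint pass can only place the offset to within an FFT hop, while a bounded cross-correlation search then refines it to sample accuracy.

```python
def best_lag(a, b, max_lag):
    """Return the lag (in samples) that best aligns b against a,
    searching only lags in [-max_lag, max_lag]. A positive result
    means b is delayed relative to a."""
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(len(a)):
            j = i + lag
            if 0 <= j < len(b):
                score += a[i] * b[j]
        if score > best_score:
            best, best_score = lag, score
    return best

sample_rate = 8000
true_offset = 123                        # samples, i.e. ~0.015 s

# A toy signal with one short transient, plus a delayed copy of it.
src = [0.0] * 2000
for i in range(400, 420):
    src[i] = 1.0
delayed = [0.0] * true_offset + src[:-true_offset]

# A coarse pass quantized to ~0.04 s (320 samples at 8 kHz) could only
# say "same window"; the correlation pass searches within one window on
# either side and recovers the exact sample offset.
fine = best_lag(src, delayed, max_lag=320)
print(fine)  # -> 123
```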

skinkie commented 2 years ago

I think the heuristic approach, where you do a fingerprint pass to group files and then a second pass (if required), makes a lot of sense. I do wonder how you could tackle drift in this approach.

benfmiller commented 2 years ago

Awesome, got them! What do you mean by drift? The fine alignments make use of a max_lags parameter that defaults to a window size of 2 seconds, so the fine alignment looks for the best alignment within 1 second on either side.

Handling differences from people moving farther away from microphones is an untackled problem

skinkie commented 2 years ago

What do you mean by drift?

If two separate crystals generate the sample rates, expecting them to stay completely in sync is a good way to shoot yourself in the foot. A recording of, for example, one hour may drift into the second range, meaning the alignment may be "correct" for the first part, but later the alignment will have moved. This may imply that you need to stretch the audio to keep it in sync.
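A back-of-envelope sketch of that drift (all numbers here are hypothetical, not measurements from any real recording): measuring the offset once near the start and once near the end of the recording gives a drift rate, from which a stretch/resample ratio for one of the files follows.

```python
duration_s = 3600.0        # one-hour recording
offset_start_s = 0.000     # offset measured near the start
offset_end_s = 1.250       # offset measured near the end

# Drift rate in seconds of drift per second of audio (~347 ppm here,
# which would be an unusually bad crystal, but it illustrates the math).
drift_rate = (offset_end_s - offset_start_s) / duration_s

# Resampling one file by this ratio stretches it so both stay aligned.
stretch_ratio = 1.0 + drift_rate
print(stretch_ratio)  # -> 1.0003472...
```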

Handling differences from people moving farther away from microphones is an untackled problem

That is a lovely one as well. Are you able to handle phase differences already?

benfmiller commented 2 years ago

That's interesting. I would have thought the timers would be a little more accurate for an hour-long recording. Stretching the audio so that sound events match exactly throughout the audio files seems like a tough task. The locality feature allows for some degree of isolating matches throughout the files, but match confidences are pretty subjective. It's hard to definitively say what the best alignment is in some cases.

What do you mean by handling phase differences? The fingerprinting technique is far too inaccurate in time to handle phase differences. The correlation technique is accurate enough, though, and it matches up the peaks exactly.

skinkie commented 2 years ago

What do you mean by handling phase differences?

If you have a file alignment, how fine can it be? Is it possible to detect that the different recordings are, for example, n meters apart, given that you have 3 observers and the audio can be described relative to each other?

benfmiller commented 2 years ago

That is theoretically possible, but practically very difficult to do programmatically, though it was one of the original motivators for the project. Fingerprinting is much more robust, but it works with frequency-dependent sound events like voices. Correlation is much less robust and only works with amplitude-based sound events, like crashes and pops. I had a lot of trouble getting the alignments reliable enough with most test audio files to meaningfully extract any location information.
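A rough sketch of why the location information is there in principle (the speed of sound and offsets below are illustrative, not from any real alignment): a pairwise arrival-time offset only pins down a difference in path length, so one pair of microphones constrains the source to a hyperbola, and three or more observers are needed to intersect those curves into a position.

```python
SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C

def path_difference_m(offset_s):
    """Path-length difference implied by a time-difference-of-arrival
    between two microphones recording the same event."""
    return SPEED_OF_SOUND * offset_s

# A 10 ms offset means the source is about 3.43 m closer to one
# microphone; by itself that is a curve of positions, not a point.
print(round(path_difference_m(0.010), 2))  # -> 3.43
```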

There are existing systems that only focus on gunshots that can roughly accomplish this, but I was hoping the mix of techniques could trilaterate more types of audio events.