Sure! A link to a Google Drive folder? Fingerprinting has a maximum accuracy of about 0.04 seconds because of the FFT windows. Correlation is much more accurate time-wise, with a maximum accuracy of about 0.001 seconds, though it is less robust. There's a description in the wiki.
My preferred way of getting the best of both techniques is to do a regular alignment with the fingerprinting technique, then do a fine alignment with the correlation technique. If you're using the run_align.py script, you can pass in the --fine-align flag.
I think the heuristic approach where you do a fingerprint pass to group files, then a second pass (if required), makes a lot of sense. I do wonder how you could tackle drift in this approach.
Awesome, got them!
What do you mean by drift? They make use of a max_lags parameter that defaults to a window size of 2, so the fine alignment looks for the best alignment within 1 second on either side.
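The bounded-lag search described above can be sketched as follows. This is a generic illustration of cross-correlation constrained to a lag window, not audalign's actual implementation; the function name and signature are hypothetical.

```python
# Minimal sketch of a fine-alignment search by cross-correlation,
# restricted to +/- max_lag_seconds around a coarse offset.
# Illustrative only -- not audalign's real code.

def best_lag(a, b, sample_rate, max_lag_seconds=1.0):
    """Return the lag (in samples) of b relative to a that maximizes
    the cross-correlation, searching only within +/- max_lag_seconds."""
    max_lag = int(max_lag_seconds * sample_rate)
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, x in enumerate(a):
            j = i + lag
            if 0 <= j < len(b):
                score += x * b[j]
        if score > best_score:
            best, best_score = lag, score
    return best

# Tiny demo: b is a copy of a delayed by 3 samples.
a = [0.0, 0.0, 1.0, 0.5, -0.5, 0.0, 0.0, 0.0]
b = [0.0] * 3 + a[:-3]
print(best_lag(a, b, sample_rate=8))  # prints 3
```

Constraining the search window is what keeps the fine pass cheap and prevents it from latching onto a spurious correlation peak far from the coarse fingerprint estimate.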
Handling differences from people moving farther away from microphones is an untackled problem
What do you mean by drift?
If I have two crystals generating the sample rates, the expectation that they are completely in sync is something you can shoot yourself in the foot with. Over a recording of, for example, one hour, the streams may drift apart by something in the second range, meaning the alignment may be "correct" for the first part, but later on it will have moved. This may imply that you need to stretch the audio to keep it in sync.
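A back-of-the-envelope sketch of the drift this describes, assuming a typical crystal tolerance on the order of 50 ppm (real values vary widely by hardware; the numbers here are illustrative, not measurements of any particular recorder):

```python
# Accumulated offset between two recorders whose sample-rate clocks
# differ by some number of parts per million. Illustrative sketch;
# the 50 ppm figure is an assumed typical crystal tolerance.

def drift_seconds(recording_seconds, ppm_mismatch):
    """Offset accumulated over a recording when two clocks differ
    by ppm_mismatch parts per million."""
    return recording_seconds * ppm_mismatch / 1_000_000

# Two recorders 50 ppm apart, over a one-hour recording:
print(drift_seconds(3600, 50))  # 0.18 seconds
```

Even a modest mismatch therefore exceeds the correlation technique's ~0.001 s accuracy within minutes, which is why a single global offset can't stay "correct" for a long recording without time-stretching.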
Handling differences from people moving farther away from microphones is an untackled problem
That is a lovely one as well. Are you able to handle phase differences already?
That's interesting. I would have thought the timers would be a little more accurate for an hour-long recording. Stretching the audio so that sound events match exactly throughout the audio files seems like a tough task. The locality feature allows for some degree of isolating matches throughout the files, but match confidences are pretty subjective. It's hard to definitively say what the best alignment is in some cases.
What do you mean by handling phase differences? The fingerprinting technique is far too inaccurate in time to handle phase differences. The correlation technique can, though, and it matches up the peaks exactly.
What do you mean by handling phase differences?
If you have a file alignment, how fine can it be? Is it possible to detect that the different recordings are, for example, n meters apart, given that you have three observers and the audio can be described relative to each other?
That is theoretically possible, but practically very difficult to do programmatically, though it was one of the original motivators for the project. Fingerprinting is much more robust, but it only works with frequency-dependent sound events like voices. Correlation is much less robust and only works with amplitude-based sound events, like crashes and pops. I had a lot of trouble getting the alignments reliable enough with most test audio files to meaningfully extract any location information.
There are existing systems that only focus on gunshots that can roughly accomplish this, but I was hoping the mix of techniques could trilaterate more types of audio events.
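To illustrate what the alignment accuracy would buy you spatially: a time-difference-of-arrival converts to a distance difference via the speed of sound. This is a generic sketch of that conversion, not anything implemented in the project.

```python
# Distance difference implied by a time-difference-of-arrival (TDOA).
# The ~0.001 s correlation accuracy mentioned earlier corresponds to
# roughly a third of a meter of spatial resolution.

SPEED_OF_SOUND = 343.0  # m/s, approximate in air at 20 C

def distance_difference(tdoa_seconds):
    """Distance difference (meters) between two observers' paths
    to a sound source, given their measured time offset."""
    return tdoa_seconds * SPEED_OF_SOUND

print(distance_difference(0.001))  # ~0.343 m per millisecond of offset
```

Each such distance difference constrains the source to a hyperbola between a pair of observers; with three observers you get enough pairs to intersect those curves, which is the trilateration idea discussed above.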
When recording with two microphones where one microphone picks up the sound of the other, a slight offset causes a nasty echo. Would you be interested in such examples? If yes, how should I deliver them?