[SUGGESTION] NN & Video synch

MarcoRavich commented 3 years ago

Hi there, audalign is very cool !

We suggest considering some interesting fingerprinting projects that could help it evolve even further:

FingerprintDNN by @carlmoore256 - Fast pitch detection using a deep neural network
neural-audio-fp by @mimbres - Neural Audio Fingerprint for High-specific Audio Retrieval based on Contrastive Learning
neuralfp by @chymaera96 - Audio Fingerprinter
pfann by @stdio2016 - Neural Network Audio FingerPrint

Please also check out AudioAlign by @protyposis, a tool written for research purposes to automatically synchronize audio and video recordings that have either been recorded in parallel at the same event or contain the same aural information. It also has a very cool, advanced GUI.

Hope that inspires !

Johndirr commented 3 years ago

I can only second AudioAlign, the Alignment Graph of the tool is such a cool feature.

benfmiller commented 3 years ago

Thanks for the heads up! I will definitely plan on incorporating this with AudioAlign. That does seem nifty!

That looks like a useful technique! Hopefully, NN would be able to handle non-tonal audio better, too.

MarcoRavich commented 3 years ago

Hi there, we're glad our suggestions inspired you !

Anyway, @protyposis's AudioAlign is based on his own .NET Aurio library, so we believe that, first of all, it would be great to "put together" the (best?) audio fingerprinting techniques in a platform-independent library...

...we also decided to collect (open) audio fingerprinting resources on a dedicated page under our HyMPS project: https://github.com/forart/HyMPS/blob/main/Fingerprinting.md

benfmiller commented 3 years ago

What do you mean by platform-independent library? Like separate the techniques into a separate python project? Rewrite them in a different language? The non-fingerprinting techniques can be highly effective as well.

That seems like a neat project!

MarcoRavich commented 3 years ago

Well, to get a precise alignment, accurate fingerprinting seems essential (even if your software uses a different approach) AFAIK.

The Aurio library, for example, implements various audio fingerprinting methods that allow AudioAlign to be more accurate. BTW, since it has a .NET backend it is NOT platform-independent, so it's basically not usable on every OS.

Looking at the main open source AV libraries, they are mainly developed in C++ precisely to make them independent of (and so usable on) any platform.

However, we understand that asking to "put together" a C++ audio fingerprint library "from scratch" is perhaps a bit too ambitious, even if there are some ready-made libs such as:

https://github.com/acoustid/chromaprint
https://github.com/JiahuiYu/audio_recognition
https://github.com/salsowelim/dejavu_cpp_port

...and many others.

We're trying to stimulate a collaboration between those library projects, but a multiplatform "GUI" counterpart (the same as AudioAlign or your audalign) is needed to exploit their potential.

Hope that inspires.

benfmiller commented 3 years ago

The biggest drawback to fingerprinting is that it is based on finding peaks in the spectrogram, so audio with non-tonal sound events (footsteps, cars, crashes, doors closing) don't show up well. The fingerprints of those events are fairly random. Types of correlation techniques can help a lot with those. Plus, fingerprints are only as accurate time-wise as the FFT window, which is usually around 0.04 seconds, compared to correlation's 0.001 seconds. I've found that a combination of correlation and fingerprints usually works best with those types of audio events.
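To illustrate the correlation side of that comparison, here is a minimal sketch using only NumPy. It is not audalign's actual implementation: the function name `estimate_offset`, the 1000 Hz sample rate and the 0.04 s fingerprint hop are assumptions chosen for illustration. It estimates the delay between two noise-like (non-tonal) signals with an FFT-based cross-correlation, then shows what the same delay looks like when it is rounded to a fingerprint-style frame grid.

```python
import numpy as np


def estimate_offset(reference, target, sample_rate):
    """Estimate how many seconds `target` lags `reference` (negative = leads)."""
    # Zero-pad so the circular correlation equals the linear one.
    n = len(reference) + len(target) - 1
    n_fft = 1 << (n - 1).bit_length()
    ref_f = np.fft.rfft(reference, n_fft)
    tgt_f = np.fft.rfft(target, n_fft)
    # corr[k] = sum over n of target[n + k] * reference[n]
    corr = np.fft.irfft(tgt_f * np.conj(ref_f), n_fft)
    lag = int(np.argmax(corr))
    # Indices at or beyond len(target) hold the negative lags (wrapped around).
    if lag >= len(target):
        lag -= n_fft
    return lag / sample_rate


# Illustrative data: white noise stands in for a non-tonal recording.
sample_rate = 1000                      # assumed; one sample = 0.001 s
rng = np.random.default_rng(0)
reference = rng.standard_normal(2000)
delay_samples = 37
target = np.concatenate([np.zeros(delay_samples), reference])[: len(reference)]

offset = estimate_offset(reference, target, sample_rate)

# A fingerprint-style estimate can only land on multiples of the hop size.
hop_seconds = 0.04                      # assumed, matching the figure quoted above
quantized = round(offset / hop_seconds) * hop_seconds

print(f"correlation offset: {offset:.3f} s")
print(f"same offset on a {hop_seconds} s frame grid: {quantized:.3f} s")
```

The resolution of the correlation estimate is one sample, so it depends on the sample rate the signals are correlated at, while the frame-grid value can be off by up to half a hop.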

I've considered a rust port of audalign, if that would be of interest?

I've also been looking into flutter, which is very cross platform. Would you have any interest in a flutter application that could call those C++ libraries? Kind of in the style of AudioAlign?

MarcoRavich commented 3 years ago

> The biggest drawback to fingerprinting is that it is based on finding peaks in the spectrogram, so audio with non-tonal sound events (footsteps, cars, crashes, doors closing) don't show up well. The fingerprints of those events are fairly random. Types of correlation techniques can help a lot with those. Plus, fingerprints are only as accurate time-wise as the FFT window, which is usually around 0.04 seconds, compared to correlation's 0.001 seconds. I've found that a combination of correlation and fingerprints usually works best with those types of audio events.

Well, multiple kinds of alignment approaches (not only strictly audio fingerprinting) can achieve optimal results, of course. A lib, something similar to Aurio, that collects them (all?) would be great.

> I've considered a rust port of audalign, if that would be of interest?

It could be very interesting to let 3rd-party software (DAWs, NLEs, etc.) exploit it, so yes, it would be cool.

> I've also been looking into flutter, which is very cross platform. Would you have any interest in a flutter application that could call those C++ libraries? Kind of in the style of AudioAlign?

As said, alignment algorithms and the GUIs that exploit them should ideally be independent of each other.

This approach could also allow devs to collaborate, and evolve, better in our opinion.
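As a rough sketch of that separation (all names here are hypothetical and not part of audalign, Aurio or AudioAlign), an alignment backend could expose a single small interface, and any front end (GUI, CLI, DAW plugin) would depend only on that interface. `EnergyPeakAligner` is a deliberately naive toy backend used only to make the example runnable.

```python
from typing import Protocol, Sequence


class Aligner(Protocol):
    """Hypothetical interface every alignment technique would implement."""

    def align(
        self, reference: Sequence[float], target: Sequence[float], sample_rate: int
    ) -> float:
        """Return how many seconds `target` lags `reference`."""
        ...


class EnergyPeakAligner:
    """Toy backend: lines up the loudest sample of each signal."""

    def align(self, reference, target, sample_rate):
        ref_peak = max(range(len(reference)), key=lambda i: abs(reference[i]))
        tgt_peak = max(range(len(target)), key=lambda i: abs(target[i]))
        return (tgt_peak - ref_peak) / sample_rate


def report_offset(aligner: Aligner, reference, target, sample_rate) -> str:
    """Stand-in for a front end: it never needs to know which technique ran."""
    offset = aligner.align(reference, target, sample_rate)
    return f"target lags reference by {offset:+.3f} s"


print(report_offset(EnergyPeakAligner(), [0, 0, 1, 0, 0, 0], [0, 0, 0, 0, 1, 0], 2))
```

A fingerprinting or correlation backend could then be swapped in without touching the front end, which is the kind of independence described above.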

Thanks for your interest in this discussion!

MarcoRavich commented 2 years ago

News: it seems that @lutzray's (hardware) synch solution is working...

MarcoRavich commented 11 months ago

Bump.

After 5 years, @protyposis released new versions of both AudioAlign and Aurio (with prebuilt Windows binaries):

https://github.com/protyposis/AudioAlign/releases
https://github.com/protyposis/Aurio/releases

Enjoy !

MarcoRavich commented 9 months ago

Bump 2024.

There are many interesting improvements in the recent Aurio and AudioAlign releases.

Tests by some end users (like us) have detected malfunctions, omissions and possible improvements to fix/add, but it would be even more interesting to have feedback/opinions from some "alignment experts" like you too:

Thanks in advance !

benfmiller / audalign #24