JorenSix / SyncSink.wasm

A browser based media sync tool via audio-to-audio alignment
GNU Affero General Public License v3.0
6 stars 1 forks source link

h1. SyncSink.wasm

SyncSink.wasm is webapplication to synchronize media files with shared audio. SyncSink matches and aligns shared audio and determines offsets in seconds. With these precise offsets it becomes trival to sync media files.

SyncSink is, for example, used to synchronize video files: when you have many video captures of the same event, the audio attached to these video captures is used to align and sync multiple (independently operated) cameras. Evidently, SyncSink can also synchronize audio captured from many (independent) microphones if some environmental sound is shared the recordings.

!./media/media_sync_recording.apng(Synchronizing audio with the SyncSink user interface)! Fig. Synchronizing audio with the SyncSink user interface.

SyncSink.wasm is based on the Java SyncSink software. SyncSink can also be used for synchronization of data-streams. For those applications please see the article titled "Synchronizing Multimodal Recordings Using Audio-To-Audio Alignment":http://0110.be/posts/Synchronizing_Multimodal_Recordings_Using_Audio-To-Audio_Alignment_-_In_Journal_on_Multimodal_User_Interfaces.

h3(#sync). How does SyncSink.wasm synchronize media files?

The first step is to extract, downmix and resample audio from the incoming media file. This is done with a wasm version of "ffmpeg":https://ffmpeg.org/. If a video file with multiple audio streams enters the system, the first audio stream is used for synchronization.

The next step is to extract fingerprints. These fingerprints are extracted from the reference media file and all other media files. By aligning the fingerprints of the other files with the reference file, a rough offset is determined. The rough offset determines how much each 'other' file needs to shift to match the reference. The rough offset is accurate to about 8ms.

The last step improves the rough offset by calculating the crosscovariance between the refrence and other files. Since we already have a rough offset we know where audio is likely to match so we can reduce the amount of crosscovariance calculations, which are computationally intensive. In the ideal case the crosscovariance is stable and improves the offset up to audio-sample accuracy.

h3(#syncsink). How to use SyncSink.wasm in the browser

To use go to the "SyncSink.wasm website":https://0110.be/attachment/cors/sync/sync.html and drag and drop your media files. Similarly to the screencapture above. If the same audio is found in the various media files a timebox plot appears with a calculated offset. The JSON file provides more insights in the found matches.

h3(#syncsink). How to use SyncSink.wasm in Node

There is also a command line version of SyncSink.wasm. To see an example go to the @examples/node@ directory and call the following from the command line.

@node --no-experimental-fetch sync.js@

h2(#read). Further Reading

Some relevant reading material concerning SyncSink.wasm.

Six, Joren and Leman, Marc "Synchronizing Multimodal Recordings Using Audio-To-Audio Alignment" (2015)

Six, Joren and Leman, Marc "Panako - A Scalable Acoustic Fingerprinting System Handling Time-Scale and Pitch Modification":http://www.terasoft.com.tw/conf/ismir2014/proceedings/T048_122_Paper.pdf (2014)

Wang, Avery L. An Industrial-Strength Audio Search Algorithm (2003)

Ellis, Dan and Whitman, Brian and Porter, Alastair Echoprint - An Open Music Identification Service (2011)

Sonnleitner, Reinhard and Widmer, Gerhard Quad-based Audio Fingerprinting Robust To Time And Frequency Scaling (2014)

h2(#credits). Credits

The SyncSink.wasm software was developed at "IPEM, Ghent University":http://www.ipem.ugent.be/ by Joren Six.

If you use the synchronization algorithms for research purposes, please cite the following work:

bc. @article{six2015multimodal, author = {Joren Six and Marc Leman}, title = {{Synchronizing Multimodal Recordings Using Audio-To-Audio Alignment}}, issn = {1783-7677}, volume = {9}, number = {3}, pages = {223-229}, doi = {10.1007/s12193-015-0196-1}, journal = {{Journal of Multimodal User Interfaces}}, publisher = {Springer Berlin Heidelberg}, year = 2015 }