EmNudge / Spectrogram-Replicator

Educational Linguistics tool for exploring Spectrograms
https://spectrogram.emnudge.dev

Wasm Spectrogram Generation #19

Open EmNudge opened 3 years ago

EmNudge commented 3 years ago

One of the main problems is a disconnect between the default settings and the image provided.

The best option here is to have spectrogram generation built right into the web app. Someone should be able to upload an audio file and have the application generate the diagram and update the settings to reflect the file. It should scale the time axis appropriately.

This has been looked into for a while. At first, I looked for JavaScript-based spectrogram generators, but just about all of them use the Web Audio API. I'm already using this, so it'd be trivial to implement it myself. However, Web Audio cannot analyze audio ahead of time. It only does live-audio decoding. This means it's theoretically possible, but any operation on the audio must take at least as long as the audio itself. A 4-second recording must therefore take 4 seconds. For larger files, it gets proportionally worse.
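To make the limitation concrete, here's a sketch of the live-decoding approach (browser-only). The `drawColumn` helper is hypothetical; the point is that frames can only be sampled as the audio actually plays, so processing time is tied to clip length:

```javascript
// Live AnalyserNode approach: one spectrogram column per animation frame.
// We cannot ask for frame N ahead of time; we must wait for playback to
// reach it, so a 4-second clip takes 4 seconds to analyze.
const ctx = new AudioContext();
const analyser = ctx.createAnalyser();
analyser.fftSize = 2048;

async function playAndAnalyse(arrayBuffer) {
  const audioBuffer = await ctx.decodeAudioData(arrayBuffer);
  const source = ctx.createBufferSource();
  source.buffer = audioBuffer;
  source.connect(analyser);
  analyser.connect(ctx.destination);
  source.start();

  const bins = new Uint8Array(analyser.frequencyBinCount);
  (function sample() {
    analyser.getByteFrequencyData(bins); // current frame's magnitudes
    drawColumn(bins); // hypothetical canvas helper, not in the project
    requestAnimationFrame(sample); // real time must pass between samples
  })();
}
```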

This seems like a very good place for WebAssembly. Any JS solution over large array buffers would be slower than it could be. If we're dealing with buffers anyway, WASM is a good fit.

The problem now is how to do it. Rust is the obvious choice, but crates are a problem. There are crates like sonogram and spectrogram, but they have limited options and depend on hound for WAV decoding, which makes for a heavy, inflexible package.

The best option I've found is a Rust library published as a Python package. It generates the spectrogram itself but delegates the FFT to rustfft. I'd love to just use it and call it a day, but it's specifically made for use in Python. It also uses hound, so I'd have to tweak it to take in a buffer directly.

That means my best course is to get a spectrogram prototype going that takes in an array buffer and outputs image data I can draw to a canvas.
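A minimal sketch of that prototype, assuming raw samples arrive as a `Float32Array`: window each frame, transform it, and emit magnitude columns that could later be mapped to canvas `ImageData`. A naive O(n²) DFT stands in here for a real FFT; the frame size and hop are arbitrary placeholders:

```javascript
// Naive STFT: slide a Hann-windowed frame over the samples and compute
// magnitudes for the first frameSize/2 frequency bins of each frame.
function spectrogram(samples, frameSize = 256, hop = 128) {
  const columns = [];
  for (let start = 0; start + frameSize <= samples.length; start += hop) {
    const mags = new Float32Array(frameSize / 2);
    for (let k = 0; k < frameSize / 2; k++) {
      let re = 0, im = 0;
      for (let n = 0; n < frameSize; n++) {
        // Hann window to reduce spectral leakage
        const w = 0.5 * (1 - Math.cos((2 * Math.PI * n) / (frameSize - 1)));
        const x = samples[start + n] * w;
        const phase = (-2 * Math.PI * k * n) / frameSize;
        re += x * Math.cos(phase);
        im += x * Math.sin(phase);
      }
      mags[k] = Math.hypot(re, im);
    }
    columns.push(mags);
  }
  return columns; // columns[t][k] = magnitude of bin k at frame t
}
```

As a sanity check, a 440 Hz sine sampled at 8 kHz should peak around bin 440 / 8000 × 256 ≈ 14. This is exactly the kind of inner loop that WASM would take over.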

EmNudge commented 3 years ago

I've recently been made aware of OfflineAudioContext from an article on fingerprinting using the audio API.

One of the main reasons I was avoiding the Web Audio API was because we had to wait in real-time for processing to conclude, so a 3-second clip must necessarily take 3 seconds to process. WASM could both run on another thread and work faster than real-time.

The "offline" version of AudioContext lets us run faster than real-time while still leveraging many parts of the API that are already built for us. It may also be just as fast as the WASM version with the main downside being that it's not available in web workers. This may mean it potentially blocks the main thread while processing, but as long as the clip is short enough, I don't think users will mind too much.
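A sketch of that approach (browser-only), assuming we pause rendering once per frame to capture an AnalyserNode snapshot; the frame size and hop are placeholder choices, and suspend times get rounded to render-quantum boundaries by the browser:

```javascript
// OfflineAudioContext renders faster than real time. We schedule
// suspend() points before rendering starts, grab one AnalyserNode
// frame per spectrogram column at each pause, then resume.
async function offlineSpectrogram(arrayBuffer, frameSize = 2048) {
  const decoded = await new AudioContext().decodeAudioData(arrayBuffer);
  const ctx = new OfflineAudioContext(
    decoded.numberOfChannels, decoded.length, decoded.sampleRate
  );
  const analyser = ctx.createAnalyser();
  analyser.fftSize = frameSize;

  const source = ctx.createBufferSource();
  source.buffer = decoded;
  source.connect(analyser);
  analyser.connect(ctx.destination);
  source.start();

  const columns = [];
  const hopSeconds = frameSize / decoded.sampleRate;
  for (let t = hopSeconds; t < decoded.duration; t += hopSeconds) {
    ctx.suspend(t).then(() => {
      const bins = new Float32Array(analyser.frequencyBinCount);
      analyser.getFloatFrequencyData(bins); // dB values for one column
      columns.push(bins);
      ctx.resume();
    });
  }
  await ctx.startRendering();
  return columns;
}
```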

WASM, while it gives us fine-grained control and lets us run on a separate thread, is going to be a headache to maintain. The code is no longer JS, it involves using FFT libraries, and the project layout and bundling will grow by a fair bit. OfflineAudioContext is likely the optimal path forward.