MTG / essentia.js

JavaScript library for music/audio analysis and processing powered by Essentia WebAssembly
https://essentia.upf.edu/essentiajs
GNU Affero General Public License v3.0

Real-time mood classification - "most" vs "least" danceable part of a track #117

Closed: highfiiv closed this issue 1 year ago

highfiiv commented 1 year ago

What is the issue about?

Description

I've been researching how to use Essentia.js for real-time mood scores but cannot find any direction on this. More specifically, the mood classifier example shows an overall "aggressiveness" score, but what about the aggressiveness at each moment of the track?

Being able to know which parts of a track are "most" or "least" danceable, aggressive, etc. seems a lot more useful than a single overall classification.

Can I get some direction on this usage?

albincorreya commented 1 year ago

This demo could be a starting point for a real-time use case (you could also check this tutorial).

For this particular mood-classifier demo, the predictions are averaged (check here). The model actually produces a prediction for every segment of the audio input: it computes a mel-spectrogram (refer) for every frame of audio, with a hop size if specified, and feeds that to the model for inference, which returns a prediction corresponding to each frame. Note: the audio input is downsampled to a 16 kHz sample rate, so take that into account when you convert frame indices into timestamps.
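If it helps, here is a minimal sketch of the post-processing I mean, assuming you keep the per-frame predictions (i.e. skip the averaging step in the demo) and have them as a plain array of "danceable" probabilities. The names `framePredictions`, `FRAME_HOP_SIZE`, `frameToSeconds` and `mostAndLeastDanceable` are just placeholders for illustration, not part of the essentia.js API; only the 16 kHz sample rate and the frame/hop bookkeeping come from the explanation above, so check the demo's extractor settings for the actual hop size.

```js
// Sketch: turn per-frame "danceable" probabilities into timestamps and pick
// out the most / least danceable moments of the track.
// Assumes `framePredictions` is an array of per-frame probabilities obtained
// by skipping the averaging step of the mood-classifier demo.

const MODEL_SAMPLE_RATE = 16000; // audio is downsampled to 16 kHz before inference
const FRAME_HOP_SIZE = 256;      // placeholder hop size; use whatever the extractor was configured with

function frameToSeconds(frameIndex, hopSize = FRAME_HOP_SIZE, sampleRate = MODEL_SAMPLE_RATE) {
  // Each frame starts hopSize samples after the previous one.
  return (frameIndex * hopSize) / sampleRate;
}

function mostAndLeastDanceable(framePredictions) {
  let maxIdx = 0;
  let minIdx = 0;
  framePredictions.forEach((p, i) => {
    if (p > framePredictions[maxIdx]) maxIdx = i;
    if (p < framePredictions[minIdx]) minIdx = i;
  });
  return {
    most:  { time: frameToSeconds(maxIdx), probability: framePredictions[maxIdx] },
    least: { time: frameToSeconds(minIdx), probability: framePredictions[minIdx] },
  };
}

// Example usage with dummy per-frame probabilities:
const framePredictions = [0.12, 0.35, 0.87, 0.64, 0.21];
console.log(mostAndLeastDanceable(framePredictions));
// -> { most: { time: ..., probability: 0.87 }, least: { time: ..., probability: 0.12 } }
```

In practice you would probably want to smooth or average over a window of frames (a few seconds, say) rather than picking a single frame, since per-frame predictions can be noisy.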

Hope that helps.