Closed highfiiv closed 1 year ago
This demo could be a starting point for a real-time use case (you could also check this tutorial).
For this particular mood-classifier demo, the predictions are averaged (check here). The model predicts features for every corresponding segment of the audio input, i.e. it computes a mel-spectrogram (refer) for every frame of audio with a hop size (if specified), then feeds it to the model for inference, which returns a prediction corresponding to each frame. Note: the audio input is downsampled to a 16 kHz sample rate, so take that into account when you convert the frames into corresponding timestamps.
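To illustrate the frame-to-timestamp conversion described above, here is a minimal sketch. The hop size used below (`HOP_SIZE = 512`) is a hypothetical value for illustration — use whatever hop size you actually configured for your model; the 16 kHz sample rate matches the downsampling mentioned above. `framesToTimeline` and `averageScore` are illustrative helper names, not Essentia.js APIs.

```javascript
// The models expect audio resampled to 16 kHz
const SAMPLE_RATE = 16000;
// Hypothetical hop size in samples -- substitute your model's actual config
const HOP_SIZE = 512;

// Map each per-frame prediction to a { time, score } pair, where `time`
// is seconds into the (resampled) audio: frameIndex * hopSize / sampleRate
function framesToTimeline(predictions, hopSize = HOP_SIZE, sampleRate = SAMPLE_RATE) {
  return predictions.map((score, i) => ({
    time: (i * hopSize) / sampleRate,
    score,
  }));
}

// The overall score shown in the demo is just the mean of the per-frame scores
function averageScore(predictions) {
  return predictions.reduce((sum, s) => sum + s, 0) / predictions.length;
}
```

With the per-frame timeline in hand, finding the "most aggressive" moment is just a matter of taking the entry with the highest `score`.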
Hope that helps.
What is the issue about?
Description
I've been researching how to use Essentia.js for real-time mood scores but cannot find any direction on this. More specifically, the mood classifier example shows an overall "aggressiveness" score — but what about real-time aggressiveness for each moment of the track?
Being able to know which parts of a track are "most" or "least" danceable, aggressive, etc. seems a lot more useful than general classification.
Can I get some direction on this usage?