Closed highfiiv closed 1 year ago
This demo could be a starting point for a real-time use case (you could also check this tutorial).
For this particular mood-classifier demo, the predictions are averaged (check here). The model predicts features for every corresponding segment of the audio input, i.e. it computes a mel-spectrogram (refer) for every frame of audio with a hop size (if specified), then feeds it to the model for inference, which returns a prediction corresponding to each frame. Note: the audio input is downsampled to a 16 kHz sample rate, so take that into account when you convert the frames into corresponding timestamps.
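To illustrate the frame-to-timestamp conversion described above, here is a minimal sketch. The hop size used below (`HOP_SIZE = 512`) is a hypothetical value for illustration — use whatever hop size you actually configured for your model; the 16 kHz sample rate matches the downsampling mentioned above. `framesToTimeline` and `averageScore` are illustrative helper names, not Essentia.js APIs.

```javascript
// The models expect audio resampled to 16 kHz
const SAMPLE_RATE = 16000;
// Hypothetical hop size in samples -- substitute your model's actual config
const HOP_SIZE = 512;

// Map each per-frame prediction to a { time, score } pair, where `time`
// is seconds into the (resampled) audio: frameIndex * hopSize / sampleRate
function framesToTimeline(predictions, hopSize = HOP_SIZE, sampleRate = SAMPLE_RATE) {
  return predictions.map((score, i) => ({
    time: (i * hopSize) / sampleRate,
    score,
  }));
}

// The overall score shown in the demo is just the mean of the per-frame scores
function averageScore(predictions) {
  return predictions.reduce((sum, s) => sum + s, 0) / predictions.length;
}
```

With the per-frame timeline in hand, finding the "most aggressive" moment is just a matter of taking the entry with the highest `score`.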
Hope that helps.
What is the issue about?
Description
I've been researching how to use Essentia.js for real-time mood scores but cannot find any direction on this. More specifically, the mood classifier example shows an overall "aggressiveness" score — but what about real-time aggressiveness for each moment of the track?
Being able to know which parts of a track are "most" or "least" danceable, aggressive, etc. seems a lot more useful than general classification.
Can I get some direction on this usage?