MTG / essentia.js

JavaScript library for music/audio analysis and processing powered by Essentia WebAssembly
https://essentia.upf.edu/essentiajs
GNU Affero General Public License v3.0

How to import ML models and use for autotagging #134

Open bianchilo opened 5 months ago

bianchilo commented 5 months ago

What is the issue about?

What part(s) of Essentia.js is involved?

Description

Hello everyone, I am trying to adapt the "Real-time music autotagging with MusiCNN" example to use a different machine learning model from those published by Essentia ( https://essentia.upf.edu/models/ ). The goal is to recognize musical instruments in real time. I chose mtg_jamendo_instrument-discogs-effnet-1.pb because it covers more musical instruments. I converted it to TensorFlow.js format using tensorflowjs_converter, and now I need to handle the different input features this model requires.

The model used in the example I was modifying had the following input configuration:

"inputs": [ { "name": "model/Placeholder", "type": "float", "shape": [ 187, 96 ] } ] and it performs inference with "algorithm": "TensorflowPredictMusiCNN"

However, the model I would like to use now has the following input configuration:

"inputs": [ { "name": "model/Placeholder", "type": "float", "shape": [ 1280 ] } ] and it performs inference with "algorithm": "TensorflowPredict2D"

So, at the very least, I need to change the FeatureExtractProcessor. Is there any place where I can find an example that suits my case or detailed information on how to do this? I haven't found anything in the documentation that helps me understand what I need to change in the code. Any suggestions are welcome. Thank you in advance.

Steps to reproduce / Code snippets / Screenshots

-

System info

Chromium based browser, Essentia.js

albincorreya commented 5 months ago

These new models based on effnet-discogs have a different signal flow than the older-generation models. From the Python docs and examples, you can see the following process:

audio input -> embeddings -> activations (tags)

There are two models for inference, i.e., one to compute embeddings (vector representations) and another to compute tags from those embeddings.

To make this work in JS, the following signal chain has to be added to the essentia.js-model lib:

  1. audio -> embeddings
  2. embeddings -> tags
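
As a rough illustration of that two-stage chain, here is a minimal, dependency-free sketch. It is not part of essentia.js or essentia.js-model: `makeAutotagger`, `meanOverRows`, `embed` and `classify` are hypothetical names, with `embed` and `classify` standing in for whatever wrappers end up running the two converted models. The only value taken from this thread is the 1280-element input shape of the tag model. Averaging the activations over patches is an assumption about how to summarise the output, not something stated above.

```javascript
// Hypothetical two-stage chain: input patches -> embeddings -> tag activations.
// `embed` and `classify` are placeholders for the two converted models;
// neither is an essentia.js API.
const EMBEDDING_SIZE = 1280; // input shape of the tag model quoted in the issue

function makeAutotagger(embed, classify) {
  return function tag(patches) {
    // Stage 1: one embedding vector per input patch.
    const embeddings = patches.map((patch) => {
      const e = embed(patch);
      if (e.length !== EMBEDDING_SIZE) {
        throw new Error(`expected ${EMBEDDING_SIZE} values, got ${e.length}`);
      }
      return e;
    });
    // Stage 2: one activation vector per embedding.
    const activations = embeddings.map((e) => classify(e));
    // Assumption: average over patches to get a single score per tag.
    return meanOverRows(activations);
  };
}

// Column-wise mean of an array of equally sized numeric arrays.
function meanOverRows(rows) {
  if (rows.length === 0) return [];
  const sums = new Array(rows[0].length).fill(0);
  for (const row of rows) {
    row.forEach((v, i) => { sums[i] += v; });
  }
  return sums.map((s) => s / rows.length);
}
```

Passing the two stages in as plain functions keeps the chain testable with stubs before the real models are wired in. The size check fails early if the embedding model's output does not match what the tag model expects.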