Open bianchilo opened 10 months ago
These new models based on effnet-discogs have a different signal flow than the older-generation models. From the Python docs and examples, you can see the following process:
audio input -> embeddings -> activations (tags)
Inference uses two models: one computes embeddings (a vector representation of the audio), and the other computes tags from those embeddings.
To make this work in JS, this signal chain has to be added to the essentia.js-model lib.
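The signal chain described above can be sketched in plain JavaScript as follows. This is only an illustration under stated assumptions: `predictTags`, `stubEmbeddingModel` and `stubClassifierModel` are hypothetical names, not part of essentia.js-model, and the stubs stand in for real models whose inference calls would likely be asynchronous. The only figure taken from this issue is the 1280-element embedding that the classifier expects.

```javascript
// Hypothetical sketch of: audio patches -> embeddings -> activations (tags).
// None of these names exist in essentia.js-model; they are stand-ins.

const EMBEDDING_SIZE = 1280; // classifier input shape quoted in this issue

// Runs both stages for every input patch and returns one activation
// vector per patch.
function predictTags(patches, embeddingModel, classifierModel) {
  if (patches.length === 0) {
    throw new Error("at least one input patch is required");
  }
  return patches.map((patch) => {
    // Stage 1: turn the patch into a fixed-size embedding vector.
    const embedding = embeddingModel.predict(patch);
    if (embedding.length !== EMBEDDING_SIZE) {
      throw new Error(
        `expected ${EMBEDDING_SIZE} embedding values, got ${embedding.length}`
      );
    }
    // Stage 2: turn the embedding into tag activations.
    return classifierModel.predict(embedding);
  });
}

// Stub models so the sketch runs without any model files.
const stubEmbeddingModel = {
  // Fills the embedding with the first value of the patch.
  predict: (patch) => new Array(EMBEDDING_SIZE).fill(patch[0]),
};
const stubClassifierModel = {
  // Produces two fake "tag" activations from the embedding.
  predict: (embedding) => [embedding[0], 1 - embedding[0]],
};

const activations = predictTags(
  [[0.2, 0.2], [0.4, 0.4]],
  stubEmbeddingModel,
  stubClassifierModel
);
console.log(activations.length, "activation vectors computed");
```

In a real port, the two stubs would be replaced by the converted embedding model and the converted classifier head, and the length check would guard the hand-off between them.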
What is the issue about?
What part(s) of Essentia.js is involved?
Description
Hello everyone, I am trying to adapt the "Real-time music autotagging with MusiCNN" example to use a different machine learning model from those published by Essentia (https://essentia.upf.edu/models/). The goal is to recognize musical instruments in real time. I chose mtg_jamendo_instrument-discogs-effnet-1.pb because it covers more musical instruments. I converted it to TensorFlow.js format using tensorflowjs_converter, and now I need to handle the different input features that this model requires.
The model used in the example I was modifying had the following input configuration:
"inputs": [ { "name": "model/Placeholder", "type": "float", "shape": [ 187, 96 ] } ] and it performs inference with "algorithm": "TensorflowPredictMusiCNN"
However, the model I would like to use now has the following input configuration:
"inputs": [ { "name": "model/Placeholder", "type": "float", "shape": [ 1280 ] } ] and it performs inference with "algorithm": "TensorflowPredict2D"
So, at the very least, I need to change the FeatureExtractProcessor. Is there any place where I can find an example that suits my case or detailed information on how to do this? I haven't found anything in the documentation that helps me understand what I need to change in the code. Any suggestions are welcome. Thank you in advance.
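The mismatch between the two input configurations quoted above can be made concrete with a small shape check. This is a hedged sketch, not Essentia.js code: `shapeOf` and `matchesInput` are hypothetical helpers, features are assumed to be plain (rectangular) nested arrays, and the two `inputs` entries are copied from the model metadata shown in this issue.

```javascript
// Hypothetical helpers for comparing a feature against a model's declared
// input shape. They are illustrative and not part of Essentia.js.

// Input entries as quoted in this issue.
const musicnnInput = { name: "model/Placeholder", type: "float", shape: [187, 96] };
const effnetHeadInput = { name: "model/Placeholder", type: "float", shape: [1280] };

// Returns the shape of a rectangular nested array, e.g. [187, 96].
function shapeOf(value) {
  const shape = [];
  let current = value;
  while (Array.isArray(current)) {
    shape.push(current.length);
    current = current[0];
  }
  return shape;
}

// True when the feature has exactly the shape the model input declares.
function matchesInput(feature, input) {
  const actual = shapeOf(feature);
  return (
    actual.length === input.shape.length &&
    actual.every((size, i) => size === input.shape[i])
  );
}

// A 187 x 96 patch of zeros, the kind of feature the original example produces.
const melPatch = Array.from({ length: 187 }, () => new Array(96).fill(0));
// A 1280-element vector, the kind of feature the new classifier expects.
const embedding = new Array(1280).fill(0);

console.log("patch fits MusiCNN input:", matchesInput(melPatch, musicnnInput));
console.log("patch fits effnet head input:", matchesInput(melPatch, effnetHeadInput));
console.log("embedding fits effnet head input:", matchesInput(embedding, effnetHeadInput));
```

The check shows why the existing feature extractor cannot feed the new model directly: its output is a two-dimensional patch, whereas the classifier expects a one-dimensional vector of 1280 values.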
Steps to reproduce / Code snippets / Screenshots
-
System info
Chromium based browser, Essentia.js