Open timtensor opened 1 year ago
Hi @timtensor, you can have a look at Essentia models. It contains feature extraction example scripts for all our models.
Thanks for pointing it out. I think there is a problem with the installation of essentia-tensorflow.
I get the following error.
I installed it from PyPI with !pip install essentia-tensorflow
The pip version is pip 22.0.4 from /usr/local/lib/python3.8/dist-packages/pip (python 3.8).
I think you missed the error message. Could you also mention your OS?
Sorry for the incomplete information. The following is the error message. I am running it in Google Colab, so I guess it is Ubuntu-based.
<ipython-input-38-96cbcf823c6c> in <module>
----> 1 from essentia.standard import MonoLoader, TensorflowPredictMusiCNN
ImportError: cannot import name 'TensorflowPredictMusiCNN' from 'essentia.standard' (/usr/local/lib/python3.8/dist-packages/essentia/standard.py)
---------------------------------------------------------------------------
NOTE: If your import is failing due to a missing package, you can
manually install dependencies using either !pip or !apt.
To view examples of installing some common dependencies, click the
"Open Examples" button below.
Just an update: it seems to work on Google Colab when I run the following:
!apt-get update
!apt-get install -y python3-dev libsndfile1-dev
!pip install essentia==2.1b6.dev374 librosa==0.8.1
!pip install essentia-tensorflow
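A quick way to confirm that the install above actually exposes the TensorFlow algorithms is to check for them before importing. This is only a minimal sketch: `has_algorithm` is a hypothetical helper name, not part of Essentia, and it only tests whether a module can be imported and has an attribute with the given name.

```python
import importlib


def has_algorithm(name, module="essentia.standard"):
    """Return True if `module` can be imported and exposes `name`."""
    try:
        mod = importlib.import_module(module)
    except ImportError:
        # The package itself is missing or broken.
        return False
    return hasattr(mod, name)


# In Colab, after the pip install, one could run:
# print(has_algorithm("TensorflowPredictMusiCNN"))
```

If this prints False after installing essentia-tensorflow, restarting the Colab runtime so the newly installed package is picked up may be worth trying.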
I have two questions about the prediction model:
a) Is it not possible to load the pre-trained model from Google Drive? I mounted my drive and tried pointing the graph filename to a path such as /mnt/gdrive/xxxx
but it resulted in an error.
b) I am a bit confused about the outcome. From the embeddings I get a matrix of values, but is there a decoding step as well?
Sample code run on google colab
!apt-get update
!apt-get install -y python3-dev libsndfile1-dev
!pip install essentia==2.1b6.dev374 librosa==0.8.1
!pip install essentia-tensorflow
from essentia.standard import MonoLoader, TensorflowPredictMusiCNN
# audioFile is the path to the input audio file
audio = MonoLoader(filename=audioFile, sampleRate=16000)()
model = TensorflowPredictMusiCNN(graphFilename="msd-musicnn-1.pb", output="model/dense/BiasAdd")
predictions = model(audio)
print(predictions)
Perhaps I am doing something wrong in the code?
Glad to see that you could install and use the models!
Regarding a), it is not related to Essentia, so I'd recommend looking for help elsewhere. Alternatively, you could download the models directly in the Colab, e.g., by adding !curl -SLO https://essentia.upf.edu/models/autotagging/msd/msd-musicnn-1.pb
to your script.
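If you prefer to stay in Python instead of shelling out to curl, the same download can be sketched with the standard library. `ensure_model` is a hypothetical helper name (not an Essentia function); it skips the download when the file is already present, which is convenient when re-running notebook cells.

```python
import os
import urllib.request


def ensure_model(url, dest_dir="."):
    """Download `url` into `dest_dir` unless the file already exists there."""
    filename = os.path.basename(url)
    path = os.path.join(dest_dir, filename)
    if not os.path.exists(path):
        urllib.request.urlretrieve(url, path)
    return path


# Example (requires network access):
# graph = ensure_model("https://essentia.upf.edu/models/autotagging/msd/msd-musicnn-1.pb")
```

The returned path can then be passed as graphFilename when creating the model.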
About b), you are right: the embeddings are not human-readable and need to be fed to a classification head to get the class probabilities.
Note that clicking on each model on the website gives you an example script for getting the predictions, along with links to the model weights and the metadata file. For example, this is the script for running inference with the danceability-msd-musicnn
model on top of the embeddings you already extracted:
from essentia.standard import MonoLoader, TensorflowPredictMusiCNN, TensorflowPredict2D
audio = MonoLoader(filename="audio.wav", sampleRate=16000)()
embedding_model = TensorflowPredictMusiCNN(graphFilename="msd-musicnn-1.pb", output="model/dense/BiasAdd")
embeddings = embedding_model(audio)
model = TensorflowPredict2D(graphFilename="danceability-msd-musicnn-1.pb", output="model/Softmax")
predictions = model(embeddings)
predictions will be a matrix of shape [time_stamp, n_classes] because this model makes a prediction every 1.5 seconds of audio. To get track-level predictions, you can average the matrix across the time axis.
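The averaging step can be sketched with NumPy as follows. The array here is a made-up stand-in for the real model output, and the label names and their order are assumptions for illustration only; the actual class order is listed in the model's metadata file.

```python
import numpy as np

# Stand-in for the output of model(embeddings): one row per time step,
# one column per class. These values are invented for the example.
predictions = np.array([
    [0.80, 0.20],
    [0.60, 0.40],
    [0.70, 0.30],
])

# Hypothetical label order; check the model's metadata file for the real one.
labels = ["danceable", "not_danceable"]


def track_level(predictions, labels):
    """Average activations over the time axis and return the top class."""
    mean_activations = np.mean(predictions, axis=0)
    top = int(np.argmax(mean_activations))
    return labels[top], mean_activations


label, mean_activations = track_level(predictions, labels)
print(label, mean_activations)
```

Averaging over axis 0 collapses the time dimension, leaving one activation per class for the whole track.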
Thanks for the curl tip. I had totally forgotten about it. I guess all the models are under here:
https://essentia.upf.edu/models/
I didn't quite understand the human-readable part or the explanation of track-level predictions. For example, I was looking into track-level classification with the pre-trained SVM Gaia models to learn about it. Is there a Python code example or snippet I could experiment with to get a classification based on an SVM model?
Model link: https://essentia.upf.edu/svm_models/
Hi @pmahan00.
To get overall track predictions, you can simply average the resulting matrix of activations across time similar to this example.
Note that SVM classifiers are outdated in terms of their accuracy and generalization, and we recommend using the new models instead.
Hi, I am currently looking into higher-level feature extraction from an audio signal, such as genre, mood, and danceability, in a Colab / Jupyter notebook. Is there an example that one can refer to and try?