MTG / essentia-replicate-demos

Demos of Essentia models hosted on Replicate.com
GNU Affero General Public License v3.0

Google Colab notebooks for the demos? #5

Open timtensor opened 1 year ago

timtensor commented 1 year ago

Hi, I am currently looking into higher-level feature extraction from an audio signal, such as genre, mood, and danceability, in a Colab / Jupyter notebook. Is there an example that one can refer to and try?

palonso commented 1 year ago

Hi @timtensor, you can have a look at Essentia models. It contains feature extraction example scripts for all our models.

timtensor commented 1 year ago

Thanks for pointing it out. I think there is a problem with the installation of essentia-tensorflow. I get the following error:

I installed it from PyPI with !pip install essentia-tensorflow. The pip version is pip 22.0.4 from /usr/local/lib/python3.8/dist-packages/pip (python 3.8).

palonso commented 1 year ago

I think you missed the error message. Could you also mention your OS?

pmahan00 commented 1 year ago

Sorry for the incomplete information. The following is the error message. I am running it in Google Colab, so I guess it's Ubuntu-based.


<ipython-input-38-96cbcf823c6c> in <module>
----> 1 from essentia.standard import MonoLoader, TensorflowPredictMusiCNN

ImportError: cannot import name 'TensorflowPredictMusiCNN' from 'essentia.standard' (/usr/local/lib/python3.8/dist-packages/essentia/standard.py)

---------------------------------------------------------------------------
NOTE: If your import is failing due to a missing package, you can
manually install dependencies using either !pip or !apt.

To view examples of installing some common dependencies, click the
"Open Examples" button below.

pmahan00 commented 1 year ago

Just an update: it seems to work on Google Colab when I run the following:

!apt-get update
!apt-get install -y python3-dev libsndfile1-dev
!pip install essentia==2.1b6.dev374 librosa==0.8.1
!pip install essentia-tensorflow

I have two questions about the prediction model.

a) Is it not possible to load the pre-trained model from Google Drive? I mounted my drive and pointed the graph filename to a path such as /mnt/gdrive/xxxx, but it resulted in an error.

b) I am a bit confused about the output. From the embeddings I get a matrix of values, but is there a decoding step as well?

Sample code run on Google Colab:

!apt-get update
!apt-get install -y python3-dev libsndfile1-dev
!pip install essentia==2.1b6.dev374 librosa==0.8.1
!pip install essentia-tensorflow

from essentia.standard import MonoLoader, TensorflowPredictMusiCNN

# audioFile must be set to the path of the audio file to analyze
audio = MonoLoader(filename=audioFile, sampleRate=16000)()
model = TensorflowPredictMusiCNN(graphFilename="msd-musicnn-1.pb", output="model/dense/BiasAdd")
predictions = model(audio)
print(predictions)

Perhaps I am doing something wrong in the code?

palonso commented 1 year ago

Glad to see that you could install and use the models!

Regarding a), this is not related to Essentia, so I'd recommend looking for help elsewhere. Alternatively, you could download the models directly in the Colab, e.g., by adding !curl -SLO https://essentia.upf.edu/models/autotagging/msd/msd-musicnn-1.pb to your script.
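If you would rather stay in Python than shell out to curl, a minimal sketch along these lines should also work. It uses only the standard library. The helper name fetch_model is made up for illustration and is not part of Essentia.

```python
import urllib.request
from pathlib import Path

# URL taken from the curl command above.
MODEL_URL = "https://essentia.upf.edu/models/autotagging/msd/msd-musicnn-1.pb"


def fetch_model(url: str, dest_dir: str = ".") -> Path:
    """Download `url` into `dest_dir` unless a file with that name already exists.

    Hypothetical helper for illustration only, not an Essentia function.
    """
    dest = Path(dest_dir) / url.rsplit("/", 1)[-1]
    if not dest.exists():
        dest.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(url, str(dest))
    return dest


# In a notebook cell (needs network access):
# model_path = fetch_model(MODEL_URL)
```

The returned path can then be passed as graphFilename=str(model_path) when creating the model.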

About b), you are right: the embeddings are not human-readable and need to be fed to a classification head to get the class probabilities. Note that clicking on each model on the website gives you an example script for getting the predictions, plus links to the model weights and the metadata file. For example, this is the script to run inference with the danceability-msd-musicnn model on top of the embeddings you already extracted:

from essentia.standard import MonoLoader, TensorflowPredictMusiCNN, TensorflowPredict2D

audio = MonoLoader(filename="audio.wav", sampleRate=16000)()
embedding_model = TensorflowPredictMusiCNN(graphFilename="msd-musicnn-1.pb", output="model/dense/BiasAdd")
embeddings = embedding_model(audio)

model = TensorflowPredict2D(graphFilename="danceability-msd-musicnn-1.pb", output="model/Softmax")
predictions = model(embeddings)

predictions will be a matrix of shape [time_stamp, n_classes] because this model makes a prediction every 1.5 seconds of audio. To get track-level predictions, you can average the matrix across the time axis.
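The averaging step could look roughly like this with NumPy. This is only a sketch: the numbers below are made up and stand in for the real model output, and track_level is a hypothetical helper name.

```python
import numpy as np


def track_level(predictions) -> np.ndarray:
    """Average a [time_stamp, n_classes] matrix over the time axis (axis 0)."""
    predictions = np.asarray(predictions, dtype=np.float32)
    if predictions.ndim != 2:
        raise ValueError("expected a 2D matrix [time_stamp, n_classes]")
    return predictions.mean(axis=0)


# Made-up values standing in for the model output: 3 time steps, 2 classes.
example = np.array([[0.9, 0.1],
                    [0.7, 0.3],
                    [0.8, 0.2]])
track = track_level(example)
print(track)  # one value per class for the whole track
```

With the real model you would call track_level(predictions) on the matrix returned by the classification head.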

pmahan00 commented 1 year ago

Thanks for the curl tip. I had totally forgotten about it. I guess all the models are available here: https://essentia.upf.edu/models/

I didn't quite understand the explanation about human-readable output and track-level predictions. For example, I was looking into track-level classification with the pre-trained SVM Gaia models to learn about it. Is there a Python code example or snippet I could experiment with to get a classification from an SVM model? Model link: https://essentia.upf.edu/svm_models/

dbogdanov commented 1 year ago

Hi @pmahan00.

To get overall track predictions, you can simply average the resulting matrix of activations across time similar to this example.
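To make the averaged activations human-readable, each value can be paired with its class name. The sketch below assumes the class names and their order are known. The labels used here are hypothetical placeholders, and the real ones should be taken from the model's metadata file. rank_classes is an illustrative helper, not an Essentia function.

```python
import numpy as np


def rank_classes(track_activations, class_names):
    """Pair each averaged activation with its class name, highest first."""
    track_activations = np.asarray(track_activations, dtype=float)
    if track_activations.shape != (len(class_names),):
        raise ValueError("need exactly one activation per class name")
    order = np.argsort(track_activations)[::-1]
    return [(class_names[i], float(track_activations[i])) for i in order]


# Hypothetical label names and order; check the model's metadata file.
labels = ["danceable", "not_danceable"]

# Made-up activations standing in for the model output: 3 time steps, 2 classes.
frame_activations = np.array([[0.9, 0.1],
                              [0.7, 0.3],
                              [0.8, 0.2]])

ranking = rank_classes(frame_activations.mean(axis=0), labels)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```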

Note that SVM classifiers are outdated in terms of their accuracy and generalization, and we recommend using the new models instead.