MTG / essentia

C++ library for audio and music analysis, description and synthesis, including Python bindings
http://essentia.upf.edu
GNU Affero General Public License v3.0

Poor results #1442

Open johnbuts opened 1 week ago

johnbuts commented 1 week ago

Hey everyone, thanks in advance for the help.

So I wanted to use some of the instrument detection models, and was not impressed by the results. I fed it a wav file that just had saxophone playing for around a minute and 10 seconds. Here is the code and output I got:

```python
from essentia.standard import MonoLoader, TensorflowPredictEffnetDiscogs, TensorflowPredict2D
import pandas as pd

audio = MonoLoader(filename="other_sax.wav", sampleRate=75000, resampleQuality=4)()
embedding_model = TensorflowPredictEffnetDiscogs(graphFilename="discogs-effnet-bs64-1.pb", output="PartitionedCall:1")
embeddings = embedding_model(audio)

model = TensorflowPredict2D(graphFilename="mtg_jamendo_instrument-discogs-effnet-1.pb")
predictions = model(embeddings)

instruments = [
    'accordion', 'acousticbassguitar', 'acousticguitar', 'bass', 'beat',
    'bell', 'bongo', 'brass', 'cello', 'clarinet', 'classicalguitar',
    'computer', 'doublebass', 'drummachine', 'drums', 'electricguitar',
    'electricpiano', 'flute', 'guitar', 'harmonica', 'harp', 'horn',
    'keyboard', 'oboe', 'orchestra', 'organ', 'pad', 'percussion', 'piano',
    'pipeorgan', 'rhodes', 'sampler', 'saxophone', 'strings', 'synthesizer',
    'trombone', 'trumpet', 'viola', 'violin', 'voice'
]

df = pd.DataFrame(predictions, columns=instruments)
instrument_sums = df.sum()

top_5_instruments = instrument_sums.sort_values(ascending=False).head(5)

print(top_5_instruments)
```

Output:

synthesizer 218.628510
piano 175.836365
drums 113.429985
cello 85.704750
flute 83.436066

Please tell me what I'm doing wrong, thanks.

palonso commented 1 week ago

Hi @johnbuts, the problem with your script is that MonoLoader's sampleRate parameter should match the sample rate the model expects (16000 Hz).
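A minimal sketch of that change, reusing the model filenames from the script above. `MODEL_SAMPLE_RATE` and `predict_instruments` are illustrative names, not part of Essentia, and the import sits inside the function only so the helper can be defined without Essentia or the model files at hand.

```python
MODEL_SAMPLE_RATE = 16000  # sample rate the model expects, per the comment above


def predict_instruments(
    filename,
    embedding_graph="discogs-effnet-bs64-1.pb",
    classifier_graph="mtg_jamendo_instrument-discogs-effnet-1.pb",
):
    """Return per-patch instrument activations for one audio file (hypothetical helper)."""
    # Imported lazily so that defining this helper does not require Essentia.
    from essentia.standard import (
        MonoLoader,
        TensorflowPredict2D,
        TensorflowPredictEffnetDiscogs,
    )

    # Resample to the model's rate at load time instead of an arbitrary rate.
    audio = MonoLoader(
        filename=filename, sampleRate=MODEL_SAMPLE_RATE, resampleQuality=4
    )()

    # Same two-stage pipeline as the original script: embeddings, then classifier.
    embedding_model = TensorflowPredictEffnetDiscogs(
        graphFilename=embedding_graph, output="PartitionedCall:1"
    )
    embeddings = embedding_model(audio)

    model = TensorflowPredict2D(graphFilename=classifier_graph)
    return model(embeddings)
```

Calling `predict_instruments("other_sax.wav")` would then stand in for the `audio`, `embeddings` and `predictions` lines of the original script.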

johnbuts commented 1 week ago

drums 53.502373
bass 42.751842
electricguitar 40.419640
piano 34.878933
guitar 32.601994

That didn't seem to help. I've been toying around with the sample rate, and nothing really seems to help. Is my code maybe wrong, like the order of the instruments or something?

johnbuts commented 1 week ago

It's like no matter the instrument, it's always really high on synthesizer and piano. For example, I'll put in a violin track and get this:

synthesizer 70.941162
piano 69.683495
drums 65.955116
electricguitar 63.488628
guitar 62.260086

And then I'll put in a saxophone track and get this:

drums 73.532722
synthesizer 72.120262
piano 70.608063
bass 69.452019
electricguitar 66.740585
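A side note on reading these numbers: `df.sum()` in the script adds one activation per analysis patch, so the totals grow with the number of patches and are hard to compare between files or between sample-rate settings. A minimal sketch that averages per class instead, assuming `predictions` has one row per patch and one column per label as in the script; `top_k_mean` is an illustrative name. Averaging changes only the scale, not the ranking, so it would not by itself explain the unexpected top instruments.

```python
def top_k_mean(predictions, labels, k=5):
    """Rank labels by their mean activation across patches."""
    # Accepts any row-iterable (nested lists or a 2D NumPy array).
    rows = [[float(value) for value in row] for row in predictions]
    if not rows:
        raise ValueError("predictions is empty")
    if any(len(row) != len(labels) for row in rows):
        raise ValueError("each row needs exactly one value per label")

    patch_count = len(rows)
    # zip(*rows) walks the data column by column, i.e. one class at a time.
    means = [sum(column) / patch_count for column in zip(*rows)]

    ranked = sorted(zip(labels, means), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Toy data: 3 patches x 3 labels (made-up numbers, not model output).
toy = [[0.1, 0.8, 0.3], [0.2, 0.6, 0.1], [0.0, 0.7, 0.2]]
print(top_k_mean(toy, ["piano", "saxophone", "voice"], k=2))
```

In the original script, `top_k_mean(predictions, instruments)` would replace the `df.sum()` and `sort_values` steps.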

csipapicsa commented 21 hours ago

Have you tried setting resampleQuality=0?

resampleQuality (integer ∈ [0, 4], default = 1) : the resampling quality, 0 for best quality, 4 for fast linear approximation