MTG / essentia

C++ library for audio and music analysis, description and synthesis, including Python bindings
http://essentia.upf.edu
GNU Affero General Public License v3.0

Bulk Inference #1268

Closed piyp791 closed 1 year ago

piyp791 commented 1 year ago

Hello,

I have a set of 100,000 mp3 files on which I want to run effnet-discogs model inference. Can you suggest anything to help speed up the inference?

I am using a ThreadPool right now for parallel inference. Is this approach okay, or can it be corrected/optimized further?

from multiprocessing.pool import ThreadPool
import glob

from essentia.standard import MonoLoader, TensorflowPredictEffnetDiscogs

pool = ThreadPool(960)
model = TensorflowPredictEffnetDiscogs(graphFilename="discogs-effnet-bs64-1.pb")

def inference_for_audio(audio_file):
    audio = MonoLoader(filename=audio_file, sampleRate=16000)()
    activations = model(audio)
    print(activations)

audio_files = glob.glob("/home/research/Songs/test/*.wav")
results = pool.map(inference_for_audio, audio_files)

Any help would be appreciated. Thanks!

palonso commented 1 year ago

Hi @piyp791, please note that Essentia algorithms are not thread-safe, so you should use process-based parallelization instead.
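A minimal sketch of the process-based pattern, using `multiprocessing.Pool` with an `initializer` so that each worker process builds its own model exactly once. The helper names (`run_bulk`, `_init_worker`, `_infer`) and the stand-in model are illustrative, not Essentia API; in a real script the initializer would construct `TensorflowPredictEffnetDiscogs` and `_infer` would load audio with `MonoLoader`, as indicated in the comments.

```python
from multiprocessing import Pool

_model = None  # one model instance per worker process


def _init_worker():
    # Runs once in each worker. With Essentia this would be:
    #   global _model
    #   _model = TensorflowPredictEffnetDiscogs(
    #       graphFilename="discogs-effnet-bs64-1.pb")
    global _model
    _model = lambda n: n * n  # stand-in so the sketch runs without Essentia


def _infer(item):
    # With Essentia, `item` would be a filename:
    #   audio = MonoLoader(filename=item, sampleRate=16000)()
    #   return _model(audio)
    return _model(item)


def run_bulk(items, processes=4):
    # Each worker initializes its own model, avoiding any shared
    # Essentia state across processes.
    with Pool(processes=processes, initializer=_init_worker) as pool:
        return pool.map(_infer, items)


if __name__ == "__main__":
    print(run_bulk([1, 2, 3]))  # → [1, 4, 9]
```

Keeping model construction in the initializer matters: building the model in the parent and passing it to workers would require pickling it, which Essentia objects do not support.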

The neural-network inference is the most computationally expensive part of your script. It could be sped up with GPU parallelization, which is implemented internally in Essentia and enabled automatically when:

  1. There is a CUDA-capable GPU installed in the system
  2. The CUDA and cuDNN libraries are installed and visible to Essentia. The current Essentia pip package requires CUDA 11.2 and cuDNN 8.1

Note that in this case every process blocks a GPU, so don't use more processes than available GPUs.
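To verify whether condition 1 and 2 are met, one quick check is whether the TensorFlow runtime (which Essentia's TensorFlow models rely on) can see a GPU. This is a sketch, assuming a standard TensorFlow 2.x installation; the `gpu_available` helper is illustrative, not part of Essentia.

```python
def gpu_available():
    """Return True if TensorFlow can see a CUDA-capable GPU.

    Returns False when TensorFlow is not installed or no GPU is
    visible (e.g. CUDA/cuDNN missing or the wrong version).
    """
    try:
        import tensorflow as tf
        return len(tf.config.list_physical_devices("GPU")) > 0
    except ImportError:
        return False


if __name__ == "__main__":
    print("GPU visible:", gpu_available())
```

If this prints `False` on a machine with a CUDA-capable GPU, the CUDA/cuDNN installation (or its version) is the usual culprit.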