MTG / essentia

C++ library for audio and music analysis, description and synthesis, including Python bindings
http://essentia.upf.edu
GNU Affero General Public License v3.0

How to load all the pre-trained models quickly? #999

Closed · blues-green closed this issue 4 years ago

blues-green commented 4 years ago

essentia-tensorflow does a good job, but it takes a long time to load all of the pre-trained models. I would appreciate it if you could give some suggestions, thanks!

palonso commented 4 years ago

Hi @VickyChing, the only way to load models is through the TensorflowPredict algorithm (also used by the TensorflowPredictMusiCNN and TensorflowPredictVGGish wrappers), and there is no way to speed up the model loading itself.

That said, I have a couple of suggestions:

  • All our MusiCNN-based models are about 3 MB and the VGGish-based ones are about 300 MB. While some performance gain should be expected when using the VGGish-based models, the MusiCNN ones are much faster to load and require less compute.
  • TensorflowPredict's reset method frees the resources attached to the TensorFlow session but keeps the model in memory. So if you need to process multiple files, the ideal approach is to first load the models you need and then iterate through your tracks with those instances. In this scenario, the time required for loading the models should be very small compared to the computation time.
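The reuse pattern suggested here (load every model once, then iterate over the tracks with those instances) can be sketched generically. The essentia names in the comment are illustrative; `tag_tracks` itself is just the reuse pattern and works with any callable predictors:

```python
def tag_tracks(filenames, predictors, load_audio):
    """Run every pre-loaded predictor on every file.

    predictors: dict mapping a model name to an already-instantiated
    callable (e.g. an essentia TensorflowPredictMusiCNN instance),
    so each model's graph is loaded only once.
    """
    results = {}
    for filename in filenames:
        audio = load_audio(filename)
        # Model loading cost was paid once, outside this loop.
        results[filename] = {name: predict(audio)
                             for name, predict in predictors.items()}
    return results

# With essentia-tensorflow this would look roughly like (not executed here;
# the model filename is a placeholder):
#
#   from essentia.standard import MonoLoader, TensorflowPredictMusiCNN
#   predictors = {
#       "msd": TensorflowPredictMusiCNN(graphFilename="msd-musicnn-1.pb"),
#   }
#   load_audio = lambda f: MonoLoader(filename=f, sampleRate=16000)()
#   tags = tag_tracks(my_files, predictors, load_audio)
```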

blues-green commented 4 years ago

> Hi @VickyChing, the only way to load models is through the TensorflowPredict algorithm (also used by the TensorflowPredictMusiCNN and TensorflowPredictVGGish wrappers), and there is no way to speed up the model loading.
>
> That said, I have a couple of suggestions:
>
> • All our MusiCNN-based models are about 3 MB and the VGGish-based ones are about 300 MB. While some performance gain should be expected when using the VGGish-based models, the MusiCNN ones are much faster to load and require less compute.
> • TensorflowPredict's reset method frees the resources attached to the TensorFlow session but keeps the model in memory. So if you need to process multiple files, the ideal approach is to first load the models you need and then iterate through your tracks with those instances.

Thanks for your reply. But when I run the example of Auto-tagging with MusiCNN in Streaming mode through the TensorflowPredict algorithm (https://mtg.github.io/essentia-labs/news/tensorflow/2019/10/19/tensorflow-models-in-essentia/), it raises a KeyError:

Traceback (most recent call last):
  File "/home/musicnn_taggging.py", line 97, in <module>
    for i, l in enumerate(np.mean(pool[output_layer],
  File "/home/../essentia/lib/python3.7/site-packages/essentia/common.py", line 530, in __getitem__
    raise KeyError('no key found named \'' + key + '\'')
KeyError: "no key found named 'model/Sigmoid'"

Kangli-Xia commented 4 years ago

Hi @pabloEntropia, I want to use all the models to predict on multiple files, but it is too slow. How can I speed up this process?

palonso commented 4 years ago

> Thanks for your reply. But when I run the example of Auto-tagging with MusiCNN in Streaming mode through the TensorflowPredict algorithm (https://mtg.github.io/essentia-labs/news/tensorflow/2019/10/19/tensorflow-models-in-essentia/), it raises a KeyError: "no key found named 'model/Sigmoid'"

I checked the steps and they are working fine for me. Maybe the audio file you used was too short to produce a patch? What is the output of pool['melbands'].shape?
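As a rough check of the "too short" hypothesis, assuming the MusiCNN setup from the linked blog post (patches of 187 mel-band frames; that number is an assumption, adjust it to your model), a file produces no patch at all, and hence no output key in the pool, when pool['melbands'] has fewer frames than one patch:

```python
PATCH_SIZE = 187  # frames per MusiCNN patch (assumption from the blog post setup)

def produces_patch(melbands_shape):
    """True if pool['melbands'] holds enough frames for at least one patch."""
    n_frames, n_bands = melbands_shape
    return n_frames >= PATCH_SIZE

print(produces_patch((100, 96)))    # False: no patch, so the output key never appears
print(produces_patch((14992, 96)))  # True
```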

palonso commented 4 years ago

> Hi @pabloEntropia, I want to use all the models to predict on multiple files, but it is too slow. How can I speed up this process?

The easiest thing would be to reduce (or remove) the overlap between contiguous patches. To do this, check the patchHopSize parameter in TensorflowPredictMusiCNN, TensorflowPredictVGGish, or VectorRealToTensor.
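To see why the hop size matters, count the patches the network has to process; with a hop of half the patch size, roughly twice as many patches go through the model as with no overlap. The patch size of 187 frames and the 50 % overlap are assumptions based on the MusiCNN setup, so check your own configuration:

```python
def num_patches(n_frames, patch_size, patch_hop):
    """Number of patches obtainable from n_frames mel-band frames."""
    if n_frames < patch_size:
        return 0
    return (n_frames - patch_size) // patch_hop + 1

n_frames = 14992  # mel-band frames of a ~4-minute file (figure from this thread)
overlapped = num_patches(n_frames, 187, 93)    # assumed 50% overlap
no_overlap = num_patches(n_frames, 187, 187)   # patchHopSize == patchSize
print(overlapped, no_overlap)  # twice the patches with 50% overlap
```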

I made a quick test processing a 7-minute MP3 on a single Titan Xp to give you a sense of how much time you can save:

Architecture: musicnn. No-overlap: False. Time: 2.73 s
Architecture: musicnn. No-overlap: True. Time: 2.17 s
Architecture: vggish. No-overlap: False. Time: 5.34 s
Architecture: vggish. No-overlap: True. Time: 5.08 s

Also, if you have a GPU you can try some parallelization.

In TensorflowPredictMusiCNN and TensorflowPredictVGGish we provide the accumulate parameter. When set to true, it accumulates patches until it reaches the end of the audio stream and then runs a single TensorFlow session over all of them in parallel. However, we don't recommend this functionality for long audio streams, as it can blow up your memory or generate too many patches to fit on your GPU. If you instantiate the whole processing chain by yourself (as shown here), you have more control: you can decide the number of patches to be processed in parallel by setting the batch axis of the shape parameter of VectorRealToTensor to any positive number.

To give you some idea of the impact:

Architecture: musicnn. Accumulate: False. Time: 3.00 s
Architecture: musicnn. Accumulate: True. Time: 2.60 s
Architecture: vggish. Accumulate: False. Time: 5.35 s
Architecture: vggish. Accumulate: True. Time: 4.24 s
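Choosing the batch axis yourself comes down to a memory/parallelism trade-off: bigger batches mean fewer TensorFlow session runs but more patches held in memory at once. The helper below is just that arithmetic; the VectorRealToTensor line in the comment is illustrative, with assumed MusiCNN dimensions (187 frames, 96 bands):

```python
import math

def session_runs(n_patches, batch_size):
    """Number of TensorFlow session runs needed for n_patches.

    batch_size <= 0 mimics accumulate=True: everything in one run,
    at the cost of holding all patches in memory at once.
    """
    if batch_size <= 0:
        return 1
    return math.ceil(n_patches / batch_size)

# In streaming mode, the batch axis is the first entry of the shape
# parameter, roughly:
#   VectorRealToTensor(shape=[batch_size, 1, 187, 96])
print(session_runs(160, 32))  # 5 runs of at most 32 patches each
print(session_runs(160, 0))   # 1 run, like accumulate=True
```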

Kangli-Xia commented 4 years ago

> I checked the steps and they are working fine for me. Maybe the audio file you used was too short to produce a patch? What is the output of pool['melbands'].shape?

I met the same problem. My pool['melbands'].shape is (14992, 96), and the file is a 4-minute MP3.

palonso commented 4 years ago

Then your mel-band computation is fine. It is difficult to tell more without seeing your code. Could you post it?

Kangli-Xia commented 4 years ago

I don't have the code anymore, but I may have found the reason. The model I used is genre_rosamerica-vggish-audioset, but I didn't change parameters such as frameSize or numberBands, so when I use standard mode the model tells me the input size is wrong.

The model works now, thank you~

blues-green commented 4 years ago

Sorry, there's still an issue to be resolved. For now, we have separated the feature extraction and model prediction steps by using TensorflowInputVGGish/TensorflowInputMusiCNN and TensorflowPredict. The prediction results are close to those of TensorflowPredictVGGish/TensorflowPredictMusiCNN, but not identical; we guess this is due to different parameter settings. Could you tell us the parameter values, like frameSize, hopSize, patchSize, and numberBands? Thank you very much!
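The parameter values are not confirmed in this thread (see #1000 for the authoritative answer). As a reference point only, the settings below are assumptions drawn from the essentia blog post for the MusiCNN front end and from the original VGGish feature setup (25 ms frames with a 10 ms hop at 16 kHz, 64 mel bands, 0.96 s patches); verify them before relying on them:

```python
# Assumed analysis front-end parameters, NOT confirmed here (see #1000).
FRONTEND_PARAMS = {
    "musicnn": {"sampleRate": 16000, "frameSize": 512, "hopSize": 256,
                "numberBands": 96, "patchSize": 187},
    "vggish":  {"sampleRate": 16000, "frameSize": 400, "hopSize": 160,
                "numberBands": 64, "patchSize": 96},
}
```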

palonso commented 4 years ago

> Could you tell us the parameter values, like frameSize, hopSize, patchSize, and numberBands?

Answered at #1000