MTG / essentia

C++ library for audio and music analysis, description and synthesis, including Python bindings
http://essentia.upf.edu
GNU Affero General Public License v3.0

Probabilistic Yin and CREPE #788

Closed: ronggong closed this issue 2 years ago

ronggong commented 5 years ago

Since the monophonic pitch extraction algorithms in Essentia are out of date, it would be worthwhile to implement two state-of-the-art pitch extraction algorithms that achieve better accuracy: Probabilistic YIN (PYIN) and CREPE.

dbogdanov commented 5 years ago

For now, we should prioritize PYin over CREPE. Btw, can you provide some links to evaluations of PYin vs. Yin?

ronggong commented 5 years ago

For the comparison between Yin and PYIN, please see Figure 3 in the paper "PYIN: A Fundamental Frequency Estimator Using Probabilistic Threshold Distributions": http://matthiasmauch.de/_pdf/mauch_pyin_2014.pdf
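To make the idea in the paper's title concrete, here is a minimal pure-Python sketch of PYIN's first stage: instead of YIN's single fixed threshold, the YIN dip-picking rule is run for many thresholds, and each threshold's prior probability is credited to the lag it selects, giving a distribution over pitch candidates. This is a simplification under stated assumptions: all function names are hypothetical, the threshold prior is passed in (the paper uses a beta distribution; the usage below uses a uniform one), and the paper's handling of frames with no dip and its HMM pitch-tracking stage are omitted.

```python
import math


def cumulative_mean_normalized_difference(frame, max_lag):
    """YIN's d'(tau): the squared-difference function divided by its running mean."""
    window = len(frame) - max_lag  # samples compared at every lag
    diff = [0.0] * max_lag
    for tau in range(1, max_lag):
        diff[tau] = sum((frame[j] - frame[j + tau]) ** 2 for j in range(window))
    cmnd = [1.0] * max_lag  # d'(0) is defined as 1
    running_sum = 0.0
    for tau in range(1, max_lag):
        running_sum += diff[tau]
        cmnd[tau] = diff[tau] * tau / running_sum if running_sum > 0 else 1.0
    return cmnd


def yin_lag(cmnd, threshold):
    """Classic YIN rule: first lag whose d' drops below threshold, refined to its local minimum."""
    for tau in range(2, len(cmnd)):
        if cmnd[tau] < threshold:
            while tau + 1 < len(cmnd) and cmnd[tau + 1] < cmnd[tau]:
                tau += 1
            return tau
    return None  # no dip below this threshold


def pyin_lag_probabilities(cmnd, thresholds, prior):
    """Run the YIN rule once per threshold and add that threshold's prior
    probability to the lag it selects (PYIN's first stage, simplified)."""
    probs = {}
    for threshold, weight in zip(thresholds, prior):
        tau = yin_lag(cmnd, threshold)
        if tau is not None:
            probs[tau] = probs.get(tau, 0.0) + weight
    return probs


# Usage: a pure sine with a 20-sample period (400 Hz at 8 kHz).
sample_rate = 8000
frame = [math.sin(2 * math.pi * n / 20) for n in range(400)]
cmnd = cumulative_mean_normalized_difference(frame, max_lag=100)
thresholds = [0.05 * k for k in range(1, 11)]      # 0.05 ... 0.50
prior = [1.0 / len(thresholds)] * len(thresholds)  # uniform, for simplicity
probs = pyin_lag_probabilities(cmnd, thresholds, prior)
for lag, p in sorted(probs.items()):
    print(f"lag {lag}: f0 = {sample_rate / lag:.1f} Hz, probability {p:.2f}")
```

For a clean sine every threshold picks the same lag, so all the mass lands on one candidate; on real voice frames the mass is typically split across several lags, which is what the later HMM stage in the paper resolves.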

sankalpg commented 5 years ago

Hello, is there a rough ETA for this feature request? Thanks a lot!

dbogdanov commented 5 years ago

@ronggong was working on that, perhaps he can provide some details.

ronggong commented 5 years ago

@sankalpg Hi Sankalp, it's not done yet. I will work on it next weekend and try to finish it.

dbogdanov commented 5 years ago

Updates? ;)

ronggong commented 5 years ago

The algorithm produces correct pitch estimates, but it is much slower than the Vamp plugin version. I will go to BCN tomorrow and talk with Pablo.

ronggong commented 5 years ago

Pull request for PYIN sent: https://github.com/MTG/essentia/pull/809#issue-240674967

ronggong commented 5 years ago

@pabloEntropia, what is the status of the TensorFlow integration? We need to evaluate whether it is still worth putting CREPE into Essentia. I guess the criteria should be (1) the processing speed of CREPE, and (2) whether people would be interested in using an Essentia version of CREPE rather than its original implementation. For PYIN the case was clear, because it is officially available only as a Vamp plugin.

dbogdanov commented 5 years ago

Yes, I think we should add Crepe too. There's a pending pull request (#802) that adds support for TensorFlow.

palonso commented 5 years ago

Yes, let's do this! #802 contains a set of algorithms designed to run inference with TensorFlow models inside Essentia. Right now we are waiting to have a bunch of models trained on Essentia features before the release. To put DL models inside Essentia, the steps would be:

  1. Train a TensorFlow model as usual. If you need audio features, use Essentia to extract them. Streaming mode is preferable, because it ensures that the signal flow is exactly the same at inference time.
  2. Ask me how to convert the model into the required format, as we don't have official documentation for this yet.

ronggong commented 5 years ago

@pabloEntropia My question is whether it is worth re-implementing CREPE in Essentia. My guess is that people will go directly to the original implementation, https://github.com/marl/crepe. I can't think of a reason why people would install the large Essentia library instead of just running the small CREPE package.

ucasiggcas commented 5 years ago

Hi, is there any C++ test for PYIN? I just want to try the PYIN function, but it has been difficult. Could someone help me, please? Thanks a lot.

ucasiggcas commented 5 years ago

Can I use PYIN from Essentia in C++? If so, how? Thanks.

justinsalamon commented 5 years ago

Hey folks, if you finally do decide to incorporate CREPE into Essentia feel free to hit us up! :)

dbogdanov commented 5 years ago

Thanks, @justinsalamon! This is not our priority at the moment, therefore closing this issue for now.

dbogdanov commented 3 years ago

Reopening this issue. @palonso, let's review what is missing to add CREPE.

justinsalamon commented 3 years ago

In case it's helpful, here are the evaluation results for CREPE vs pYIN:

[Screenshots (2021-03-24): CREPE vs. pYIN evaluation results; images not reproduced]

justinsalamon commented 3 years ago

Also note that we changed how the final pitch is estimated, compared to the paper, in two important ways:

1. Argmax-local Weighted Averaging

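This release of CREPE uses the following weighted-averaging formula, which differs slightly from the paper. It considers only the neighborhood around the maximum activation, which was shown to further improve pitch accuracy:

$\hat{c} = \frac{\sum_{i=m-4}^{m+4} \hat{y}_i c_i}{\sum_{i=m-4}^{m+4} \hat{y}_i}, \quad m = \arg\max_i \hat{y}_i$

where $\hat{y}_i$ is the activation of output bin $i$ and $c_i$ is that bin's pitch in cents.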
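As a rough illustration of argmax-local weighted averaging, the sketch below averages the bin pitches only over the bins within ±4 of the strongest activation, so a distant spurious peak does not pull the estimate. It assumes the output layout we read from the public CREPE code (360 bins spaced 20 cents apart, bin 0 at about 1997.38 cents above a 10 Hz reference); the function names are hypothetical, so check CREPE's own source for the exact constants.

```python
# Assumed output layout, following our reading of the public CREPE code:
# 360 bins spaced 20 cents apart, bin 0 at about 1997.38 cents above 10 Hz.
N_BINS = 360
CENTS_MAPPING = [20.0 * i + 1997.3794084376191 for i in range(N_BINS)]


def local_average_cents(activation, radius=4):
    """Average bin pitches (in cents) weighted by activation, using only the
    bins within +/-radius of the most activated bin."""
    center = max(range(len(activation)), key=lambda i: activation[i])
    start = max(0, center - radius)
    end = min(len(activation), center + radius + 1)
    weights = activation[start:end]
    total = sum(weights)
    if total <= 0:
        raise ValueError("no positive activation around the peak")
    return sum(w * c for w, c in zip(weights, CENTS_MAPPING[start:end])) / total


def cents_to_hz(cents):
    """Convert cents relative to 10 Hz into a frequency in Hz."""
    return 10.0 * 2.0 ** (cents / 1200.0)


# Usage: a peak at bin 100 plus a weaker, distant spurious peak at bin 300.
activation = [0.0] * N_BINS
activation[99], activation[100], activation[101] = 0.5, 1.0, 0.5
activation[300] = 0.9  # outside the +/-4-bin window, so it is ignored
cents = local_average_cents(activation)
print(f"{cents:.2f} cents = {cents_to_hz(cents):.2f} Hz")
```

A weighted average over all 360 bins would be dragged toward bin 300 here; restricting the window to the argmax neighborhood keeps the estimate at bin 100 while still interpolating between adjacent bins.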

2. Temporal smoothing

By default, CREPE does not apply temporal smoothing to the pitch curve, but Viterbi smoothing is supported via the optional --viterbi command-line argument.
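For readers unfamiliar with Viterbi smoothing, a minimal sketch of the idea follows: treat each pitch bin as an HMM state, make large jumps between consecutive frames unlikely (here impossible beyond a fixed width), and decode the single most likely bin sequence. This is a generic decoder written for illustration, not CREPE's implementation; the triangular transition weights, using raw activations as emission scores, and the function names are assumptions. It runs in O(frames × bins²), which is fine for a sketch but slow for 360 bins over long files.

```python
import math


def viterbi_smooth(activations, width=12, eps=1e-12):
    """Most likely sequence of pitch bins for a (frames x bins) activation matrix.

    Transition weight from bin i to bin j is max(width - |i - j|, 0), normalized
    per row, so jumps of `width` bins or more between frames are impossible.
    Activations are used directly as emission scores (a simplification).
    """
    n_bins = len(activations[0])
    log_trans = []
    for i in range(n_bins):
        weights = [max(width - abs(i - j), 0) for j in range(n_bins)]
        total = sum(weights)
        log_trans.append([math.log(w / total) if w > 0 else -math.inf for w in weights])

    # A uniform prior over the first frame only adds a constant, so it is omitted.
    scores = [math.log(a + eps) for a in activations[0]]
    backpointers = []
    for frame in activations[1:]:
        new_scores, pointers = [], []
        for j in range(n_bins):
            best_i = max(range(n_bins), key=lambda i: scores[i] + log_trans[i][j])
            pointers.append(best_i)
            new_scores.append(scores[best_i] + log_trans[best_i][j] + math.log(frame[j] + eps))
        scores = new_scores
        backpointers.append(pointers)

    # Trace back from the best-scoring bin in the last frame.
    state = max(range(n_bins), key=lambda j: scores[j])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Usage: frame 1 has a spurious peak at bin 8 that a plain argmax would follow.
def make_frame(peaks, n_bins=10, floor=0.01):
    return [peaks.get(i, floor) for i in range(n_bins)]


acts = [make_frame({2: 0.9}), make_frame({8: 0.5, 3: 0.4}), make_frame({3: 0.9})]
print([max(range(10), key=lambda i: f[i]) for f in acts])  # → [2, 8, 3]
print(viterbi_smooth(acts, width=2))                       # → [2, 3, 3]
```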

justinsalamon commented 3 years ago

Finally, note that we provide several CREPE models of different sizes, so you can choose the desired speed/accuracy tradeoff:

Model Capacity

By default, CREPE uses the model size reported in the paper, but it can optionally use a smaller model for faster computation, at the cost of slightly lower accuracy. You can pass --model-capacity {tiny|small|medium|large|full} on the command line to select a model with the desired capacity.
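The two command-line options mentioned in this thread (--model-capacity and --viterbi) can be combined from a script. The helper below only builds and optionally runs the command; it assumes the `crepe` executable installed by the CREPE package is on PATH and takes the audio file as a positional argument, and the helper names are hypothetical.

```python
import shlex
import subprocess

# Capacities listed in the thread for CREPE's --model-capacity option.
MODEL_CAPACITIES = ("tiny", "small", "medium", "large", "full")


def crepe_command(audio_path, model_capacity=None, viterbi=False):
    """Build a `crepe` command line from the two options discussed in this thread.

    With model_capacity=None the flag is omitted and CREPE uses its default model.
    """
    cmd = ["crepe", audio_path]
    if model_capacity is not None:
        if model_capacity not in MODEL_CAPACITIES:
            raise ValueError(f"unknown model capacity: {model_capacity!r}")
        cmd += ["--model-capacity", model_capacity]
    if viterbi:
        cmd.append("--viterbi")  # optional Viterbi temporal smoothing
    return cmd


def run_crepe(audio_path, **options):
    """Run the CLI; requires the `crepe` executable to be installed and on PATH."""
    return subprocess.run(crepe_command(audio_path, **options), check=True)


print(shlex.join(crepe_command("vocals.wav", "tiny", viterbi=True)))
# → crepe vocals.wav --model-capacity tiny --viterbi
```

Validating the capacity name before launching the process gives a clear Python error instead of a CLI usage message; a smaller capacity trades some accuracy for speed, as described above.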

Here's a benchmark we ran, with the number of parameters in parentheses:

[Image: CREPE performance table per model capacity; not reproduced]

Hope this helps!

palonso commented 3 years ago

@justinsalamon thanks a lot for the explanation!

Now that TensorFlow inference is fully supported, it shouldn't take much effort to integrate CREPE. @dbogdanov I can start working on it.