dipzza / ultrastar-song2txt

Tools that automate parts of making a song in the ultrastar txt format
GNU Affero General Public License v3.0

Application to estimate note pitches from song audio and write them to ultrastar txt. #40

Closed: dipzza closed this 2 years ago

dipzza commented 2 years ago

An application that automatically estimates note pitches from the song audio and writes them to the txt file would go a long way towards fulfilling #7.
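To make the "assign them to the txt file" step concrete, here is a minimal sketch of turning an estimated frequency into an integer pitch value. It assumes equal temperament with A4 = 440 Hz and assumes that pitch 0 in the txt corresponds to C4 (MIDI note 60), which should be checked against the format the project targets. The function names are hypothetical and not part of this repository.

```python
import math

A4_HZ = 440.0             # reference tuning (assumption)
A4_MIDI = 69              # MIDI note number of A4
ULTRASTAR_ZERO_MIDI = 60  # assumption: pitch 0 in the txt is C4


def hz_to_midi(frequency_hz: float) -> float:
    """Convert a frequency in Hz to a (fractional) MIDI note number."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return A4_MIDI + 12 * math.log2(frequency_hz / A4_HZ)


def hz_to_ultrastar_pitch(frequency_hz: float) -> int:
    """Round to the nearest semitone and shift so that C4 maps to 0."""
    return round(hz_to_midi(frequency_hz)) - ULTRASTAR_ZERO_MIDI
```

Under these assumptions, a sung A4 (440 Hz) would be written as pitch 9 and a C4 (about 261.63 Hz) as pitch 0.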

A CLI application seems like a good first approach: it makes it easy to test different parameters (#10) and to improve accuracy and performance.
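A rough sketch of what such a CLI could look like with `argparse` follows. The program name and every argument (`audio`, `txt`, `--step-size-ms`, `--min-confidence`) are hypothetical placeholders chosen for illustration, not the interface this project ended up with.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing the tunable parameters as flags."""
    parser = argparse.ArgumentParser(
        prog="estimate-pitches",  # hypothetical program name
        description="Estimate note pitches from audio and write them to an ultrastar txt file.",
    )
    parser.add_argument("audio", help="path to the song (or isolated vocals) audio file")
    parser.add_argument("txt", help="path to the ultrastar txt file to update")
    # Hypothetical tuning knobs, exposed so different values are easy to compare.
    parser.add_argument("--step-size-ms", type=int, default=10,
                        help="time between consecutive pitch estimates, in milliseconds")
    parser.add_argument("--min-confidence", type=float, default=0.5,
                        help="discard pitch estimates below this confidence")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

Keeping each parameter as a flag means a test run with different settings is a one-line change on the command line rather than a code edit.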

dipzza commented 2 years ago

The technique used to estimate the fundamental frequency of the voice should be as accurate as possible, even if it's slow (within reason): taking up the karaoke song creator's time with manually checking bad predictions is worse.

Classic algorithms such as FFT- and STFT-based methods, Praat, YIN and pYIN are fast but not state of the art; of these, pYIN has the highest accuracy ¹²³. pYIN is already available in librosa, which we already use for multiple reasons.

Approaches using neural networks, like CREPE, require much more computational power but have greater accuracy and are the current state of the art. According to its paper, CREPE outperforms both pYIN and SWIPE (itself a classic, non-neural algorithm) by over 8% in all metrics (https://arxiv.org/pdf/1802.06182.pdf), and it has received improvements since the paper's publication.

Taking this into account, CREPE will be used to estimate the fundamental frequencies of the audio. Support for pYIN could be considered in the future as a faster and smaller alternative, although CREPE already lets users choose smaller models for faster inference.
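A minimal sketch of how this could fit together, assuming the `crepe` and `librosa` packages are installed: `crepe.predict` returns per-frame times, frequencies and confidences, and its `model_capacity` argument selects a smaller model. The helpers `estimate_f0` and `note_frequency`, the `min_confidence` threshold and the median aggregation are illustrative assumptions, not the project's actual implementation.

```python
from statistics import median


def estimate_f0(audio_path, model_capacity="full"):
    """Run CREPE on an audio file; returns per-frame (time, frequency, confidence)."""
    # Imported lazily so note_frequency() can be used without TensorFlow installed.
    import crepe
    import librosa

    audio, sr = librosa.load(audio_path, sr=16000, mono=True)
    time, frequency, confidence, _activation = crepe.predict(
        audio, sr, model_capacity=model_capacity, viterbi=True
    )
    return time, frequency, confidence


def note_frequency(times, frequencies, confidences, start, end, min_confidence=0.5):
    """Median frequency of confident frames with start <= time < end, or None."""
    selected = [
        f for t, f, c in zip(times, frequencies, confidences)
        if start <= t < end and c >= min_confidence
    ]
    # None marks notes that the song creator should check manually.
    return median(selected) if selected else None
```

For example, `estimate_f0("vocals.wav", model_capacity="tiny")` would trade accuracy for speed, and notes for which `note_frequency` returns `None` could be flagged for manual review instead of being given a guessed pitch.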