Lyrics-to-audio alignment system, based on machine learning: hidden Markov models (HMMs) with Viterbi forced alignment. The alignment is explicitly aware of the durations of musical notes. The phonetic models are classified with an MLP deep neural network.
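As an illustration of the forced-alignment step only, here is a minimal sketch in plain Python. It is not this project's implementation: it assumes a strict left-to-right state sequence with no skips, uniform transition costs, and per-frame log-likelihoods that have already been computed (for example by the MLP). It does not model note durations. The function name forced_align and the toy numbers are hypothetical.

```python
import math


def forced_align(log_lik):
    """Viterbi forced alignment for a strict left-to-right state sequence.

    log_lik[t][s] is the log-likelihood of frame t under state s, where the
    states are already ordered as they occur in the lyrics. Returns a list
    with the state index assigned to each frame.
    """
    n_frames = len(log_lik)
    n_states = len(log_lik[0])
    if n_frames < n_states:
        raise ValueError("need at least one frame per state")

    neg_inf = -math.inf
    score = [[neg_inf] * n_states for _ in range(n_frames)]
    # came_from_prev[t][s] is True when the best path entered state s at
    # frame t from state s - 1 (advance) rather than from s (stay).
    came_from_prev = [[False] * n_states for _ in range(n_frames)]

    score[0][0] = log_lik[0][0]  # the path must start in the first state
    for t in range(1, n_frames):
        for s in range(n_states):
            stay = score[t - 1][s]
            advance = score[t - 1][s - 1] if s > 0 else neg_inf
            best = max(stay, advance)
            if best == neg_inf:
                continue  # state s cannot be reached by frame t
            came_from_prev[t][s] = advance > stay
            score[t][s] = best + log_lik[t][s]

    # Backtrack from the last state at the last frame.
    path = [0] * n_frames
    s = n_states - 1
    for t in range(n_frames - 1, -1, -1):
        path[t] = s
        if came_from_prev[t][s]:
            s -= 1
    return path


# Toy input (made up): 5 frames, 3 states.
toy = [
    [0.0, -5.0, -5.0],
    [0.0, -5.0, -5.0],
    [-5.0, 0.0, -5.0],
    [-5.0, -5.0, 0.0],
    [-5.0, -5.0, 0.0],
]
print(forced_align(toy))  # -> [0, 0, 1, 2, 2]
```

The boundaries between states in the returned path give the alignment; multiplying a frame index by the hop size in seconds would turn them into timestamps. A duration-aware variant would additionally penalize paths whose state durations deviate from the expected note lengths.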
Make sure MFCC extraction with essentia is the same as for the damp model:
Don't use scikit-learn at all; keep the LyricsWIthModelsGMM class for Chinese.