Lyrics-to-audio-alignement system. Based on Machine Learning Algorithms: Hidden Markov Models with Viterbi forced alignment. The alignment is explicitly aware of durations of musical notes. The phonetic model are classified with MLP Deep Neural Network.
make sure extracting MFCC with essentia same as damp model:
add preempahsis (or recreate model without preemphasis ) add cepstral mean normalization