Closed dwysocki closed 9 years ago
I have implemented this feature in 18275ef7d180c5e619bea1080338a652490dcae8.
The results of plotting `n_states` vs `likelihood` are very interesting, and have made me doubt the effectiveness of gradient ascent for selecting `n_states`.
In "Happy Birthday To You", you can see that the likelihood is very low until we reach 40+ states, after which there is a 100% chance of the model playing the exact input song (this is clear from all of the songs generated by these models)!
In "Twinkle Twinkle Little Star", there is a similar plateau, although it is only at the 60% level. From listening to the generated songs, I believe I have identified why it only reaches 60%: when the song repeats the same melody twice ("up above the world so high" and "like a diamond in the sky" both consist of the same notes), there is in fact a 2-way branch at the end of the verse, which either repeats the verse or continues to the next verse. This must be a local maximum reached by Baum-Welch; the global maximum (100% likelihood) would simply create new states for the second repetition. I'm glad it stopped at the local maximum, though.
The other, very strange result is that the likelihoods at 80 and 90 states are actually far below the otherwise constant likelihood across the 50-100 state range. This is why I cannot use gradient ascent, and instead must do a more extensive search.
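For concreteness, the likelihood here is P(song | model), which the forward algorithm computes; a "100% chance" plateau means the model assigns probability 1 to the input song, i.e. it has memorized it. A minimal sketch of that computation (pure Python; the variable names are mine, not the repo's):

```python
import math

def forward_log_likelihood(start, trans, emit, obs):
    """Log P(obs | model) for a discrete-emission HMM, via the scaled
    forward algorithm (rescaling avoids underflow on long songs).

    start[i]    -- probability of beginning in state i
    trans[i][j] -- probability of moving from state i to state j
    emit[i][o]  -- probability of state i emitting note symbol o
    obs         -- the song, as a list of note-symbol indices
    """
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_p = 0.0
    for sym in obs[1:]:
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # song is impossible under this model
        log_p += math.log(scale)
        alpha = [a / scale for a in alpha]
        alpha = [emit[j][sym] * sum(alpha[i] * trans[i][j] for i in range(n))
                 for j in range(n)]
    total = sum(alpha)
    return log_p + math.log(total) if total > 0.0 else float("-inf")

# A left-to-right model with one state per note reproduces its song exactly:
# log likelihood 0.0 means probability 1 -- the memorization plateau above.
start = [1.0, 0.0, 0.0]
trans = [[0, 1, 0], [0, 0, 1], [0, 0, 1]]
emit  = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(forward_log_likelihood(start, trans, emit, [0, 1, 2]))  # 0.0
```

This also shows why extra states past the plateau stop helping: once there are enough states to dedicate one per note, the likelihood is already at its ceiling.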
There needs to be a `likelihood` subcommand that reads a model from stdin, and a music file as an argument, writing the likelihood (or log likelihood) to stdout. This will allow me to create a table of `n_states` vs `likelihood` for a given model, to see if it plateaus after a certain point. An example usage is as follows: