Open hpx7 opened 5 years ago
The `CRFChordRecognitionProcessor` uses the Viterbi algorithm to compute the most likely chord sequence at the frame level. I'm wondering if there's a way during decoding to obtain per-frame confidence levels about the chord possibilities.

In particular, is there an easy way to obtain marginal probabilities per chord label from the `CNNChordFeatureProcessor` feature vectors? From a bit of research it looks like the forward-backward algorithm may be useful for this, but I'm not exactly sure how we would apply it here.
So I guess we can achieve this in a frame-local way by projecting the features through the CRF's learned observation weights and applying a softmax:
```python
import numpy as np

from madmom.models import CHORDS_CFCRF
from madmom.ml.crf import ConditionalRandomField
from madmom.features import CNNChordFeatureProcessor


def softmax(x):
    # subtract the max for numerical stability
    e = np.exp(x - np.max(x))
    return e / e.sum()


crf = ConditionalRandomField.load(CHORDS_CFCRF[0])
feats = CNNChordFeatureProcessor()('audio-file.wav')

# per-frame label probabilities from the observation weights only
# (transition, initial and final potentials are ignored here)
probabilities = [softmax(np.dot(feat, crf.W)) for feat in feats]
```
However, if we wanted to take the transition probabilities etc. into account, we would probably need to apply the forward-backward algorithm. Is this something that could be added to the `ConditionalRandomField` class?
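To make it concrete, this is roughly the kind of computation I have in mind; a toy log-space forward-backward sketch over generic potentials, where the names (`obs`, `A`, `pi`, `tau`) are placeholders rather than madmom's actual internals:

```python
import numpy as np
from scipy.special import logsumexp

def fwd_bwd_marginals(obs, A, pi, tau):
    """Toy forward-backward over log-potentials.

    obs: (T, S) per-frame observation scores (e.g. feat . W plus any bias)
    A:   (S, S) transition potentials, A[i, j] = score of label i -> j
    pi:  (S,)   initial potentials
    tau: (S,)   final potentials
    Returns (T, S) per-frame marginal probabilities.
    """
    T, S = obs.shape
    fwd = np.empty((T, S))
    bwd = np.empty((T, S))
    # forward pass (in log space, so logsumexp instead of sums of products)
    fwd[0] = pi + obs[0]
    for t in range(1, T):
        fwd[t] = obs[t] + logsumexp(fwd[t - 1][:, None] + A, axis=0)
    # backward pass
    bwd[-1] = tau
    for t in range(T - 2, -1, -1):
        bwd[t] = logsumexp(A + obs[t + 1] + bwd[t + 1], axis=1)
    # combine and normalise per frame to get the marginals
    log_marg = fwd + bwd
    log_marg -= logsumexp(log_marg, axis=1, keepdims=True)
    return np.exp(log_marg)
```

Here `obs[t]` would be something along the lines of `np.dot(feat, crf.W)` plus whatever bias term the model has, but I haven't worked out the details.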
Ping @fdlm since he is the chord/CRF guy.
Yes, this is correct. If you want to consider transition probabilities, you need the forward-backward algorithm, and yes, it could be added to the `ConditionalRandomField` class, if anyone is willing to implement it :smiley:
I took a stab at a partial implementation (https://github.com/CPJKU/madmom/compare/master...hpx7:patch-1), but I'm stuck on a few things I haven't been able to figure out yet:

- I'm not sure how to incorporate the `bias` and `final` potentials.
- The transition potentials (`self.A`) were not normalized; after applying a softmax the model started behaving more normally. Am I doing something wrong by normalizing here? Maybe this is a difference between HMMs and CRFs that I'm missing?

Sorry for all the questions, I'm just posting them as I work through understanding all this, as it's a new area for me.
`self.A` is not a transition probability matrix, but a transition potential matrix. Potentials are not normalized in any way. The model behaving "more normally" might be a coincidence, but there is probably another bug in the code.

You can check out my CRF code for Theano here: https://github.com/fdlm/Spaghetti/blob/master/spaghetti/layers.py (`_get_forward_output_for`). Maybe it helps you.
Thanks for the pointers @fdlm. I got some time to look into this more.
The only way I could figure out how to normalize at each step was to apply the `softmax` function (since we have to account for negative potentials). However, I'm not sure how to calculate a normalization constant with softmax and apply it to correct the answer at the end.
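For concreteness, I think the usual scaling trick looks roughly like the toy sketch below (placeholder names, nothing from my actual patch): the per-frame normalisers are what the softmax divides by, and accumulating their logs should be what lets the partition function be recovered at the end. I'm just not sure how to fit this into the existing code.

```python
import numpy as np

def scaled_forward(obs, A, pi, tau):
    """Toy scaled forward pass: normalise alpha at every frame, but
    accumulate the log of the normalisers so the log partition
    function can still be recovered at the end."""
    T, S = obs.shape
    exp_A = np.exp(A)
    log_z = 0.0
    alpha = np.exp(pi + obs[0])
    for t in range(1, T):
        norm = alpha.sum()        # the "softmax denominator" for this frame
        alpha /= norm             # keeps alpha from under-/overflowing
        log_z += np.log(norm)     # ...but remember what was divided out
        alpha = np.exp(obs[t]) * (alpha @ exp_A)
    log_z += np.log(np.exp(tau) @ alpha)   # fold in the final potentials
    return log_z
```

I assume the backward pass would need the same treatment.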
I've gotten rid of softmaxing `self.A` and am incorporating the `bias` and `final` potentials. The results look mostly reasonable now (comparable to the Viterbi output), but I'm not sure of the best way to verify whether the implementation is correct.

You can see my changes here: https://github.com/CPJKU/madmom/compare/master...hpx7:patch-1. Is there anything you'd like to see prior to me making a PR?
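For now I've just been sanity-checking the output along these lines (`forward_backward` here stands in for the method the patch adds; the name and return shape are just how I'm describing it, not settled API):

```python
import numpy as np
from madmom.models import CHORDS_CFCRF
from madmom.ml.crf import ConditionalRandomField
from madmom.features import CNNChordFeatureProcessor

crf = ConditionalRandomField.load(CHORDS_CFCRF[0])
feats = CNNChordFeatureProcessor()('audio-file.wav')

# hypothetical method from the patch: per-frame marginals, shape (num_frames, num_labels)
marginals = crf.forward_backward(feats)
# existing Viterbi decoding (assuming process() returns the label sequence)
path = crf.process(feats)

# each row of marginals should sum to one...
assert np.allclose(marginals.sum(axis=1), 1.0)
# ...and the per-frame argmax should mostly (though not always) agree with Viterbi
agreement = np.mean(marginals.argmax(axis=1) == path)
print('frame-level agreement with Viterbi: {:.1%}'.format(agreement))
```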
Hey, thanks for your work. The best way to verify that your implementation is correct is to test it against an existing one. Since the chord recognition model you're using was actually trained using the code here: https://github.com/fdlm/Spaghetti/blob/master/spaghetti/layers.py and only later translated to a pure Python/NumPy implementation, this would be the way to go.

I understand that it's probably a lot of work, especially to get Theano running again :), but to be honest, I'm not willing to merge code that I'm not 100% sure is correct. I still believe there's something wrong with the softmaxing, but I don't have time to debug it.
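One other cheap sanity check, independent of getting Theano running, would be to compare the marginals against brute-force enumeration of all label sequences on a tiny random problem, roughly along these lines (generic names, nothing madmom-specific):

```python
import numpy as np
from itertools import product
from scipy.special import logsumexp

rng = np.random.default_rng(0)
T, S = 4, 3                                        # 4 frames, 3 labels: only 3**4 sequences
obs = rng.normal(size=(T, S))                      # random observation potentials
A = rng.normal(size=(S, S))                        # random transition potentials
pi, tau = rng.normal(size=S), rng.normal(size=S)   # initial / final potentials

def seq_score(seq):
    """Unnormalised log-score of one complete label sequence."""
    s = pi[seq[0]] + obs[0, seq[0]] + tau[seq[-1]]
    for t in range(1, T):
        s += A[seq[t - 1], seq[t]] + obs[t, seq[t]]
    return s

seqs = np.array(list(product(range(S), repeat=T)))   # all possible label sequences
scores = np.array([seq_score(seq) for seq in seqs])
log_z = logsumexp(scores)

# brute-force marginal of label k at frame t: probability mass of sequences with seq[t] == k
brute = np.exp([[logsumexp(scores[seqs[:, t] == k]) - log_z for k in range(S)]
                for t in range(T)])

# this should match the forward-backward marginals: np.allclose(brute, marginals)
```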