Hello! I have been reading your article on this and implementing it myself to learn PyTorch. However, I am a bit confused by your loss function. Specifically, why do you include the term

-log(sigmoid(f(q) - f(r)))

and not

-log(sigmoid(f(r) - f(q)))?

Don't we want to reward f(r) > f(q) with low loss and punish f(r) < f(q) with high loss, rather than the other way around? Just looking for an explanation so I understand :)
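For context, here is a minimal PyTorch sketch of what I mean (not the article's code; `pairwise_loss`, `f_q`, and `f_r` are just names I made up). It shows that -log(sigmoid(a - b)) is close to zero when a > b and grows large when a < b, so whichever score goes first in the subtraction is the one being rewarded for ranking higher:

```python
import torch
import torch.nn.functional as F

def pairwise_loss(score_a: torch.Tensor, score_b: torch.Tensor) -> torch.Tensor:
    # -log(sigmoid(score_a - score_b)); logsigmoid is numerically
    # stabler than composing log() with sigmoid() directly.
    return -F.logsigmoid(score_a - score_b)

# Hypothetical scores f(q) and f(r):
f_q = torch.tensor(2.0)
f_r = torch.tensor(-1.0)

print(pairwise_loss(f_q, f_r))  # ~0.049: low loss, f(q) > f(r) is rewarded
print(pairwise_loss(f_r, f_q))  # ~3.049: high loss, the reverse ordering is punished
```

So my question is really about which of q or r the loss is meant to push to a higher score.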