Closed tdeboissiere closed 7 years ago
Boy I wish I'd heard from you when I was struggling to get the gradient right!
I'm not sure I completely follow. Are you saying that you think I have an extra factor of 2 in the gradient? Or are you saying I'm missing a factor of 2? Are you saying that you come up with different code than I do, or are you saying that you tested your code and my code and they produce different answers?
If it's the former, take a look at the file largeVis.h and the initialization of alphagamma around line 193: https://github.com/elbamos/largeVis/blob/master/src/largeVis.h#L193
class AlphaGradient: public Gradient {
  const coordinatetype alpha;
  const coordinatetype twoalpha;
protected:
  const coordinatetype alphagamma;
  virtual void _positiveGradient(const double dist_squared,
                                 coordinatetype* holder) const;
  virtual void _negativeGradient(const double dist_squared,
                                 coordinatetype* holder) const;
public:
  AlphaGradient(const distancetype a,
                const distancetype g,
                const dimidxtype D) : Gradient(g, D),
                                      alpha{a},
                                      twoalpha(alpha * -2),
                                      alphagamma(alpha * gamma * 2) { } ; // <--- HERE
};
class AlphaOneGradient: public AlphaGradient {
Does that clear it up?
Yes, it is clear now (the factor of two I mentioned is included in alphagamma), so everything is in order!
However, I think it would be clearer if you defined alphagamma as alpha * gamma and moved the 2 into the gradient code (unless you had another reason for doing it this way).
Whew! You had me worried for a minute!
Actually, for the last two hours I've been running plots with different settings (1, 2, 4) of that constant for MNIST, and it seems to matter very little. I suppose the effect is very similar to what happens when varying M: it controls the relative weight of the negative and positive samples in the total gradient.
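One way to see why the constant and M would play similar roles, as a rough sketch assuming the LargeVis negative-sampling objective: here c is a hypothetical name for the constant being varied, and g^{+} and g^{-} are assumed shorthand for the per-sample positive and negative gradients with that constant factored out. The update for one sampled edge (i, j) with negative samples j_1, ..., j_M is then roughly:

```latex
\Delta y_i \;\propto\; g^{+}(y_i, y_j) \;+\; c\,\gamma \sum_{k=1}^{M} g^{-}(y_i, y_{j_k})
```

Under that reading, changing c, gamma, or M each rescales the negative part relative to the positive part, although M also changes which points are sampled.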
On Sep 1, 2016, at 1:47 AM, Thibault de Boissiere wrote:
> Yes it is clear now (the factor two I mentioned is included in alphagamma) so everything is in order !
All good, closing.
In gradients.cpp, I am not sure whether the negative gradient computation of the AlphaOneGradient is correct.
For the positive gradient, we have:
Deriving w.r.t. yi, we get:
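Assuming the LargeVis edge probability with alpha = 1, that is p = 1 / (1 + ||yi - yj||^2), the positive term and its derivative would read as follows (the symbol O^{+}_{ij} is a name chosen here for illustration, and the edge weight is left out):

```latex
O^{+}_{ij} = \log \frac{1}{1 + \lVert y_i - y_j \rVert^2}
           = -\log\left(1 + \lVert y_i - y_j \rVert^2\right)

\frac{\partial O^{+}_{ij}}{\partial y_i}
  = \frac{-2\,(y_i - y_j)}{1 + \lVert y_i - y_j \rVert^2}
```

The scalar coefficient multiplying (yi - yj) is then -2 / (1 + ||yi - yj||^2), which lines up with twoalpha = alpha * -2 in the class definition when alpha = 1.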
This matches your computation of the gradient and the subsequent multModify operation.
For the negative gradient, we have:
Deriving w.r.t. yi, we get:
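Assuming the same alpha = 1 edge probability p = 1 / (1 + ||yi - yj||^2) and a negative-sample weight gamma, the negative term and its derivative would read as follows (O^{-}_{ij} is again a name chosen here for illustration):

```latex
O^{-}_{ij} = \gamma \log\left(1 - \frac{1}{1 + \lVert y_i - y_j \rVert^2}\right)
           = \gamma \left[ \log \lVert y_i - y_j \rVert^2
                         - \log\left(1 + \lVert y_i - y_j \rVert^2\right) \right]

\frac{\partial O^{-}_{ij}}{\partial y_i}
  = \gamma \left[ \frac{2\,(y_i - y_j)}{\lVert y_i - y_j \rVert^2}
                - \frac{2\,(y_i - y_j)}{1 + \lVert y_i - y_j \rVert^2} \right]
  = \frac{2\,\gamma\,(y_i - y_j)}
         {\lVert y_i - y_j \rVert^2 \left(1 + \lVert y_i - y_j \rVert^2\right)}
```

The scalar coefficient multiplying (yi - yj) is therefore 2 * gamma / (d^2 * (1 + d^2)), with d^2 = ||yi - yj||^2.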
This matches your computation of the gradient and the subsequent multModify operation only if we omit the factor 2.
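A question like this can also be checked numerically. The following is a minimal sketch, not the largeVis code: it assumes the alpha = 1 negative term gamma * log(1 - 1 / (1 + d^2)), and every name in it (Point, negativeObjective, negativeGradient, numericGradient, maxAbsDiff) is hypothetical. It compares an analytic gradient with and without the factor 2 against a central finite difference.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical 2-D embedding point; largeVis uses its own coordinate types.
using Point = std::array<double, 2>;

// Squared Euclidean distance between two points.
double distSquared(const Point& a, const Point& b) {
    double s = 0.0;
    for (std::size_t k = 0; k < a.size(); ++k) {
        s += (a[k] - b[k]) * (a[k] - b[k]);
    }
    return s;
}

// Assumed negative-sample term for alpha = 1: gamma * log(1 - 1 / (1 + d^2)).
double negativeObjective(const Point& yi, const Point& yj, double gamma) {
    const double d2 = distSquared(yi, yj);
    return gamma * std::log(1.0 - 1.0 / (1.0 + d2));
}

// Analytic gradient w.r.t. yi: factor * gamma * (yi - yj) / (d^2 * (1 + d^2)).
// Pass factor = 2 for the derived form, or factor = 1 to see the effect of omitting it.
Point negativeGradient(const Point& yi, const Point& yj, double gamma, double factor) {
    const double d2 = distSquared(yi, yj);
    assert(d2 > 0.0);  // the term is undefined for coincident points
    const double coeff = factor * gamma / (d2 * (1.0 + d2));
    Point g{};
    for (std::size_t k = 0; k < yi.size(); ++k) {
        g[k] = coeff * (yi[k] - yj[k]);
    }
    return g;
}

// Central finite-difference gradient of negativeObjective w.r.t. yi.
Point numericGradient(const Point& yi, const Point& yj, double gamma, double h = 1e-6) {
    Point g{};
    for (std::size_t k = 0; k < yi.size(); ++k) {
        Point plus = yi;
        Point minus = yi;
        plus[k] += h;
        minus[k] -= h;
        g[k] = (negativeObjective(plus, yj, gamma) - negativeObjective(minus, yj, gamma)) / (2.0 * h);
    }
    return g;
}

// Largest absolute component-wise difference between two gradients.
double maxAbsDiff(const Point& a, const Point& b) {
    double m = 0.0;
    for (std::size_t k = 0; k < a.size(); ++k) {
        m = std::fmax(m, std::fabs(a[k] - b[k]));
    }
    return m;
}
```

Calling numericGradient and negativeGradient(..., 2.0) on the same pair of distinct points should agree up to finite-difference error, while factor = 1.0 gives half the numeric value. That is consistent with the factor 2 having to live somewhere, whether in the gradient code or folded into a precomputed constant such as alphagamma.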