Gradient computation detail

elbamos / largeVis

An implementation of the largeVis algorithm for visualizing large, high-dimensional datasets, for R


Gradient computation detail #24

Closed. tdeboissiere closed this issue 7 years ago

tdeboissiere commented 7 years ago

In gradients.cpp, I am not sure whether the negative gradient computation of the AlphaOneGradient is correct.

For the positive gradient, we have:

L ~ log(1 / (1 + (yi - yj)**2)).

Differentiating w.r.t. yi, we get:

 - 2 (yi - yj) / (1 + (yi - yj)**2)

This matches your computation of the gradient and the subsequent multModify operation.

For the negative gradient, we have:

L ~ gamma * log( (yi - yj)**2 / (1 + (yi - yj)**2) ).

Differentiating w.r.t. yi, we get:

2 gamma * 1 / [ (yi - yj) * (1 + (yi - yj)**2) ]

This matches your computation of the gradient and the subsequent multModify operation only if we omit the factor 2.
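The two closed-form derivatives above can be checked numerically. The following is a minimal sketch, assuming the scalar (1-D) form of the objective exactly as written in this comment. The function names are hypothetical and are not taken from gradients.cpp. It compares each analytic derivative against a central finite difference.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Scalar (1-D) objective terms as written above, with d = yi - yj.
// All names here are illustrative; none come from the largeVis sources.
double positive_loss(double yi, double yj) {
    const double d = yi - yj;
    return std::log(1.0 / (1.0 + d * d));
}

double negative_loss(double yi, double yj, double gamma) {
    const double d = yi - yj;
    return gamma * std::log(d * d / (1.0 + d * d));
}

// Closed-form derivatives with respect to yi, as derived above.
double positive_grad(double yi, double yj) {
    const double d = yi - yj;
    return -2.0 * d / (1.0 + d * d);
}

double negative_grad(double yi, double yj, double gamma) {
    const double d = yi - yj;
    return 2.0 * gamma / (d * (1.0 + d * d));
}

// Largest absolute gap between the analytic derivatives and central
// finite differences taken with respect to yi (requires yi != yj).
double max_gradient_error(double yi, double yj, double gamma,
                          double h = 1e-6) {
    const double num_pos =
        (positive_loss(yi + h, yj) - positive_loss(yi - h, yj)) / (2.0 * h);
    const double num_neg =
        (negative_loss(yi + h, yj, gamma) - negative_loss(yi - h, yj, gamma)) /
        (2.0 * h);
    return std::max(std::fabs(num_pos - positive_grad(yi, yj)),
                    std::fabs(num_neg - negative_grad(yi, yj, gamma)));
}
```

Calling `max_gradient_error(0.7, -0.4, 7.0)` from a small `main` should return a value close to zero if both derivations hold, including the factor 2 in the negative term. The inputs are arbitrary test values.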

elbamos commented 7 years ago

Boy I wish I'd heard from you when I was struggling to get the gradient right!

I'm not sure I completely follow. Are you saying that you think I have an extra factor of 2 in the gradient? Or are you saying I'm missing a factor of 2? Are you saying that you come up with different code than I do, or are you saying that you tested your code and my code and they produce different answers?

If it's the former, take a look at the file largeVis.h, and the initialization of alphagamma around line 193: https://github.com/elbamos/largeVis/blob/master/src/largeVis.h#L193

class AlphaGradient: public Gradient {
  const coordinatetype alpha;
  const coordinatetype twoalpha;
protected:
  const coordinatetype alphagamma;
  virtual void _positiveGradient(const double dist_squared,
                                 coordinatetype* holder) const;
  virtual void _negativeGradient(const double dist_squared,
                                 coordinatetype* holder) const;
public:
  AlphaGradient(const distancetype a,
                const distancetype g,
                const dimidxtype D) : Gradient(g, D),
                               alpha{a},
                               twoalpha(alpha * -2),
                               alphagamma(alpha * gamma * 2) { } ; // <--- HERE
};

class AlphaOneGradient: public AlphaGradient {

Does that clear it up?

tdeboissiere commented 7 years ago

Yes, it is clear now (the factor of two I mentioned is included in alphagamma), so everything is in order!

However, I think it would be clearer if you defined alphagamma as alpha * gamma and moved the 2 into the gradient code (unless you had another reason for doing it this way).
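To illustrate the suggestion, here is a rough sketch of the two conventions side by side. The structs and the `negative_coefficient` method are hypothetical, simplified stand-ins and are not the actual largeVis classes. The formula is the alpha = 1 negative-gradient coefficient implied by the derivation earlier in this thread, not a copy of gradients.cpp. The point is only that both conventions yield the same number, so the choice is about readability.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-ins for illustration only (not the largeVis classes).
// Each returns the scalar that multiplies (yi - yj) in the negative
// gradient for alpha = 1: 2 * gamma / (d^2 * (1 + d^2)).

// Current convention: the factor 2 is folded into the constant.
struct FoldedTwo {
    double alphagamma;
    FoldedTwo(double alpha, double gamma) : alphagamma(alpha * gamma * 2) {}
    double negative_coefficient(double dist_squared) const {
        return alphagamma / (dist_squared * (1.0 + dist_squared));
    }
};

// Suggested convention: the constant is alpha * gamma and the factor 2
// appears explicitly where the gradient is computed.
struct ExplicitTwo {
    double alphagamma;
    ExplicitTwo(double alpha, double gamma) : alphagamma(alpha * gamma) {}
    double negative_coefficient(double dist_squared) const {
        return 2.0 * alphagamma / (dist_squared * (1.0 + dist_squared));
    }
};
```

With the same `alpha`, `gamma`, and `dist_squared`, both structs should return the same coefficient. The second makes the 2 from the derivative visible next to the formula it belongs to.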

elbamos commented 7 years ago

Whew! You had me worried for a minute!

Actually, for the last two hours I've been running plots on MNIST with different settings (1, 2, 4) of that constant, and it actually seems to matter very little. I suppose the effect is very similar to what happens when varying M: it controls the relative weight of the negative and positive samples in the total gradient.

On Sep 1, 2016, at 1:47 AM, Thibault de Boissiere notifications@github.com wrote:

Yes it is clear now (the factor two I mentioned is included in alphagamma) so everything is in order !


tdeboissiere commented 7 years ago

All good, closing.