dkobak closed this issue 4 years ago
Why does it make sense for the momentum to switch at the same time that the early exaggeration turns off? Is there a reason why they should be linked? That is, was the initial momentum made different from the momentum used for the rest of the iterations because of early exaggeration?
I guess I never thought about why the momentum switch was implemented...it was in the original BH implementation, so I didn't mess with it. Do you have insight into this?
I checked the original 2008 paper and it used early exaggeration of 4 for 50 iterations, and momentum 0.5 for 250 iterations and 0.8 afterwards. So there, actually, mom_switch_iter != stop_early_exag_iter. The BH paper increased the length of the early exaggeration to 250 but kept the momentum schedule as it was.
I guess this indicates that mom_switch_iter does not have to be tied to stop_early_exag_iter. And in the absence of further research on this specific point, I guess it is sensible not to change the default behaviour.
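To make the two schedules concrete, here is a minimal sketch of how the exaggeration factor and the momentum depend on the iteration number. The function name `schedule` and its signature are hypothetical, not taken from any implementation; the 2008 values are the ones quoted above, and the exaggeration coefficient used for the BH-style settings (12) is an assumption, since it is not stated in this thread.

```python
def schedule(iteration, stop_early_exag_iter, mom_switch_iter,
             early_exag_coeff, momentum=0.5, final_momentum=0.8):
    """Return (exaggeration, momentum) for a 0-based iteration.

    Hypothetical helper for illustration only: the two switches are
    independent parameters, so they may or may not coincide.
    """
    exaggeration = early_exag_coeff if iteration < stop_early_exag_iter else 1.0
    mom = momentum if iteration < mom_switch_iter else final_momentum
    return exaggeration, mom


# 2008 paper, as quoted above: exaggeration 4 for 50 iterations,
# momentum 0.5 for 250 iterations and 0.8 afterwards.
paper_2008 = dict(stop_early_exag_iter=50, mom_switch_iter=250, early_exag_coeff=4)

# BH-style settings: both switches at 250. The coefficient 12 is an assumption.
bh_style = dict(stop_early_exag_iter=250, mom_switch_iter=250, early_exag_coeff=12)

for it in (0, 100, 300):
    print(it, schedule(it, **paper_2008), schedule(it, **bh_style))
```

With the 2008 settings, iterations 50 to 249 run without exaggeration but still with the initial momentum, which is the regime where the two switches differ.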
OK so let's do nothing. I'm closing this issue.
By default, early exaggeration lasts 250 iterations, and the momentum value is switched after 250 iterations too. However, if one increases the length of the early exaggeration, momentum will still switch at 250 iterations, unless it is explicitly set to another value. This led to a long-lasting confusion that @pavlin-policar and I had over here: https://github.com/pavlin-policar/openTSNE/issues/106.
I am wondering if we should change the wrappers to set mom_switch_iter to be equal to stop_early_exag_iter by default (unless explicitly provided). Nobody will ever notice this :-) but it might be a little more consistent, or what do you think?
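For illustration, the proposed default could look roughly like the sketch below. The helper `resolve_mom_switch_iter` is hypothetical and is not the wrappers' actual code; it only assumes that a wrapper could use `None` as a sentinel meaning "not explicitly provided".

```python
def resolve_mom_switch_iter(stop_early_exag_iter=250, mom_switch_iter=None):
    """Pick the iteration at which momentum switches.

    Hypothetical helper: if the caller did not pass mom_switch_iter,
    follow stop_early_exag_iter; otherwise respect the explicit value.
    """
    if mom_switch_iter is None:
        return stop_early_exag_iter
    return mom_switch_iter


# Defaults: both switches at 250, same as the current behaviour.
print(resolve_mom_switch_iter())

# Longer early exaggeration: the momentum switch follows it.
print(resolve_mom_switch_iter(stop_early_exag_iter=500))

# An explicit value always wins.
print(resolve_mom_switch_iter(stop_early_exag_iter=500, mom_switch_iter=250))
```

Using `None` rather than 250 as the default is what lets the wrapper tell "left at the default" apart from "explicitly set to 250".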