Why does the Adam optimizer use plain Euclidean addition to accumulate the first and second moments?

alibaba / Curvature-Learning-Framework

Curvlearn, a Tensorflow based non-Euclidean deep learning framework.

Apache License 2.0

158 stars 31 forks

Why does the Adam optimizer use plain Euclidean addition to accumulate the first and second moments? #2

Closed talorwu closed 2 years ago

talorwu commented 2 years ago

Why is plain addition used here? My understanding is that the addition should be done through the exponential map.

talorwu commented 2 years ago

Also, shouldn't the multiplication use the scalar multiplication of the curved space?
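For readers unfamiliar with the operations being asked about: in the hyperbolic case, "addition" and "scalar multiplication" in the curved space usually mean Möbius addition and Möbius scalar multiplication on the Poincaré ball. The following is a minimal NumPy sketch, assuming a ball of curvature -c; the function names are illustrative and are not CurvLearn's API.

```python
import numpy as np

def mobius_add(x, y, c=1.0):
    """Mobius addition x (+) y on the Poincare ball of curvature -c."""
    xy = np.dot(x, y)
    x2 = np.dot(x, x)
    y2 = np.dot(y, y)
    num = (1 + 2 * c * xy + c * y2) * x + (1 - c * x2) * y
    den = 1 + 2 * c * xy + c ** 2 * x2 * y2
    return num / den

def mobius_scalar_mul(r, x, c=1.0):
    """Mobius scalar multiplication r (*) x on the same ball."""
    norm = np.linalg.norm(x)
    if norm == 0.0:
        return x  # the origin is a fixed point
    sc = np.sqrt(c)
    return np.tanh(r * np.arctanh(sc * norm)) * x / (sc * norm)
```

These operations act on points of the manifold. The Adam moments are built from gradients, which are tangent vectors, so whether the same operations should apply to them is exactly what the rest of this thread discusses.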

XuZhirong commented 2 years ago

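This implements the algorithm given in Figure 1 of Riemannian Adaptive Optimization Methods, which comes with a convergence guarantee. You could of course also try the exp map, which might perform better. Contributions are welcome.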

talorwu commented 2 years ago

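This implements the algorithm given in Figure 1 of Riemannian Adaptive Optimization Methods, which comes with a convergence guarantee. You could of course also try the exp map, which might perform better. Contributions are welcome.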

The implementation doesn't seem quite right: your code does not distinguish between m and \tau.

talorwu commented 2 years ago

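One more issue: according to the paper's assumptions, the second moment has to be computed separately for each component, not for all components together.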

longo11070001 commented 2 years ago

The implementation doesn't seem quite right: your code does not distinguish between m and \tau.

Actually, the identity function is the simplest isometry. In practice, we found that the identity function gives acceptable accuracy while also keeping training efficient. We agree that a more complicated isometry may lead to higher performance on specific tasks. We hope that further investigation can build on this base implementation :) Thanks for pointing out this issue.
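To make the m versus \tau distinction concrete, here is a minimal NumPy sketch of one adaptive step on the unit sphere, chosen only because its exponential map and parallel transport have simple closed forms. It is not CurvLearn's code and it simplifies the paper's algorithm (a single scalar second moment, no max-normalization, no bias correction). All names, including the `use_transport` switch, are hypothetical; with `use_transport=False` the carried momentum is just m, which corresponds to the identity choice described above.

```python
import numpy as np

def proj(x, v):
    """Project v onto the tangent space of the unit sphere at x."""
    return v - np.dot(x, v) * x

def exp_map(x, v):
    """Exponential map on the unit sphere for a tangent vector v at x."""
    n = np.linalg.norm(v)
    if n < 1e-12:
        return x
    return np.cos(n) * x + np.sin(n) * v / n

def transport(x, v, w):
    """Parallel-transport tangent vector w from x to exp_map(x, v)."""
    n = np.linalg.norm(v)
    if n < 1e-12:
        return w
    u = v / n
    return w + np.dot(u, w) * ((np.cos(n) - 1.0) * u - np.sin(n) * x)

def radam_step(x, tau, v, egrad, lr=0.1, beta1=0.9, beta2=0.999,
               eps=1e-8, use_transport=True):
    g = proj(x, egrad)  # Riemannian gradient from the Euclidean one
    # Assumption of this sketch: re-project tau so m stays tangent at x.
    # This is a no-op when tau was parallel-transported.
    m = beta1 * proj(x, tau) + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * np.dot(g, g)
    step = -lr * m / (np.sqrt(v) + eps)
    x_new = exp_map(x, step)
    # tau is m moved to the new point; the identity choice skips the move.
    tau_new = transport(x, step, m) if use_transport else m
    return x_new, tau_new, v
```

With transport, the returned `tau_new` is tangent at `x_new` and has the same norm as m. With the identity choice, the returned vector is generally not tangent at `x_new`, which is why this sketch re-projects it at the start of the next step.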

longo11070001 commented 2 years ago

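One more issue: according to the paper's assumptions, the second moment has to be computed separately for each component, not for all components together.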

Please refer to the last paragraph on page 5 of the paper, which states that under certain conditions (e.g., the simplest condition) the second moment can be computed jointly.
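The two options under discussion can be sketched as follows for a product manifold whose gradient is split into one tangent vector per component. This is an illustrative NumPy sketch only: the function names are hypothetical, and the plain dot product stands in for the Riemannian squared norm at the current point, which a real implementation would use instead.

```python
import numpy as np

def second_moment_per_component(v_prev, grads, beta2=0.999):
    """One scalar accumulator per component of the product manifold."""
    sq_norms = np.array([np.dot(g, g) for g in grads])
    return beta2 * v_prev + (1 - beta2) * sq_norms

def second_moment_joint(v_prev, grads, beta2=0.999):
    """A single scalar accumulator shared by all components."""
    total = sum(np.dot(g, g) for g in grads)
    return beta2 * v_prev + (1 - beta2) * total
```

The per-component version rescales each component's step by its own gradient history, while the joint version applies one common scale to every component.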

longo11070001 commented 2 years ago

Problem solved.

XuZhirong commented 2 years ago

Thanks for pointing out the possible issues. For efficiency, we implemented \varphi as the identity function and optimized each submanifold equally. We will fix this in a later version to stay consistent with the paper.
