Closed · kellertuer closed this 1 year ago
Merging #259 (42a37c9) into master (24a653e) will increase coverage by 0.00%. The diff coverage is 100.00%.
@@            Coverage Diff            @@
##           master     #259     +/-  ##
========================================
  Coverage   99.70%   99.70%
========================================
  Files          73       73
  Lines        6447     6452      +5
========================================
+ Hits         6428     6433      +5
  Misses         19       19
| Impacted Files | Coverage Δ |
|---|---|
| src/plans/stochastic_gradient_plan.jl | 100.00% <100.00%> (ø) |
Given that the primary use of stochastic gradient descent is in settings like machine learning, where we want to avoid evaluating the entire cost function, I think `get_cost(M, ago, p, i)` is needed. In the non-vector case, `get_cost(M, ago, p, 1)` could perhaps default to `get_cost(M, ago, p)`?
Sounds fair; what would we do with the case `i > 1` in the non-vector case?
Either an error or just `get_cost(M, ago, p)` -- both sound fine. I think it should be consistent with what we do in the vector case when `i` is greater than the number of components.
There I do not implement anything, so the vector access simply fails :)
OK, so `i > 1` should be an error in the non-vector case.
But should `i = 1` also directly error, since we are not in a vector case?
Julia allows getting the first element of a scalar, so for convenience allowing it here makes sense. This way stochastic optimization code could always use `get_cost` with an index, in both the scalar and the vector case. In particular, I think stepsize selection and stopping criteria should have access to the current batch number `i`; for example, a stochastic gradient norm stopping criterion could refer to the gradient for the current `i` instead of the overall gradient.
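Such a batch-aware stopping criterion could be sketched as follows. This is a hypothetical, stand-alone illustration: the name `small_batch_gradient_norm` and the representation of the gradients as a plain vector of functions are assumptions for the sketch, not Manopt.jl API.

```julia
using LinearAlgebra: norm

# Hypothetical sketch: a stopping criterion that checks only the norm of the
# gradient component belonging to the current batch index i, instead of the
# norm of the overall (summed) gradient.
# `grads` is assumed to be a vector of gradient functions, one per batch.
function small_batch_gradient_norm(grads, p, i; tol=1e-8)
    return norm(grads[i](p)) < tol
end
```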
That sounds reasonable. Then I will implement the new cost as you suggested.
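The behavior agreed on above (indexed cost access, with `i == 1` falling back to the plain cost in the non-vector case and `i > 1` erroring) might look roughly like this. The objective types below are toy stand-ins, not the actual Manopt.jl implementation, and the manifold argument `M` is unused in the sketch:

```julia
# Hypothetical sketch of indexed cost access, not the actual Manopt.jl code.

struct VectorCostObjective{F}
    costs::Vector{F}   # one cost function per component/batch
end

struct SingleCostObjective{F}
    cost::F            # a single (summed) cost function
end

# Vector case: index directly into the component costs;
# an out-of-range i fails via the vector access itself.
get_cost(M, ago::VectorCostObjective, p, i) = ago.costs[i](M, p)

get_cost(M, ago::SingleCostObjective, p) = ago.cost(M, p)

function get_cost(M, ago::SingleCostObjective, p, i)
    # Only i == 1 is meaningful for a single cost, mirroring how Julia
    # allows x[1] on a scalar; any other index is an error.
    i == 1 || throw(DomainError(i, "a single-cost objective has only component 1"))
    return get_cost(M, ago, p)
end
```

Since Julia itself allows `x[1]` on a scalar, letting `i == 1` succeed for a single cost keeps indexed and plain access interchangeable in stochastic solver code.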
I first misread #249 as well -- but this now resolves #249; it was only necessary to implement `get_cost` for this case and extend the type in the objective (one could even simplify the existing ones). Do we need more functions here, e.g. a `get_cost(M, ago, p, i)`? It would only work in this vector case, though (as opposed to the gradient, where it works in both cases, since both return a vector).