Closed · kellertuer closed this 1 year ago
Merging #259 (42a37c9) into master (24a653e) will increase coverage by 0.00%. The diff coverage is 100.00%.
@@            Coverage Diff            @@
##           master     #259     +/-  ##
========================================
  Coverage   99.70%   99.70%
========================================
  Files          73       73
  Lines        6447     6452      +5
========================================
+ Hits         6428     6433      +5
  Misses         19       19
| Impacted Files | Coverage Δ |
|---|---|
| src/plans/stochastic_gradient_plan.jl | 100.00% <100.00%> (ø) |
Given that the primary use of stochastic gradient descent is in settings like machine learning, where we want to avoid evaluating the entire cost function, I think `get_cost(M, ago, p, i)` is needed. In the non-vector case, `get_cost(M, ago, p, 1)` could perhaps default to `get_cost(M, ago, p)`?
Sounds fair; what would we do with the case `i > 1` in the non-vector case?
Either an error or just `get_cost(M, ago, p)` -- both sound fine. I think it should be consistent with what we do in the vector case when `i` is greater than the number of components.
There I do not implement anything, so the vector access simply fails :)
OK, so `i > 1` should be an error in the non-vector case.
But should `i = 1` also directly error, since we are not in a vector case?
Julia allows getting the first element of a scalar, so for convenience allowing it here makes sense. This way stochastic optimization code could always use `get_cost` with an index, in both the scalar and the vector case. In particular, I think stepsize selection and stopping criteria should have access to the current batch number `i`; for example, a stochastic gradient norm stopping criterion could refer to the gradient for the current `i` instead of the overall gradient.
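Such a batch-aware stopping criterion could be sketched as follows. This is a hypothetical, stand-alone illustration: the name `small_batch_gradient_norm` and the representation of the gradients as a plain vector of functions are assumptions for the sketch, not Manopt.jl API.

```julia
using LinearAlgebra: norm

# Hypothetical sketch: a stopping criterion that checks only the norm of the
# gradient component belonging to the current batch index i, instead of the
# norm of the overall (summed) gradient.
# `grads` is assumed to be a vector of gradient functions, one per batch.
function small_batch_gradient_norm(grads, p, i; tol=1e-8)
    return norm(grads[i](p)) < tol
end
```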
That sounds reasonable. Then I will implement the new cost as you suggested.
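The behavior agreed on above (indexed cost access, with `i == 1` falling back to the plain cost in the non-vector case and `i > 1` erroring) might look roughly like this. The objective types below are toy stand-ins, not the actual Manopt.jl implementation, and the manifold argument `M` is unused in the sketch:

```julia
# Hypothetical sketch of indexed cost access, not the actual Manopt.jl code.

struct VectorCostObjective{F}
    costs::Vector{F}   # one cost function per component/batch
end

struct SingleCostObjective{F}
    cost::F            # a single (summed) cost function
end

# Vector case: index directly into the component costs;
# an out-of-range i fails via the vector access itself.
get_cost(M, ago::VectorCostObjective, p, i) = ago.costs[i](M, p)

get_cost(M, ago::SingleCostObjective, p) = ago.cost(M, p)

function get_cost(M, ago::SingleCostObjective, p, i)
    # Only i == 1 is meaningful for a single cost, mirroring how Julia
    # allows x[1] on a scalar; any other index is an error.
    i == 1 || throw(DomainError(i, "a single-cost objective has only component 1"))
    return get_cost(M, ago, p)
end
```

Since Julia itself allows `x[1]` on a scalar, letting `i == 1` succeed for a single cost keeps indexed and plain access interchangeable in stochastic solver code.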
I first misread #249 as well -- but this now resolves #249; it was only necessary to implement `get_cost` for this case and extend the type in the objective (one could even simplify the existing ones). Do we need more functions here, e.g. a `get_cost(M, ago, p, i)`? It would only work in this vector case, though (as opposed to the gradient, where it works in both cases, since both return a vector).