G should not contain negative values by construction (it is a scaled Gaussian kernel when using Sinkhorn), so this should not occur. Could you have a look at what is happening? Maybe we get steps in the FW that are larger than 1?
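For context, a tiny illustration (made-up matrices and the usual FW update written out by hand, not POT's actual code) of why a step outside [0, 1], including a negative one, can push entries of G below zero even though G starts nonnegative:

import numpy as np

# Illustration only (variable names assumed): the FW update
# G_new = G + alpha * (Gc - G) is a convex combination of two nonnegative
# plans, so nonnegativity is only guaranteed for alpha in [0, 1].
G = np.array([[0.49, 1e-9], [1e-9, 0.49]])   # current plan, some entries almost 0
Gc = np.array([[0.0, 0.5], [0.5, 0.0]])      # vertex from the linear oracle
for alpha in (0.5, 1.5, -0.01):              # steps outside [0, 1] break nonnegativity
    G_new = G + alpha * (Gc - G)
    print(alpha, bool((G_new < 0).any()))    # 0.5 -> False, 1.5 -> True, -0.01 -> True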
Yeah, that's why I was confused by this. I will try to find out why exactly this happens.
It turns out that scalar_search_armijo suggests a negative alpha in this callback, which leads the function value f(xk0 + alpha10 * pk0) to fall outside of the interpolation range (in our case, this "outside" happens to have negative values, as would be expected because matrix G has a lot of items pushed to almost 0). This is somewhat unexpected behavior from scalar_search_armijo, as it has a default amin=0 parameter. In fact, though, there is no check against that boundary here: it computes the minimizer of a quadratic interpolant, executes the callback phi, and returns if the Armijo condition is met, without checking against amin (that check happens only later in the code).
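To make the missing boundary check concrete, here is a paraphrase of the quadratic interpolation step (an illustration, not the scipy source): when phi'(0) > 0 and the interpolant is convex, its minimizer is negative, and that value gets passed straight to phi.

# Minimizer of the quadratic q(a) = phi0 + derphi0 * a + c * a**2, fitted so
# that q(alpha0) = phi(alpha0); a paraphrase of the interpolation step, not
# the scipy implementation itself.
def quad_interp_minimizer(phi0, derphi0, alpha0, phi_a0):
    c = (phi_a0 - phi0 - derphi0 * alpha0) / alpha0 ** 2
    return -derphi0 / (2.0 * c)

# derphi0 > 0 with a convex interpolant gives a negative step, which is then
# evaluated without being compared against amin.
print(quad_interp_minimizer(phi0=0.0, derphi0=1.0, alpha0=1.0, phi_a0=2.0))  # -0.5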
One possible solution would be to check, on our side, whether the alpha1 suggested by the search is outside of the min/max range (e.g. by returning an inf value from the phi callback). But I'm not sure that this is what is supposed to happen in the algorithm in the first place: the original line_search_armijo works with the projection onto the search direction given by the gradient (i.e. derphi0 = np.dot(gfk, pk)). In POT we have derphi0 = np.sum(pk * gfk), and I'm not sure this value guarantees that the quadratic interpolation doesn't break the min/max condition. WDYT?
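A minimal sketch of the guarded-callback idea (the wrapper and its argument names are hypothetical, not POT's code): out-of-range steps get a +inf cost, so the search never evaluates the objective on a plan with negative entries.

import numpy as np

def make_guarded_phi(cost, xk, pk, amin=0., amax=1.):
    """Wrap the line-search objective so out-of-range steps are rejected."""
    def phi(alpha):
        if alpha < amin or alpha > amax:
            return np.inf                 # reject: interpolated step left [amin, amax]
        # With pk = Gc - G, xk + alpha * pk is a convex combination of two
        # nonnegative plans, so the log terms in the cost stay finite.
        return cost(xk + alpha * pk)
    return phi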
Ouch, this one is subtle. Since scalar_search_armijo does not work as expected (does not respect amin), I would add a quick threshold outside of its call to force a positive or 0 step (sketched below).
Are you saying that in the function sometimes the gradient is not an ascent direction? This is possible since at each iteration we solve a Sinkhorn that might not have converged; in this case, setting a step of 0 (because numerical precision prevents us from getting a descent direction) seems reasonable.
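A sketch of the thresholding suggested above, done outside the scipy call (the helper name is made up; scalar_search_armijo is the scipy routine, whose import path has moved between scipy versions):

from scipy.optimize.linesearch import scalar_search_armijo  # newer scipy: scipy.optimize._linesearch

def clipped_armijo_step(phi, phi0, derphi0, alpha0=0.99, amax=1.):
    """Run the Armijo search, then force the returned step into [0, amax]."""
    alpha, phi1 = scalar_search_armijo(phi, phi0, derphi0, alpha0=alpha0, amin=0.)
    if alpha is None or alpha < 0.:
        # No usable positive step, e.g. the inner Sinkhorn did not fully
        # converge and the direction is not a descent direction: stay put.
        return 0.
    return min(alpha, amax)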
I can surely add an additional check to the callback function. I still don't have an intuitive feeling that this is the right solution, though. I will run more tests.
Are you saying that in the function sometimes the gradient is not an ascent direction?
So, this is from the debugger. I'm just evaluating phi (this one here) over a linspace between 0. and 0.99. In the first case, you can see that the cost goes down, while on the second loop it clearly goes up. Not sure why; Sinkhorn does converge in this case.
np.array([phi(x) for x in np.linspace(0., .99, 20)])
array([ 2.7194557 , 2.29096204, 1.87171902, 1.46029814, 1.0558716 ,
0.65791566, 0.26609441, -0.11979583, -0.4998549 , -0.87409216,
-1.24242958, -1.60469608, -1.96061174, -2.30975788, -2.65152358,
-2.98500709, -3.30881903, -3.62062802, -3.91580393, -4.17963043])
np.array([phi(x) for x in np.linspace(0., .99, 20)])
array([-4.17963043, -4.16778306, -4.14616826, -4.11810441, -4.08496656,
-4.04752903, -4.00628706, -3.96157971, -3.91364772, -3.86266445,
-3.80875372, -3.75200051, -3.69245742, -3.63014829, -3.56506962,
-3.49718998, -3.42644711, -3.35274138, -3.27592209, -3.19574438])
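One quick check tying this back to the interpolation issue (a debugging helper with assumed variable names, not library code): the sign of the derphi0 that POT passes to the search tells you which of the two regimes above you are in; a positive value means pk is not a descent direction, which is exactly when the interpolated step comes out negative.

import numpy as np

def is_descent_direction(gfk, pk):
    """True when the search direction pk decreases the linearized cost."""
    # Same quantity POT passes as derphi0 (see np.sum(pk * gfk) above).
    return bool(np.sum(pk * gfk) < 0)

On the second evaluation above this would presumably come out False, consistent with phi increasing over [0, 0.99].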
Describe the bug
ot.optim.generic_conditional_gradient uses the following cost function for line search (when reg2 is provided):
When matrix G contains negative values, nx.log(G) returns nan with a RuntimeWarning.
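A minimal reproduction of that warning with plain numpy in place of the backend nx (made-up values, and assuming an entropic term of the form sum(G * log(G)), consistent with the nx.log(G) call above):

import numpy as np

# One near-zero entry pushed slightly negative by a bad line-search step is
# enough to turn the whole cost into nan.
G = np.array([[0.5, 0.25], [0.25, -1e-12]])
val = np.sum(G * np.log(G))   # RuntimeWarning: invalid value encountered in log
print(val)                    # nan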
Code sample
This example is used for test_sinkhorn_l1l2_transport_class in test/test_da.py.

Expected behavior
The question now is: are negative values expected to happen while performing line search? If yes, should we evaluate the cost as -inf in this case? Or is it okay for the cost to be nan (if this is the case, we should short-circuit the call to avoid warnings)?

Environment (please complete the following information):