Stalence / erdos_neu

Official Repo for the NeurIPS 2020 paper "Erdos Goes Neural: An Unsupervised Learning Framework for Combinatorial Optimization on Graphs"

Question about how the loss function is constructed #7

Open NMO13 opened 3 months ago

NMO13 commented 3 months ago

Hi! I read your great paper and I have a question regarding your loss function. I hope it's OK to ask it here, since the question is not really implementation-specific.

I don't get how you construct the loss function: [screenshot of equation (2) from the paper]

So $f(S;G)$ is the objective that we want to optimize, like max cut or max clique. But how did you come up with this probabilistic formulation? I think this is a gap in my math/stats knowledge. If you can give an explanation or point me to some related literature, that would be awesome.

Stalence commented 3 months ago

To see how this argument comes about, you can start from Markov's inequality (for nonnegative $f(S;G)$): $P(f(S;G)>a) \leq \frac{E[f(S;G)]}{a}$. The idea is then to use the value of the loss function as a way to extract a certificate of solution quality: when the loss is low, you are certifying that, with controlled probability, the distribution parametrized by the neural network contains good-quality solutions.

$P(f(S;G)>l(\mathcal{D};G)) \leq \frac{E[f(S;G)]}{l(\mathcal{D};G)}$.

So then you can ask: which expression involving the loss can be plugged into Markov's inequality to yield that kind of statement? If you define $l(\mathcal{D};G) \triangleq \frac{E[f(S;G)]}{1-t}$, you can see that you arrive at (2) from your screenshot.
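Spelling out the substitution (a small worked step, using nothing beyond the Markov bound above):

$$P\big(f(S;G) \geq l(\mathcal{D};G)\big) \;\leq\; \frac{E[f(S;G)]}{l(\mathcal{D};G)} \;=\; \frac{E[f(S;G)]}{E[f(S;G)]/(1-t)} \;=\; 1-t,$$

which is the same as $P\big(f(S;G) < l(\mathcal{D};G)\big) \geq t$.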

Then as we explain, if you train the network and you obtain $l(\mathcal{D};G) = \epsilon$, you have a certificate that $P(f(S;G) < \epsilon) > t$, so with controlled probability, for a minimization problem you'll have a solution of low cost.
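If it helps intuition, here is a quick Monte Carlo sanity check of the certificate. This is only a sketch: the exponential samples and the value $t = 0.9$ are hypothetical stand-ins for draws of $f(S;G)$ under a learned distribution $\mathcal{D}$, not the paper's actual model.

```python
import numpy as np

# Sanity check: with l(D;G) = E[f(S;G)] / (1 - t), the fraction of sampled
# costs that fall below l should be at least t, by Markov's inequality.
rng = np.random.default_rng(0)

t = 0.9                                            # desired certificate probability
costs = rng.exponential(scale=2.0, size=100_000)   # stand-in nonnegative costs f(S;G)

loss = costs.mean() / (1.0 - t)                    # l(D;G) = E[f(S;G)] / (1 - t)
frac_below = (costs < loss).mean()                 # empirical P(f(S;G) < l(D;G))

print(f"loss l = {loss:.3f}")
print(f"empirical P(f < l) = {frac_below:.4f}  (certified >= {t})")
```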

Hope this helps.

NMO13 commented 3 months ago

Thanks for the answer. How is $t$ chosen? I suppose a higher value of $t$ is better because it guarantees better solution quality, right? So shouldn't $t$ always be chosen to be high?

UPDATE: I read the paper again. I think $t$ is not chosen at all and is not even directly quantified. Rather, it is a bound that increases as the loss decreases. Is that correct?
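In symbols, what I have in mind, using the Markov bound above: fixing a cost threshold $a$ and writing $\mu = E[f(S;G)]$ for the achieved expectation, we get $P(f(S;G) < a) \geq 1 - \mu/a$. So the certified probability $t = 1 - \mu/a$ would not be a free hyperparameter; it automatically grows as training drives $\mu$ (and hence the loss) down.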