Stalence / erdos_neu

Official Repo for the NeurIPS2020 paper "Erdos Goes Neural: An Unsupervised Learning Framework for Combinatorial Optimization on Graphs"

Clique loss function in the paper and the code #1

Closed WilliamKRobert closed 3 years ago

WilliamKRobert commented 3 years ago

I'm having difficulty making the connection between the loss function in the paper and the code. In Corollary 1 of Section 4.1 of your paper, the loss function for the maximum clique problem is controlled by gamma and beta. Could you point out the corresponding variables in your code? Does beta correspond to penalty_coefficient?

Stalence commented 3 years ago

That's right. If you rewrite the loss w.r.t. beta, you get a difference of two sums (this is the "expected distance" in the code). So indeed, the penalty coefficient corresponds to beta. Gamma is just a constant offset, so it generally doesn't matter for the optimization (if you find that it does, please let me know!).
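As a rough sketch of what that "difference of two sums" looks like (this is not the repo's actual code; the function name and tensor layout are assumptions), the max-clique loss can be written as beta times the expected number of selected non-edge pairs minus the expected clique weight, with gamma dropped as a constant offset:

```python
import torch

def clique_loss(p, adj, penalty_coefficient=0.25):
    """Hypothetical sketch of the Corollary 1 max-clique loss.

    p:   (N,) tensor of node probabilities
    adj: (N, N) 0/1 adjacency matrix without self-loops
    penalty_coefficient plays the role of beta; gamma is omitted.
    """
    pairwise = torch.outer(p, p)  # p_i * p_j for every ordered pair
    # Sum over edges: each edge counted twice in the symmetric matrix.
    expected_clique_weight = (pairwise * adj).sum() / 2.0
    # Sum over non-edges (excluding the diagonal).
    non_edges = (1.0 - adj) - torch.eye(adj.shape[0])
    expected_penalty = (pairwise * non_edges).sum() / 2.0
    # Difference of two sums: beta weights the non-edge penalty term.
    return penalty_coefficient * expected_penalty - expected_clique_weight
```

Rewriting it this way makes it clear that only the relative weight beta between the two sums affects the gradient, which is why the constant gamma drops out.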

WilliamKRobert commented 3 years ago

I see! Thank you for your quick response!

What still confuses me is the value of beta. In the paper, beta is greater than or equal to the clique number, while the default value in the code is 0.25. How should I adjust beta on my own dataset? Is it OK to simply take a value slightly larger than the clique number (I'm not exactly sure how you came up with beta = 0.25)?

Stalence commented 3 years ago

The selection of the default beta in that piece of code is arbitrary. I just set a default that is unlikely to break anything in training when you run it. In practice, it's best to choose it based on a validation set (we point this out in the paper). The range I would recommend searching is [1, guess_for_clique_number]. In my experience, values below 10 were generally more stable, but it really depends on the instances you're working with and the kind of problem you're solving. In the "erdos_tud" notebook you can see which penalty coefficient I use for the real-world datasets to get results like those reported in the paper.
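The validation-set selection described above can be sketched as a simple grid search. This is a hypothetical helper, not code from the repo; `train_model` and `mean_clique_size` are stand-ins for whatever training and evaluation routines you already have:

```python
def pick_beta(candidates, train_model, mean_clique_size):
    """Grid-search the penalty coefficient (beta) on a validation set.

    candidates:       iterable of beta values to try, e.g. range(1, guess + 1)
    train_model:      callable taking penalty_coefficient, returning a model
    mean_clique_size: callable scoring a model on validation graphs
                      (larger cliques found = better)
    """
    scores = {}
    for beta in candidates:
        model = train_model(penalty_coefficient=beta)
        scores[beta] = mean_clique_size(model)
    best = max(scores, key=scores.get)
    return best, scores
```

Scoring by the size of the decoded cliques (rather than by the training loss itself) keeps the comparison fair across different beta values, since the loss scale changes with beta.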

In the max-clique problem, for example, if you use a very large penalty coefficient you could end up stuck with your network selecting only 2 nodes connected by an edge, because those trivially satisfy the constraint. So you need to be mindful of the shortcuts your network may take (and it will take them if they are available) when selecting a value for the penalty. I hope this helps!
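That shortcut is easy to see with a toy calculation (an assumed illustration, not repo code). With hard 0/1 probabilities, the loss reduces to beta times the number of non-edge pairs inside the selection minus the number of edges inside it. Consider a 5-node graph that is one edge short of a clique:

```python
def toy_loss(num_edges_inside, num_non_edges_inside, beta):
    # Hard-assignment version of the clique loss (gamma offset dropped).
    return beta * num_non_edges_inside - num_edges_inside

# Selecting all 5 nodes: 9 edges present, 1 pair missing.
# Selecting 2 adjacent nodes: 1 edge, 0 missing pairs (a trivial clique).
for beta in (1.0, 4.0, 20.0):
    loss_big = toy_loss(9, 1, beta)
    loss_pair = toy_loss(1, 0, beta)
    print(f"beta={beta}: 5-node loss={loss_big}, 2-node loss={loss_pair}")
```

For beta below 8 the large (near-clique) selection has the lower loss, but once beta exceeds 8 the trivial 2-node clique wins, which is exactly the shortcut described above.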

WilliamKRobert commented 3 years ago

Now it makes more sense. Thank you for the detailed explanation!