AIML-K / GNN_Survey

updating papers related to GNN

Bayesian optimisation of functions on graphs #4

Open 2nazero opened 1 month ago

2nazero commented 1 month ago

Bayesian optimisation of functions on graphs

```bibtex
@article{wan2023bayesian,
  title={Bayesian optimisation of functions on graphs},
  author={Wan, Xingchen and Osselin, Pierre and Kenlay, Henry and Ru, Binxin and Osborne, Michael A and Dong, Xiaowen},
  journal={Advances in Neural Information Processing Systems},
  volume={36},
  pages={43012--43040},
  year={2023}
}
```
2nazero commented 1 month ago

A Very Short Summary

To sum up this paper in one paragraph: it addresses the need for efficient optimization techniques on graph-structured data. Existing methods either ignore information about the objective function or are impractical for large-scale graphs. The authors propose a Bayesian optimization framework that overcomes these limitations.

Important Framework

[Screenshot from the paper: overview figure of the proposed framework]

This figure is the main illustration of what this paper proposes. Here is the explanation (a minimal code sketch of the loop follows the list):

(a) Start with a Local Subgraph

(b) Decide the Next Node to Query

(c) Update and Move to a New Subgraph
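Here is a minimal, self-contained Python sketch of this three-step loop. It is not the authors' implementation: the toy distance-decay surrogate below is chosen only to keep the example runnable, and all names are hypothetical.

```python
import math
import networkx as nx

def toy_surrogate(sub, history):
    """Toy stand-in for a GP: predict (mean, uncertainty) per node from
    observed neighbours, weighted by shortest-path distance."""
    def predict(v):
        weights, values = [], []
        for u, y in history.items():
            if u in sub and nx.has_path(sub, u, v):
                w = math.exp(-nx.shortest_path_length(sub, u, v))
                weights.append(w)
                values.append(y)
        if not weights:
            return 0.0, 1.0                        # unexplored: max uncertainty
        mean = sum(w * y for w, y in zip(weights, values)) / sum(weights)
        return mean, 1.0 - max(weights)            # far from data => less certain
    return predict

def bayes_opt_on_graph(G, f, v0, n_iters=30, radius=2, beta=1.0):
    best_v, best_y = v0, f(v0)
    history = {v0: best_y}
    for _ in range(n_iters):
        sub = nx.ego_graph(G, best_v, radius=radius)   # (a) local subgraph
        predict = toy_surrogate(sub, history)
        candidates = [v for v in sub if v not in history]
        if not candidates:
            break
        # (b) lower-confidence-bound acquisition (we are minimising)
        v_next = min(candidates,
                     key=lambda v: predict(v)[0] - beta * predict(v)[1])
        y = f(v_next)                                  # (c) query and re-centre
        history[v_next] = y
        if y < best_y:
            best_v, best_y = v_next, y
    return best_v, best_y

# Example usage on a toy graph and objective:
G = nx.karate_club_graph()
print(bayes_opt_on_graph(G, f=lambda v: abs(v - 20), v0=0))
```

The real method replaces `toy_surrogate` with a Gaussian process whose kernel is defined on the graph, as discussed in the preliminaries below.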

2nazero commented 1 month ago

Preliminaries

BO uses a statistical surrogate model* to approximate the objective function and an acquisition function $\alpha(x)$ to balance exploitation and exploration under the principle of optimism in the face of uncertainty.

Surrogate model*: a cheap probabilistic stand-in for the expensive objective, most commonly a Gaussian process (GP), which yields a posterior mean and uncertainty for the function value at any candidate input.
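As a concrete illustration, here is a GP surrogate with scikit-learn. Representing nodes by feature vectors is an assumption made here for simplicity; the paper instead defines the kernel directly on the graph.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Observed inputs and objective values (toy 1-D node representations).
X_obs = np.array([[0.0], [1.0], [3.0]])
y_obs = np.array([2.0, 0.5, 1.7])

# Fit the GP surrogate and query its posterior at an unobserved point.
gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0)).fit(X_obs, y_obs)
mean, std = gp.predict(np.array([[2.0]]), return_std=True)
print(mean, std)   # posterior mean and uncertainty at x = 2.0
```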

The optimization task is to find the node(s) $$v^*$$ that minimizes the objective function:

$$ v^* = \arg \min_{v \in V} f(v) $$
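Continuing the snippet above: because the domain is a discrete set of nodes, the arg-min is just a scan of acquisition values over candidates. A lower confidence bound (LCB), mean minus beta times std, is one standard acquisition for minimisation that embodies optimism in the face of uncertainty.

```python
import numpy as np

def lcb(gp, X_cand, beta=2.0):
    """Lower confidence bound: optimistic estimate of f for minimisation."""
    mean, std = gp.predict(X_cand, return_std=True)
    return mean - beta * std

X_cand = np.array([[0.5], [2.0], [2.5]])     # candidate node representations
v_star = X_cand[np.argmin(lcb(gp, X_cand))]  # discrete arg-min over candidates
```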

(@kyungheee) I'm kind of confused here. So does "minimizing" the objective function literally mean finding the smallest value of the objective function?

Bayesian Optimisation on Graph

There are several challenges in applying BO to graphs:

1. The search space is discrete (the set of nodes), so standard continuous kernels and optimisers do not apply directly.
2. Real-world graphs can be very large, making global modelling computationally expensive.
3. The full graph topology may not be known upfront (incomplete graph information).

Algorithm of BayesOptG -1

[Screenshot from the paper: pseudo-code of the BayesOptG algorithm]

Algorithm of BayesOptG -2 (Step 8)

[Screenshot from the paper: detail of Step 8 of the algorithm]

Tractable Optimisation via Local Modelling

Solutions to the two problems: 1) large graphs, 2) incomplete graph information.

Proposed solution:

1. Local subgraph focus: instead of working with the entire graph (which is computationally expensive), they focus on a small subset of nodes near the best node found so far (see the sketch after this list).
2. Step-by-step process: at each iteration the local subgraph is re-centred on the incumbent best node and only its topology needs to be revealed, as in the algorithm above.
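A one-screen illustration of the local-subgraph idea using networkx, where `radius` acts as a hypothetical stand-in for the trust-region size (not a parameter name from the paper):

```python
import networkx as nx

G = nx.karate_club_graph()
best_node = 0                                 # incumbent best node so far
local = nx.ego_graph(G, best_node, radius=2)  # all nodes within 2 hops
print(local.number_of_nodes(), "of", G.number_of_nodes(), "nodes modelled")
```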

2nazero commented 1 month ago

Relation to Trust-region BO Method (The Key Difference)

  1. Custom distance metric: unlike traditional trust-region methods, BayesOptG uses a bespoke distance metric designed specifically for the graph space. This custom metric accounts for the unique structure of graphs and how nodes are connected (an illustrative sketch follows this list).

  2. Handling imperfect graph knowledge: in traditional trust-region BO, the main goal is to avoid over-exploration in high-dimensional spaces. In BayesOptG, however, local subgraphs are crucial for dealing with incomplete knowledge of the graph structure. Instead of needing the entire graph upfront, BayesOptG only needs the subgraph's topology to be revealed at each iteration, making it efficient at handling partial information.

  3. Improved scalability: using trust regions also improves the scalability of BayesOptG. By focusing on small, local subgraphs instead of the entire graph, computational costs are reduced, resulting in a massive speed-up, as illustrated in Fig. 3.

[Fig. 3 screenshot from the paper: the speed-up from local modelling]
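For intuition, a distance metric defined on the graph itself (rather than on an embedding) can be as simple as hop distance turned into an exponential-decay similarity. This is an illustrative choice, not necessarily the paper's kernel:

```python
import math
import networkx as nx

G = nx.karate_club_graph()

def graph_similarity(u, v, lam=0.5):
    """Exponential decay in shortest-path (hop) distance: respects topology."""
    return math.exp(-lam * nx.shortest_path_length(G, u, v))

print(graph_similarity(0, 33))   # nodes far apart in the graph score low
```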
2nazero commented 1 month ago

Experiments and the Results

[Screenshots of the paper's experimental results]
kyungheee commented 1 month ago

@2nazero

It seems like your question is asking whether an optimal minimum always exists. In other words, you seem to be considering the possibility that the optimization may or may not converge to a specific value.

Bayesian optimization typically assumes that the objective function is bounded on the region where the optimization is conducted, because the Gaussian process models the function value at a given input as following a normal distribution. If the function diverges, the Gaussian process would fail to model it properly, rendering the optimization meaningless.
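To make the Gaussian-marginal point concrete, here is a tiny illustrative scikit-learn snippet: a fitted GP returns a mean and a standard deviation at each input, i.e. a finite Normal marginal, which only makes sense for a bounded, well-behaved objective.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

X = np.array([[0.0], [1.0], [2.0]])
y = np.sin(X).ravel()                  # a bounded, well-behaved target
gp = GaussianProcessRegressor().fit(X, y)
mu, sigma = gp.predict(np.array([[1.5]]), return_std=True)
# f(1.5) is modelled as Normal(mu, sigma**2); if f diverged, no such
# finite Gaussian marginal could describe it.
print(mu, sigma)
```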

Unbounded Bayesian Optimization via Regularization

Additionally, I found a paper that I haven't read yet but that seems relevant. Let's read it together!