While the dataset we constructed is collected in an inherently sequential manner (it is a rollout of the SCIP default branching scheme), we randomly shuffle all the data and train as if the datapoints were independent and identically distributed (iid). The motivating question is: how accurate is this iid assumption?
There are two sides of the argument:
(1) Pro-iid camp: The hand-crafted features are themselves temporal and we are effectively in a Markovian setting.
(2) Anti-iid camp: While the features are themselves temporal, they are not temporal enough. There's still lots of temporal information we are failing to capture, which ultimately hinders generalization performance.
Currently our GatingNet is a feedforward multi-layer perceptron (MLP). The feature request is to modify the GatingNet to be a recurrent neural network. Instead of breaking the sequentiality of our collected data, we preserve it and train on episodes, i.e., the collected SCIP rollouts. This will require coding up new data-loaders and padding the sequences to allow for batch-wise training of episodes. Truncated backpropagation through time (BPTT) may also need to be employed, since for some of the more difficult instances the rollouts can easily run into the thousands of time steps.
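As a rough sketch of what this could look like (assuming a PyTorch stack, which this issue does not actually specify, and using hypothetical names such as RecurrentGatingNet and collate_episodes), the GatingNet could wrap a GRU over the per-step hand-crafted features, and a collate function could pad variable-length rollouts into a single batch:

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence, pad_packed_sequence


class RecurrentGatingNet(nn.Module):
    """Recurrent variant of the feedforward GatingNet (sketch, not the actual implementation)."""

    def __init__(self, in_dim: int, hidden_dim: int, out_dim: int):
        super().__init__()
        self.rnn = nn.GRU(in_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, out_dim)

    def forward(self, padded_feats, lengths, h0=None):
        # padded_feats: (batch, max_steps, in_dim); lengths: true rollout length per episode.
        packed = pack_padded_sequence(
            padded_feats, lengths.cpu(), batch_first=True, enforce_sorted=False
        )
        packed_out, h_n = self.rnn(packed, h0)
        out, _ = pad_packed_sequence(packed_out, batch_first=True)
        # Per-time-step logits plus the final hidden state (useful for truncated BPTT).
        return self.head(out), h_n


def collate_episodes(episodes):
    """Pad a list of (features, labels) rollouts of unequal length into one batch."""
    feats = [torch.as_tensor(f, dtype=torch.float32) for f, _ in episodes]
    labels = [torch.as_tensor(y, dtype=torch.long) for _, y in episodes]
    lengths = torch.tensor([f.shape[0] for f in feats])
    padded_feats = pad_sequence(feats, batch_first=True)  # zero-pad past each rollout's end
    padded_labels = pad_sequence(labels, batch_first=True, padding_value=-100)
    return padded_feats, padded_labels, lengths
```

A DataLoader over whole episodes would then use collate_fn=collate_episodes. For truncated BPTT on the longest rollouts, each batch could additionally be sliced into fixed-length chunks along the time axis, feeding h_n.detach() from one chunk as h0 of the next so that gradients only flow within a chunk.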
We measure whether using a recurrent GatingNet improves generalization performance, both on the (tangential but "standard") ML loss and accuracy metrics and on the MILP solver evaluation metrics.
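On the ML-metric side, padding means the loss and accuracy have to be masked so that padded time steps do not contaminate the numbers. A minimal sketch, again assuming PyTorch and the hypothetical batch layout from the collate function above (the MILP solver metrics would come from re-running SCIP with the trained policy and are not shown):

```python
import torch
import torch.nn.functional as F


@torch.no_grad()
def masked_metrics(logits, padded_labels, lengths):
    """Per-step cross-entropy and accuracy, ignoring padded time steps."""
    batch, max_steps, _ = logits.shape
    # mask[i, t] is True only for real (non-padded) steps of episode i.
    steps = torch.arange(max_steps, device=logits.device)
    mask = steps[None, :] < lengths[:, None].to(logits.device)
    flat_logits = logits[mask]          # (num_real_steps, out_dim)
    flat_labels = padded_labels[mask]   # (num_real_steps,)
    loss = F.cross_entropy(flat_logits, flat_labels)
    acc = (flat_logits.argmax(dim=-1) == flat_labels).float().mean()
    return loss.item(), acc.item()
```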