choderalab / pinot

Probabilistic Inference for NOvel Therapeutics

Overall loop for training/testing - proposal #31

dnguyen1196 commented 4 years ago

Follow-up to issue #3 (overall loop for training neural nets).

Current implementation of Net module: https://github.com/choderalab/pinot/blob/master/pinot/net.py

Current training/testing loop implementation: https://github.com/choderalab/pinot/blob/master/pinot/app/experiment.py

Net

The class Net could be slightly modified from the current implementation to accommodate unsupervised training. One easy way is to add a loss_unsupervised method to the Net class that invokes representation.loss on the input graph g. During unsupervised training, the loss is then computed with loss_unsupervised, while during supervised training it is computed with the existing loss.

import torch

class Net:
    def __init__(self, representation, parameterization):
        self.representation = representation
        self.parameterization = parameterization

    def forward(self, g):
        h = self.representation(g)
        theta = self.parameterization(h)
        return theta

    def condition(self, g):
        theta = self.forward(g)
        distribution = self.distribution_class(
            *self.param_transform(*torch.unbind(theta, dim=-1))
        )
        return distribution

    def loss(self, g, y):
        dis = self.condition(g)
        return -dis.log_prob(y)

    def loss_unsupervised(self, g):  # one more function to add to the current implementation
        return self.representation.loss(g)
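
For concreteness, this is how the two loss paths would be called during a training step; g and y here stand in for a batched input graph and its measured values (placeholders for illustration, not names from the current code).

# supervised step: negative log-likelihood of y under the predictive distribution
loss_sup = net.loss(g, y).sum()

# unsupervised step: representation loss only, no measurements needed
loss_unsup = net.loss_unsupervised(g).sum()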

Representation (interface)

To plug into the Net class above, any representation needs an apply function that takes an input graph and returns a latent representation h. In addition, it needs a loss function that takes only an input graph and returns a loss. The loss will likely differ between representations: for example, a VAE would use the negative ELBO, while a vanilla autoencoder might use the reconstruction error.

interface Representation:
    def apply(self, g):
        h = a latent representation of graph g
        return h

    def loss(self, g):
        l = compute "unsupervised loss" such as ELBO in the case of VAE
        return l
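
As a minimal sketch of something that satisfies this interface, here is a toy autoencoder representation whose unsupervised loss is the reconstruction error. It assumes node features are stored in g.ndata["h"] (a common DGL convention); the encoder, decoder, and mean readout are placeholders, not the actual pinot modules.

import torch

class AutoEncoderRepresentation(torch.nn.Module):
    def __init__(self, in_dim, hidden_dim):
        super().__init__()
        self.encoder = torch.nn.Linear(in_dim, hidden_dim)
        self.decoder = torch.nn.Linear(hidden_dim, in_dim)

    def apply(self, g):
        z = self.encoder(g.ndata["h"])       # per-node latent codes
        return z.mean(dim=0, keepdim=True)   # mean readout -> graph-level latent h

    def forward(self, g):
        # Net.forward calls the representation directly, so route __call__ to apply
        return self.apply(g)

    def loss(self, g):
        # unsupervised loss: node-feature reconstruction error (a VAE would return -ELBO here)
        x = g.ndata["h"]
        x_hat = self.decoder(self.encoder(x))
        return ((x_hat - x) ** 2).sum()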

Parameterization (interface)

To plug into the same interface, any parameterization needs an apply function that takes a latent representation and returns the parameters of the predictive distribution.

interface Parameterization:
    def apply(self, h):
        theta = parameters of the approximate predictive distribution
        return theta
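
Similarly, a minimal sketch of a parameterization satisfying this interface, assuming the predictive distribution takes two parameters (e.g. a Normal's loc and scale), consistent with the torch.unbind(theta, dim=-1) call in Net.condition above; the layer sizes are placeholders.

import torch

class MLPParameterization(torch.nn.Module):
    def __init__(self, hidden_dim, n_params=2):
        super().__init__()
        # map the graph-level latent h to n_params distribution parameters
        self.mlp = torch.nn.Sequential(
            torch.nn.Linear(hidden_dim, hidden_dim),
            torch.nn.Tanh(),
            torch.nn.Linear(hidden_dim, n_params),
        )

    def apply(self, h):
        theta = self.mlp(h)  # shape (..., n_params); unbound along dim=-1 in Net.condition
        return theta

    def forward(self, h):
        return self.apply(h)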

Unsupervised training

I propose creating new classes for unsupervised training/testing. There might be very little difference between the implementations of TrainUnsupervised and the supervised Train. However, I still propose separate classes because we will likely need to run hyperparameter optimization for the supervised and unsupervised steps separately, so having separate (albeit similar) implementations will make that easier down the road.

class TrainUnsupervised:
    def __init__(self, unsup_data, net, optimizer, n_epochs=100):
        self.unsup_data = unsup_data
        self.net = net
        self.optimizer = optimizer
        self.n_epochs = n_epochs

    def train_once(self):
        for g in self.unsup_data: # or (g, _) in sup_data if we want to "reuse" the same data for unsupervised training
            def l():
                self.optimizer.zero_grad()
                # Main difference from Train: here we use loss_unsupervised
                loss = self.net.loss_unsupervised(g).sum()
                loss.backward()
                return loss
            self.optimizer.step(l)

    def train(self):
        for idx in range(self.n_epochs):
            self.train_once()
        return self.net
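
The example below also references a TestUnsupervised class; here is a minimal sketch, assuming it simply evaluates the unsupervised loss (e.g. NLL / negative ELBO) on held-out data.

import torch

class TestUnsupervised:
    def __init__(self, heldout_data, net):
        self.heldout_data = heldout_data
        self.net = net

    def test(self):
        # average unsupervised loss over the held-out set; no gradients needed
        with torch.no_grad():
            losses = [self.net.loss_unsupervised(g).sum() for g in self.heldout_data]
        return torch.stack(losses).mean()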

An example of a training loop

Here is an example training loop for the case where we have separate unsupervised and supervised data: we first train on the unsupervised data to learn a "decent" representation, then do supervised training.

unsupervised_data = load_unsupervised_data(...) # This is just an example
supervised_data   = load_supervised_data(...)   # there can be a single function that does data loading

vae_representation = VAE()   # Just an example, we use a VAE to learn a good representation
mlp_parameterization = MLP() # Just an example, we use an MLP to parameterize the predictive distribution

net = Net(vae_representation, mlp_parameterization)

unsup_optimizer = Adam(net.parameters()) # optimizers need the model parameters, so construct them after net
sup_optimizer = SGD(net.parameters())

unsupervised_training = TrainUnsupervised(unsupervised_data, net, unsup_optimizer)
net = unsupervised_training.train() # After this training, we will have a learned representation
# We can also test if we have held-out data (one metric is NLL on the held-out set?)
unsupervised_testing = TestUnsupervised(heldout_unsup_data, net) # This could track NLL per iteration on the held-out set

# Then we can do supervised training/testing the same way as with our current Train/Test implementation
supervised_training = Train(supervised_data, net, sup_optimizer)
supervised_training.train()