[Feature] Pytorch

eddiebergman commented 5 months ago

This issue will serve as a log of the progress on PyTorch integration in AMLTK. Please feel free to chime in with any information, suggestions or solutions to problems.

eddiebergman commented 5 months ago

The first step in the PyTorch integration is to make it work with a simple MLP with one hidden layer. This is trivial if you have a class `MyNet` that implements it, but that's not what AMLTK pipelines are for. We'd rather define it like so:

```python
from torch import nn

from amltk import Component, Sequential

pipeline = Sequential(
    nn.Flatten(start_dim=1),
    # 28 x 28 MNIST images flatten to 784 features
    Component(nn.Linear, config={"in_features": 784, "out_features": 20}, name="fc1"),
    nn.ReLU,
    Component(nn.Linear, config={"in_features": 20, "out_features": 10}, name="fc2"),
    Component(nn.LogSoftmax, config={"dim": 1}),
    name="my-mlp-pipeline",
)
```

The first challenge is to define the search space in the pipeline such that the number `20` can range over something like `(10, 30)`. The main issue is:

The input and output features are coupled: `out_features` of `fc1` and `in_features` of `fc2` are the same parameter defined in two places. How can we tie these together? Can we use the `request` functionality to make this work? Typically, we've just defined the search space on the component it parameterizes.
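A minimal sketch of one way to tie them, in plain Python: expose a single hypothetical `hidden` hyperparameter over `(10, 30)` and let a hypothetical `expand_config` helper write it into both layers' configs. Neither name is part of AMLTK; the point is only that the optimizer then sees one value instead of two values that could disagree.

```python
from __future__ import annotations

import random


def expand_config(hidden: int, n_inputs: int = 784, n_classes: int = 10) -> dict[str, dict[str, int]]:
    """Hypothetical helper: map one sampled `hidden` value to per-layer configs."""
    return {
        # fc1 produces `hidden` features ...
        "fc1": {"in_features": n_inputs, "out_features": hidden},
        # ... and fc2 consumes exactly that many, so the two can never disagree.
        "fc2": {"in_features": hidden, "out_features": n_classes},
    }


# Hypothetical search space: only the shared width is searched, over [10, 30].
SPACE = {"hidden": (10, 30)}

rng = random.Random(1)
low, high = SPACE["hidden"]
sampled = rng.randint(low, high)  # randint is inclusive on both ends
configs = expand_config(sampled)
print(sampled, configs)
```

Whatever builds the model is then responsible for writing `hidden` into both places, which is roughly what a `request`-style mechanism would need to express.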

Problem Script

This script can be used to try to solve the problem.

```python
# Check the `main()` function to get started and follow it through.
# Note that performance is irrelevant for now.
# Most of the PyTorch code here is just taken from the official example:
# https://github.com/pytorch/examples/blob/main/mnist/main.py
from __future__ import annotations

from collections import OrderedDict
from typing import TYPE_CHECKING

import torch
import torch.nn.functional as F  # noqa: N812
from torch import nn, optim
from torch.optim.lr_scheduler import StepLR
from torchvision import datasets, transforms

from amltk import Component, Metric, Sequential

# Change this to optuna if you prefer
# -- from amltk.optimization.optimizers.optuna import OptunaParser
from amltk.optimization.optimizers.smac import SMACOptimizer

if TYPE_CHECKING:
    from amltk import Node, Trial

# This is a nice import :)
from rich import print


# NOTE: This is the reference model, slowly try to build up to this
# but make it parametrizable.
# The goal would be that users don't define this class (maybe?)
# but they can define it using the pipeline structure.
# We can handle that later; for now, the pipeline definition below should be enough.
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 32, 3, 1)
        self.conv2 = nn.Conv2d(32, 64, 3, 1)
        self.dropout1 = nn.Dropout(0.25)
        self.dropout2 = nn.Dropout(0.5)
        self.fc1 = nn.Linear(9216, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.conv2(x)
        x = F.relu(x)
        x = F.max_pool2d(x, 2)
        x = self.dropout1(x)
        x = torch.flatten(x, 1)
        x = self.fc1(x)
        x = F.relu(x)
        x = self.dropout2(x)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)


# Just taken from the PyTorch example
def test(
    model: nn.Module,
    device: torch.device,
    test_loader: torch.utils.data.DataLoader,
) -> tuple[float, float]:
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += F.nll_loss(output, target, reduction="sum").item()
            pred = output.argmax(dim=1, keepdim=True)
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader.dataset)
    accuracy = 100.0 * correct / len(test_loader.dataset)  # a percentage, in [0, 100]
    return float(test_loss), float(accuracy)


# NOTE: The idea for this would be to integrate a general enough builder
# into AMLTK that can take a pipeline and build a nn.Module out of it.
def some_custom_building_function(pipeline: Node) -> nn.Module:
    # TODO: This somehow has to go from a configured pipeline to a nn.Module
    # Take a look at the amltk.pipeline.builders.sklearn to see how this is done
    # for sklearn.
    #
    # https://pytorch.org/docs/stable/generated/torch.nn.Sequential.html
    print(pipeline)
    print("Should use the configs and names from above, not be hardcoded here")

    # TODO: The main difficulty here will be to figure out how to build
    # this correctly given the pipeline configs and the `item` in the pipeline.
    # This means you shouldn't manually place things like `nn.Flatten` here;
    # they're already defined in the `main()` function below.
    # Specifically, matching input and output dimensions properly, without
    # knowing ahead of time what the pipeline will be.
    model = nn.Sequential(
        OrderedDict(
            [
                ("flatten", nn.Flatten()),
                ("fc1", nn.Linear(in_features=784, out_features=20)),
                ("relu1", nn.ReLU()),
                ("fc2", nn.Linear(in_features=20, out_features=10)),
                ("sftmax", nn.LogSoftmax(dim=1)),
            ],
        ),
    )
    print(model)
    return model


def eval_configuration(
    trial: Trial,
    pipeline: Node,
    device: str = "cpu",  # Change if you have a GPU
    epochs: int = 1,  # Fixed for now
    lr: float = 0.1,  # Fixed for now
    gamma: float = 0.7,  # Fixed for now
    batch_size: int = 64,  # Fixed for now
    log_interval: int = 10,  # Fixed for now
) -> Trial.Report:
    trial.store({"config.json": pipeline.config})
    # TODO: I don't know if this is good enough for seeding and if it works across processes
    # for torch?? At least with sklearn you can pass around a RandomState but torch has no
    # such thing
    torch.manual_seed(trial.seed)

    train_loader = torch.utils.data.DataLoader(
        datasets.MNIST(
            "../data",
            train=True,
            download=True,
            transform=transforms.Compose(
                [transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))],
            ),
        ),
        batch_size=batch_size,
        shuffle=True,
    )
    test_loader = torch.utils.data.DataLoader(
        datasets.MNIST(
            "../data",
            train=False,
            download=True,
            transform=transforms.Compose(
                [transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))],
            ),
        ),
        batch_size=batch_size,
        shuffle=True,
    )
    _device = torch.device(device)

    model = (
        pipeline
        .configure(trial.config)
        .build(builder=some_custom_building_function)  # TODO: This part is where difficulty lies
        .to(_device)
    )
    print(model)

    with trial.begin():
        # I feel like the optimizer and lr_scheduler should somehow also
        # be part of the pipeline that's gotten when calling build
        optimizer = optim.Adadelta(model.parameters(), lr=lr)
        lr_scheduler = StepLR(optimizer, step_size=1, gamma=gamma)

        # Just a de facto torch training loop
        for epoch in range(epochs):
            for batch_idx, (data, target) in enumerate(train_loader):
                optimizer.zero_grad()
                data, target = data.to(_device), target.to(_device)
                output = model(data)
                loss = F.nll_loss(output, target)
                loss.backward()
                optimizer.step()

                # Might want to store these things in the summary, see below
                if batch_idx % log_interval == 0:
                    print(
                        "Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}".format(
                            epoch,
                            batch_idx * len(data),
                            len(train_loader.dataset),
                            100.0 * batch_idx / len(train_loader),
                            loss.item(),
                        ),
                    )
            lr_scheduler.step()

    if trial.exception:
        return trial.fail()

    final_train_loss, final_train_acc = test(model, _device, train_loader)
    final_test_loss, final_test_acc = test(model, _device, test_loader)
    trial.summary["final_test_loss"] = final_test_loss
    trial.summary["final_test_accuracy"] = final_test_acc
    trial.summary["final_train_loss"] = final_train_loss
    trial.summary["final_train_accuracy"] = final_train_acc

    # TODO: We might also want to be able to do this inside the training loop,
    # during the `batch_idx % log_interval == 0` block.
    # However, we would then have to store it as
    #
    #   trial.summary[f"epoch_{epoch}:batch_{batch_idx}:loss"] = batch_loss
    #   trial.summary[f"epoch_{epoch}:batch_{batch_idx}:acc"] = batch_accuracy
    #
    # This is not ideal because getting a curve out of this wouldn't work well.
    # It could be possible to do
    #
    # At start,
    #
    #   trial.summary["blahhhh"] = {"loss": [], "acc": []}
    #
    # and then during the loop
    #
    #   trial.summary["blahhhh"]["loss"].append(batch_loss)
    #   trial.summary["blahhhh"]["acc"].append(batch_acc)

    # We need a custom PathLoader to know how to store
    # a .pt file?
    # trial.store({"model.pt": model.state_dict()})

    # Ideally we should have a validation set for doing proper HPO
    # setup but we'll just use the test accuracy
    return trial.success(accuracy=final_test_acc)


def main() -> None:
    # Training settings
    torch.device("cpu")

    # Download the dataset
    datasets.MNIST("../data", train=True, download=True)
    datasets.MNIST("../data", train=False, download=True)

    # TODO: The goal here will be to somehow set up a search space where
    # we can search over this `20` number, let's say from `10` to `30`??
    # If you find this impossible to do, please write up how you'd like to express it instead
    # and we will go from there.
    pipeline = Sequential(
        nn.Flatten(start_dim=1),  # <- Will be a `Fixed` because it's an instantiated object
        Component(nn.Linear, config={"in_features": 784, "out_features": 20}, name="fc1"),
        nn.ReLU,  # <- Will be a `Component` because it's a class
        Component(nn.Linear, config={"in_features": 20, "out_features": 10}, name="fc2"),
        Component(nn.LogSoftmax, config={"dim": 1}),
        name="my-mlp-pipeline",
    )

    # NOTE: I don't particularly like that you have to wrap F.relu in a `Fixed`.
    # * Fixed - Something that is Fixed and doesn't need to be initialized
    # * Component - Something that needs to be initialized with a config
    #
    # The problem is that right now, if we detect a function, we assume it constructs
    # something to use in a pipeline, not that we should use the function directly.

    # FYI: The `Metric` class is so you don't have to worry about giving
    # the correct thing to the optimizer; the Metric class takes care of
    # normalizing and returns a number the optimizer should optimize.
    # Some optimizers always minimize, some allow you to choose, some
    # work better with normalized values, etc.
    # `test()` reports accuracy as a percentage, hence bounds of (0, 100).
    metric = Metric("accuracy", minimize=False, bounds=(0, 100))
    optimizer = SMACOptimizer.create(
        space=pipeline,
        metrics=metric,
        seed=1,
        bucket="pytorch-experiments",
    )

    # We won't use the Scheduler here as it's not needed for making
    # this example work. We'll just use one trial for now.
    trial = optimizer.ask()
    report = eval_configuration(trial, pipeline)
    print(report)


if __name__ == "__main__":
    main()
```
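The `some_custom_building_function` TODO boils down to walking the steps in order, instantiating classes with their config (the `Component` case), passing already-instantiated objects through unchanged (the `Fixed` case), and filling in each linear layer's `in_features` from the previous linear layer's `out_features`. A minimal sketch of that logic, assuming a plain list of `(name, item, config)` tuples rather than AMLTK's actual `Node` API; `build_layers` is a hypothetical helper, and the dataclasses are stand-ins for `nn.Flatten`, `nn.Linear`, `nn.ReLU` and `nn.LogSoftmax` so it runs without PyTorch:

```python
from __future__ import annotations

from collections import OrderedDict
from dataclasses import dataclass
from typing import Any


# Stand-ins for nn.Flatten, nn.Linear, nn.ReLU and nn.LogSoftmax so the sketch runs without PyTorch.
@dataclass
class Flatten:
    start_dim: int = 1


@dataclass
class Linear:
    in_features: int
    out_features: int


@dataclass
class ReLU:
    pass


@dataclass
class LogSoftmax:
    dim: int


def build_layers(
    steps: list[tuple[str, Any, dict[str, Any]]],
    in_features: int,
) -> OrderedDict[str, Any]:
    """Hypothetical builder: turn (name, item, config) steps into named layers."""
    layers: OrderedDict[str, Any] = OrderedDict()
    width = in_features  # features flowing into the next Linear layer
    for name, item, config in steps:
        if isinstance(item, type):
            # A class, like a `Component`: instantiate it with its config.
            kwargs = dict(config)
            if item is Linear:
                # Tie dimensions: consume whatever the previous Linear produced.
                kwargs.setdefault("in_features", width)
                width = kwargs["out_features"]
            layers[name] = item(**kwargs)
        else:
            # Already an instance, like a `Fixed`: use it as-is.
            layers[name] = item
    return layers


steps = [
    ("flatten", Flatten(start_dim=1), {}),
    ("fc1", Linear, {"out_features": 20}),  # the only place the hidden width appears
    ("relu1", ReLU, {}),
    ("fc2", Linear, {"out_features": 10}),
    ("sftmax", LogSoftmax, {"dim": 1}),
]
layers = build_layers(steps, in_features=28 * 28)
for name, layer in layers.items():
    print(name, layer)
```

With real PyTorch classes the check would be `item is nn.Linear`, and the resulting `OrderedDict` can be passed straight to `nn.Sequential`. PyTorch's `nn.LazyLinear(out_features)` is an alternative that infers `in_features` on the first forward pass instead.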

eddiebergman commented 2 months ago

The basic requirements for the features above are mostly implemented, aside from `Join` and `Split`, which I will work on soon.

In the meantime, the next steps will be to take the ResNet model family from PyTorch and do the following:

- Be able to fully define and parametrize a full ResNet3 such that the model fits in a single script, ideally a single object.
- Perform Bayesian Optimization on this pipeline, training from scratch for a single epoch on a single node.
- Perform Bayesian Optimization on a frozen ResNet18 (linked above) and only tune the hyperparameters of the last 1-2 layers, e.g. Conv + Linear or Linear + Linear.
- Test if this works with the dask-jobqueue scheduler when linking to GPU nodes on a SLURM cluster.
- (?) Test if this works in a multi-node/parallelized model variant.
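Defining a full ResNet as a pipeline, or putting a new head on a frozen ResNet18, again comes down to knowing the size of the tensor entering each layer. As a worked example, this plain-Python sketch applies the `Conv2d`/`MaxPool2d` output-size formula from the PyTorch docs to the torchvision ResNet18 layout, assuming a 224×224 input; `conv_out` and `resnet18_trace` are hypothetical helpers that only track resolution and channel count, not the full residual blocks.

```python
from __future__ import annotations


def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0, dilation: int = 1) -> int:
    """Spatial output size of Conv2d / MaxPool2d (default floor mode), per the PyTorch docs."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def resnet18_trace(size: int = 224) -> list[tuple[str, int, int]]:
    """Hypothetical helper: (stage, channels, height/width) through the torchvision ResNet18 layout."""
    trace = []
    size = conv_out(size, kernel=7, stride=2, padding=3)  # conv1: 7x7, stride 2
    trace.append(("conv1", 64, size))
    size = conv_out(size, kernel=3, stride=2, padding=1)  # maxpool: 3x3, stride 2
    trace.append(("maxpool", 64, size))
    # layer1 keeps the resolution; layer2-4 each halve it with a stride-2 3x3 conv.
    for stage, channels, stride in [
        ("layer1", 64, 1),
        ("layer2", 128, 2),
        ("layer3", 256, 2),
        ("layer4", 512, 2),
    ]:
        size = conv_out(size, kernel=3, stride=stride, padding=1)
        trace.append((stage, channels, size))
    return trace


trace = resnet18_trace(224)
for stage, channels, size in trace:
    print(f"{stage:<8} {channels} x {size} x {size}")

# Global average pooling turns 512 x 7 x 7 into 512 features, so a new head on a
# frozen ResNet18 (e.g. Linear + Linear) would start with in_features=512.
head_in_features = trace[-1][1]

# The same formula recovers the 9216 in the reference MNIST `Net` from the script:
# two 3x3 convs (28 -> 26 -> 24), then a 2x2 max-pool (24 -> 12), with 64 channels.
side = conv_out(conv_out(conv_out(28, 3), 3), kernel=2, stride=2)
mnist_fc1_in = 64 * side * side
```

A builder that tracks sizes this way could fill in the first `in_features` of a tuned head automatically instead of hardcoding it.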

automl / amltk

[Feature] Pytorch #233