d2l-ai / d2l-en

Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 500 universities from 70 countries including Stanford, MIT, Harvard, and Cambridge.
https://D2L.ai

[RFC] Code structure refactor #886

Open mli opened 4 years ago

mli commented 4 years ago

Background

Most d2l chapters follow the data/network/loss/training/prediction structure. Code is reused by saving it into the d2l package. However, the APIs are not consistent, and some functions take a large number of arguments. Some users use the d2l package in their products after learning the material, so it would be good for d2l to be a mini deep learning framework; this would also improve its readability.

Design Principles

API design

Inspired by pytorch-lightning, we can create an API Trainer class:

class Trainer:
  def __init__(self, net, num_epochs, batch_size):
    self.net = net
    self.num_epochs = num_epochs
    self.batch_size = batch_size
    # ...
  def train_dataloader(self):
    raise NotImplementedError()
  def optimizer(self):
    raise NotImplementedError()
  def train_step(self, batch, batch_idx):
    raise NotImplementedError()
  def fit(self):
    optimizer = self.optimizer()
    for epoch in range(self.num_epochs):
      for batch_idx, batch in enumerate(self.train_dataloader()):
        optimizer.zero_grad()  # clear gradients accumulated by the last step
        loss = self.train_step(batch, batch_idx)
        loss.backward()
        optimizer.step()

Now implement a particular algorithm

class FashionMnistTrainer(d2l.Trainer):
  def train_dataloader(self):
    return DataLoader(...)
  def optimizer(self):
    return SGD(self.net.parameters(), ...)
  def train_step(self, batch, batch_idx):
    X, y = batch
    y_hat = self.net(X)
    return loss(y, y_hat)

Plug in a particular network

lenet = ...
trainer = FashionMnistTrainer(lenet, num_epochs=5)
trainer.fit()
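To make the control flow concrete, here is a framework-free toy version of the same pattern. FakeLoss and FakeOptimizer are stand-ins invented for this sketch, not d2l or PyTorch classes:

```python
class FakeLoss:
    """Stand-in for an autograd loss; backward() is a no-op here."""
    def __init__(self, value):
        self.value = value
    def backward(self):
        pass  # a real framework would compute gradients here

class FakeOptimizer:
    """Stand-in for torch.optim-style optimizers; counts its steps."""
    def __init__(self):
        self.steps = 0
    def zero_grad(self):
        pass
    def step(self):
        self.steps += 1

class Trainer:
    def __init__(self, net, num_epochs):
        self.net, self.num_epochs = net, num_epochs
    def train_dataloader(self):
        raise NotImplementedError()
    def optimizer(self):
        raise NotImplementedError()
    def train_step(self, batch, batch_idx):
        raise NotImplementedError()
    def fit(self):
        opt = self.optimizer()
        for epoch in range(self.num_epochs):
            for batch_idx, batch in enumerate(self.train_dataloader()):
                opt.zero_grad()
                loss = self.train_step(batch, batch_idx)
                loss.backward()
                opt.step()
        return opt.steps  # returned only so the toy has something observable

class ToyTrainer(Trainer):
    def train_dataloader(self):
        return [([1, 2], 0), ([3, 4], 1)]  # two fake (X, y) batches
    def optimizer(self):
        return FakeOptimizer()
    def train_step(self, batch, batch_idx):
        X, y = batch
        return FakeLoss(sum(X) - y)

print(ToyTrainer(net=None, num_epochs=5).fit())  # 10 (5 epochs x 2 batches)
```

The real trainer would update self.net in place rather than return a step count; the point of the toy is only that subclasses supply data, optimizer, and loss, while the base class owns the loop.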

Expand the class definition into multiple code cells

One main reason we haven't used classes much is that class implementations are long; we want to break them into multiple code cells in Jupyter.

Recently I found a way to do it:

Cell 1:

class MyClass:
  def __init__(self):
    pass

Cell 2:

class MyClass(MyClass):
  def run(self):
    return 1

Cell 3:

a = MyClass()
a.run()
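This works because the second class MyClass(MyClass) statement creates a new class that inherits from the previous one and rebinds the name; instances created afterwards see both definitions. A compact, runnable illustration of the trick:

```python
# Cell 1: initial definition
class MyClass:
    def __init__(self):
        self.x = 1

# Cell 2: "reopen" the class by inheriting from the old binding and
# rebinding the name MyClass to the extended class.
class MyClass(MyClass):
    def run(self):
        return self.x + 1

# Cell 3: new instances get methods from both cells.
a = MyClass()
print(a.run())  # 2
```

One caveat: objects instantiated before the rebinding keep the old class and will not have run(), so notebooks should construct instances only after the final cell of the definition.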

Python 3 features

from typing import List

def my_func(a: str, b: int) -> List[str]:
    return [a, str(b)]

Later on, we can use mypy to check the types. It may be too verbose to add type hints to every function, but we should enforce them for key APIs.

We rewrite "loss %.3f, name %s" % (loss, name) into f"loss {loss:.3f}, name {name}".
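As a quick check that the two styles produce identical output (loss and name are made-up values for this example):

```python
loss, name = 0.12345, "lenet"

old = "loss %.3f, name %s" % (loss, name)   # printf-style formatting
new = f"loss {loss:.3f}, name {name}"       # f-string (Python 3.6+)

print(old)  # loss 0.123, name lenet
print(new)  # loss 0.123, name lenet
assert old == new
```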

Open Questions

mli commented 4 years ago

@astonzhang @smolix @AnirudhDagar @archersama

astonzhang commented 4 years ago

Thanks. The idea is very good, and the workaround of breaking a class definition into multiple cells may address our earlier concerns. One thing we need to think about carefully is the design of the Trainer base class, such as its fit method: D2L has many different use cases, such as sequence-to-sequence and object detection.

AnirudhDagar commented 4 years ago

As @astonzhang suggested, I have the same concerns with the design of the Trainer base class fit method which needs to handle various scenarios.

Other than that, type hints will surely be a great addition for type checking with mypy, and they also improve the readability of the code, making it more understandable. We should definitely add them to the key APIs. Also, f-strings in Python 3 make print statements more readable and concise, so there is no reason not to go ahead with refactoring these.

The idea of keeping all the framework implementations in a single library looks good, but this will make things a bit more complex in terms of maintenance: a single library with multiple framework dependencies. This is a design choice we should discuss in detail.

Yes, the API redesign will definitely suit the mini framework-agnostic library that d2l can become. But we do not want to over-design things and end up with complex APIs. That could also hinder the understanding of DL concepts for people who are just starting out and are very new to the world of deep learning.

mli commented 4 years ago

We can have multiple versions of Trainer, for example a basic CPU trainer BasicTrainer, a multi-GPU trainer Trainer, and others. The idea is to reuse code.
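As a rough sketch of the reuse idea (BasicTrainer and prepare_batch here are illustrative names, not d2l's actual API): the base trainer owns the shared logic, and a variant overrides only the device-specific part.

```python
class BasicTrainer:
    """Single-device trainer: owns the logic shared by all variants."""
    def __init__(self, net):
        self.net = net
    def prepare_batch(self, batch):
        return batch  # single device: use the batch as-is
    def fit_step(self, batch):
        # shared per-step logic would live here; we only show batch prep
        return self.prepare_batch(batch)

class MultiDeviceTrainer(BasicTrainer):
    """Variant that overrides only how a batch is laid out on devices."""
    def __init__(self, net, num_devices):
        super().__init__(net)
        self.num_devices = num_devices
    def prepare_batch(self, batch):
        # split the batch across devices (here: plain list chunks)
        n = max(1, len(batch) // self.num_devices)
        return [batch[i:i + n] for i in range(0, len(batch), n)]

print(MultiDeviceTrainer(None, 2).fit_step([1, 2, 3, 4]))  # [[1, 2], [3, 4]]
```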

We don't need a single trainer to support every framework, but we should make them as similar as possible. I'm thinking about having all the code in the d2l module, then:

from d2l import mxnet as d2l  # for mxnet implementations
from d2l import torch as d2l # for pytorch implementations
from d2l import tensorflow as d2l  # for tensorflow implementations
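A toy emulation of why this import scheme works (make_backend is hypothetical, invented for this sketch): because each backend submodule exposes the same names, swapping the import line swaps the implementation without touching any downstream code.

```python
import types

def make_backend(name):
    """Build a module exposing the shared d2l-style API for one backend."""
    mod = types.ModuleType(name)
    mod.train_step = lambda: f"{name} step"  # same name in every backend
    return mod

d2l_torch = make_backend("torch")
d2l_mxnet = make_backend("mxnet")

# stands in for `from d2l import torch as d2l`; change one line to switch
d2l = d2l_torch
print(d2l.train_step())  # torch step
```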