
DataLoader does not work with IterableDataset if argument `n` is not given. #3413


fabridamicelli commented 3 years ago

Please confirm you have the latest versions of fastai, fastcore, and nbdev prior to reporting a bug: YES

Describe the bug
DataLoader does not work with an IterableDataset if the argument n is not given. If the length of the data is not known in advance (i.e., we don't know n) and we leave n=None, the DataLoader throws a TypeError. That defeats one of the purposes of an IterableDataset, and it breaks compatibility with PyTorch.

Steps to reproduce the behavior:

from torch import tensor

from torch.utils.data import IterableDataset
from torch.nn import Module, Linear
from torch.nn.functional import mse_loss
from fastai.data.all import DataLoaders, DataLoader
from fastai.learner import Learner

x, y = range(11), range(11)

class DummyIterableDataset(IterableDataset):
    def __iter__(self):
        for inp, tar in zip(x, y):
            yield tensor([inp,inp]).float(), tensor(-tar).float()

class Model(Module):
    def __init__(self):
        super().__init__()
        self.model = Linear(in_features=2, out_features=1)
    def forward(self, x):
        return self.model(x)

dl_pytorch = DataLoader(DummyIterableDataset(), batch_size=3, drop_last=True, indexed=False)  # note: n is not passed
dls = DataLoaders(dl_pytorch, dl_pytorch)

learn = Learner(dls, Model(), loss_func=mse_loss)
learn.fit(1)

Expected behavior
DataLoader should work with IterableDatasets, even without passing the argument n, just as PyTorch DataLoaders do.
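For contrast, a plain PyTorch DataLoader iterates the same dataset with no length information at all; a minimal sketch, reusing DummyIterableDataset from the repro above:

from torch.utils.data import DataLoader as TorchDataLoader

torch_dl = TorchDataLoader(DummyIterableDataset(), batch_size=3, drop_last=True)
for xb, yb in torch_dl:  # iteration works; a length is never required
    print(xb.shape, yb.shape)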

Error with full stack trace

TypeError                                 Traceback (most recent call last)
<ipython-input-81-49ea336f19e8> in <module>
     26 
     27 learn = Learner(dls, Model(), loss_func=mse_loss)
---> 28 learn.fit(1)

~/miniconda3/envs/neuralnets/lib/python3.9/site-packages/fastai/learner.py in fit(self, n_epoch, lr, wd, cbs, reset_opt)
    219             self.opt.set_hypers(lr=self.lr if lr is None else lr)
    220             self.n_epoch = n_epoch
--> 221             self._with_events(self._do_fit, 'fit', CancelFitException, self._end_cleanup)
    222 
    223     def _end_cleanup(self): self.dl,self.xb,self.yb,self.pred,self.loss = None,(None,),(None,),None,None

~/miniconda3/envs/neuralnets/lib/python3.9/site-packages/fastai/learner.py in _with_events(self, f, event_type, ex, final)
    161 
    162     def _with_events(self, f, event_type, ex, final=noop):
--> 163         try: self(f'before_{event_type}');  f()
    164         except ex: self(f'after_cancel_{event_type}')
    165         self(f'after_{event_type}');  final()

~/miniconda3/envs/neuralnets/lib/python3.9/site-packages/fastai/learner.py in _do_fit(self)
    210         for epoch in range(self.n_epoch):
    211             self.epoch=epoch
--> 212             self._with_events(self._do_epoch, 'epoch', CancelEpochException)
    213 
    214     def fit(self, n_epoch, lr=None, wd=None, cbs=None, reset_opt=False):

~/miniconda3/envs/neuralnets/lib/python3.9/site-packages/fastai/learner.py in _with_events(self, f, event_type, ex, final)
    161 
    162     def _with_events(self, f, event_type, ex, final=noop):
--> 163         try: self(f'before_{event_type}');  f()
    164         except ex: self(f'after_cancel_{event_type}')
    165         self(f'after_{event_type}');  final()

~/miniconda3/envs/neuralnets/lib/python3.9/site-packages/fastai/learner.py in _do_epoch(self)
    204 
    205     def _do_epoch(self):
--> 206         self._do_epoch_train()
    207         self._do_epoch_validate()
    208 

~/miniconda3/envs/neuralnets/lib/python3.9/site-packages/fastai/learner.py in _do_epoch_train(self)
    196     def _do_epoch_train(self):
    197         self.dl = self.dls.train
--> 198         self._with_events(self.all_batches, 'train', CancelTrainException)
    199 
    200     def _do_epoch_validate(self, ds_idx=1, dl=None):

~/miniconda3/envs/neuralnets/lib/python3.9/site-packages/fastai/learner.py in _with_events(self, f, event_type, ex, final)
    161 
    162     def _with_events(self, f, event_type, ex, final=noop):
--> 163         try: self(f'before_{event_type}');  f()
    164         except ex: self(f'after_cancel_{event_type}')
    165         self(f'after_{event_type}');  final()

~/miniconda3/envs/neuralnets/lib/python3.9/site-packages/fastai/learner.py in all_batches(self)
    166 
    167     def all_batches(self):
--> 168         self.n_iter = len(self.dl)
    169         for o in enumerate(self.dl): self.one_batch(*o)
    170 

~/miniconda3/envs/neuralnets/lib/python3.9/site-packages/fastai/data/load.py in __len__(self)
     90 
     91     def __len__(self):
---> 92         if self.n is None: raise TypeError
     93         if self.bs is None: return self.n
     94         return self.n//self.bs + (0 if self.drop_last or self.n%self.bs==0 else 1)

TypeError: 

Additional context
Setting n in the DataLoader to an arbitrary positive number avoids the error.
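A minimal sketch of that workaround, reusing the repro above (here n=11 happens to match the true length of the toy data, but any positive estimate unblocks __len__):

dl = DataLoader(DummyIterableDataset(), batch_size=3, drop_last=True,
                indexed=False, n=11)  # n is now set, so len(dl) is defined
dls = DataLoaders(dl, dl)
learn = Learner(dls, Model(), loss_func=mse_loss)
learn.fit(1)  # no longer raises TypeError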

fastai version: 2.4
pytorch version: 1.9.0

tcapelle commented 3 years ago

We could put in a big integer for non-indexed datasets. n is mostly used to know where we are in the training loop, for scheduling, etc.

fabridamicelli commented 3 years ago

Yes, that's the workaround I found too. It seems that n is only used for the progress bar and things like that. But: 1) that breaks compatibility with PyTorch Dataset/DataLoader; 2) the documentation (https://docs.fast.ai/data.load.html#DataLoader) says:

n (int): Defaults to len(dataset). If you are using iterable-style dataset, you can specify the size with n.

which to me sounds like an optional parameter - but it's not.

tcapelle commented 3 years ago

If you pass n to the constructor, it works.

fabridamicelli commented 3 years ago

Yes, that's exactly what I reported under "Additional context" already. But I don't see the point in having a compulsory argument that breaks compatibility with PyTorch and, in essence, does nothing useful.

tcapelle commented 3 years ago

Sorry, I didn't get that before; I get it now. This extra argument is not present in the PyTorch DataLoader class.

fabridamicelli commented 2 years ago

I took a look at this again and found a workaround. Basically, we need to remove the lines where the callbacks or the Learner look up len(dataset) or n (see the lines commented out in the code snippet below). Here's the code that fixes it:

from fastcore.all import *
from fastai.learner import Learner
from fastai.callback.all import *  # TrainEvalCallback, ProgressCallback, fit_one_cycle

@patch
def after_batch(self: TrainEvalCallback):
    # removed: the pct_train update, which needs len(self.dl) via n_iter
    # self.n_iter = len(self.dl)
    # self.learn.pct_train += 1./(self.n_iter*self.n_epoch)
    self.learn.train_iter += 1

@patch
def all_batches(self: Learner):
    # removed: self.n_iter = len(self.dl), which raises TypeError when n is None
    for o in enumerate(self.dl): self.one_batch(*o)

learn.remove_cb(ProgressCallback)  # equivalently, with learn.no_bar():
learn.fit_one_cycle(3, lr_max=0.1)
learn.recorder.plot_loss()
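For reference, the alternative hinted at in the comment above keeps the callback list intact and silences the bar only for one call, using the Learner.no_bar context manager:

with learn.no_bar():
    learn.fit_one_cycle(3, lr_max=0.1)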

@tcapelle do you think there is any interest in pursuing a fix in this direction? I'd be happy to help. (Some more thought would be needed to handle the progress bar, but that's a bit of a different story.)