fabridamicelli opened this issue 3 years ago
We could put a big integer for non-indexed datasets. n is mostly used to know where we are in the training loop, to do scheduling, etc...
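For illustration, a minimal sketch of that workaround (ToyStream and the chosen value of n are made up; DataLoader here is fastai's, which accepts n in its constructor):

from torch.utils.data import IterableDataset
from fastai.data.load import DataLoader  # fastai's DataLoader, not torch's

class ToyStream(IterableDataset):        # no __len__, as is typical for streams
    def __iter__(self):
        return iter(range(100))          # real length unknown to the loader

# The big n is only consulted for len(dl), scheduling, and the progress bar;
# iteration still stops once the stream is exhausted.
dl = DataLoader(ToyStream(), bs=16, n=1_000_000)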
Yes, that's the workaround I found too. It seems that n is only used for the progress bar and the like.
But:
1) That breaks compatibility with the PyTorch Dataset/DataLoader.
2) The documentation says (https://docs.fast.ai/data.load.html#DataLoader):
n (int): Defaults to len(dataset). If you are using iterable-style dataset, you can specify the size with n.
which to me sounds like an optional parameter - but it's not.
If you pass n at the constructor, it works.
Yes, that's exactly what I reported under "Additional context" already. But I don't see the point of having a compulsory argument that breaks compatibility with PyTorch and does essentially nothing useful.
Sorry, I did not get that at first - I get it now: this extra argument is not present on the PyTorch DataLoader class.
I took a look at this again and found a workaround. Basically, we need to remove the lines where the callbacks or the learner look up len(dataset) or n (see the lines commented out in the code snippet below). Here's the code that fixes it:
from fastcore.all import *
from fastai.basics import *          # Learner, TrainEvalCallback
from fastai.callback.all import *    # ProgressCallback, fit_one_cycle

@patch
def after_batch(self: TrainEvalCallback):
    # Removed: both lines need len(dl), which is unavailable here.
    # self.n_iter = len(self.dl)
    # self.learn.pct_train += 1./(self.n_iter*self.n_epoch)
    self.learn.train_iter += 1

@patch
def all_batches(self: Learner):
    # Removed: len(self.dl) raises TypeError when n is unknown.
    # self.n_iter = len(self.dl)
    for o in enumerate(self.dl): self.one_batch(*o)

# learn is assumed to be a Learner built on the iterable DataLoader
learn.remove_cb(ProgressCallback)  # equivalently: with learn.no_bar(): ...
learn.fit_one_cycle(3, lr_max=0.1)
learn.recorder.plot_loss()
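One caveat with this patch, as the commented-out lines hint: pct_train is never advanced, and (if I read the callbacks correctly) fastai's one-cycle annealing positions its schedule by pct_train, so the learning rate would stay stuck at the start of the schedule. A slightly more conservative variant of the same idea - my sketch, not something from this thread - keeps the lookups but degrades gracefully when the length is unavailable, so sized dataloaders keep working as before:

from fastcore.all import *
from fastai.basics import *

@patch
def all_batches(self: Learner):
    # Fall back to an unknown iteration count instead of crashing.
    try: self.n_iter = len(self.dl)
    except TypeError: self.n_iter = None
    for o in enumerate(self.dl): self.one_batch(*o)

@patch
def after_batch(self: TrainEvalCallback):
    # Only advance pct_train when the number of iterations is known.
    if self.n_iter is not None:
        self.learn.pct_train += 1./(self.n_iter*self.n_epoch)
    self.learn.train_iter += 1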
@tcapelle do you think there is any interest in pursuing a fix in this direction? I'd be happy to help. (Some more thought would be needed to handle the progress bar, but that's a somewhat separate story.)
Please confirm you have the latest versions of fastai, fastcore, and nbdev prior to reporting a bug (delete one): YES
Describe the bug
DataLoader does not work with IterableDataset if argument n is not given. If the length of the data is not known in advance (i.e. we don't know n) and we leave n=None, the dataloader will throw a TypeError. That goes against one of the purposes of an IterableDataset and it breaks compatibility with PyTorch.

Steps to reproduce the behavior:
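A minimal reproduction along these lines (ToyStream is a made-up example) triggers the error:

from torch.utils.data import IterableDataset
from fastai.data.load import DataLoader

class ToyStream(IterableDataset):   # iterable-style dataset without __len__
    def __iter__(self):
        return iter(range(10))

dl = DataLoader(ToyStream(), bs=2)  # n not given, so dl.n is None
len(dl)                             # TypeError; training hits the same call via Learner.all_batches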
Expected behavior
DataLoaders should work with IterableDatasets, even without passing argument n, just as PyTorch DataLoaders do.

Error with full stack trace
Additional context
Setting n in DataLoader to an arbitrary positive number avoids the error.

fastai version: 2.4
pytorch version: 1.9.0