davidtvs / pytorch-lr-finder

A learning rate range test implementation in PyTorch
MIT License

Make `ValDataLoaderIter` able to be reset only when it is acquired by the syntax of normal `iterator` #60

Closed NaleRaphael closed 4 years ago

NaleRaphael commented 4 years ago

ValDataLoaderIter is not reset after each iteration, which results in the problem shown in issue #59. See this comment for further details: https://github.com/davidtvs/pytorch-lr-finder/issues/59#issuecomment-668631184

This PR also adds tests for TrainDataLoaderIter and ValDataLoaderIter.


UPDATE:

The purpose of the changes in this PR is to make ValDataLoaderIter work like torch.utils.data.DataLoader. When we want to iterate over a torch.utils.data.DataLoader, we can use either of the following approaches:

  1. iterate over it with a for-loop (DataLoader.__iter__() is called implicitly by the for-loop)

    data_loader = DataLoader(dataset)
    for batch in data_loader:
        ...
    
    # This is OK since a new iterator will be created each time
    # `data_loader` is used in a for-loop.
    for batch in data_loader:
        ...
  2. iterate over it through an iterator created by iter()

    loader_iter = iter(data_loader)
    for i in range(len(data_loader)):
        batch = next(loader_iter)
        ...
    
    # `loader_iter` has run out of values now, so the following
    # call raises `StopIteration`.
    next(loader_iter)
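The second approach can be demonstrated with plain Python, using a list as a stand-in for a `torch.utils.data.DataLoader` (the iterator protocol is the same; the stand-in is an assumption for illustration, not the library code):

```python
# A plain list stands in for DataLoader(dataset); iter()/next() behave the same.
data_loader = [10, 20, 30]

loader_iter = iter(data_loader)
batches = [next(loader_iter) for _ in range(len(data_loader))]
# batches is now [10, 20, 30]

# The iterator is exhausted, so one more next() raises StopIteration.
exhausted = False
try:
    next(loader_iter)
except StopIteration:
    exhausted = True
```

Calling `iter(data_loader)` again would produce a fresh iterator starting from the beginning, which is exactly the reset behavior the PR restores.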

The first approach is how val_loader was used before commit 52c189a: an iterator of val_loader is created whenever it is iterated through the for-loop in self._validate().

https://github.com/davidtvs/pytorch-lr-finder/blob/ed2182523bb3175bbf61dcf7b354d976fcb0ee65/torch_lr_finder/lr_finder.py#L197-L200

In commit 52c189a, ValDataLoaderIter was introduced to solve the problem of unpacking multiple values returned by a data loader. However, since a single instance of ValDataLoaderIter is created, the iterator we can acquire through either a for-loop or iter() is always the same object (i.e. the ValDataLoaderIter instance itself). That means it can't be reset by either of the approaches mentioned above, which is why self._validate() fails to work as it did before.
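The distinction can be sketched in a few lines of plain Python (simplified classes for illustration, not the library code): an object whose `__iter__` returns itself is never reset by a new for-loop, while an object whose `__iter__` builds a fresh iterator starts over each time.

```python
class SelfIterator:
    """__iter__ returns self, so a second pass resumes an exhausted iterator."""
    def __init__(self, data):
        self._it = iter(data)

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._it)


class ResettableIterable:
    """__iter__ builds a fresh iterator, so every for-loop starts from scratch."""
    def __init__(self, data):
        self._data = data

    def __iter__(self):
        return iter(self._data)


self_it = SelfIterator([1, 2, 3])
first = list(self_it)   # consumes all values
second = list(self_it)  # empty: the same exhausted iterator is reused

reset_it = ResettableIterable([1, 2, 3])
pass_one = list(reset_it)
pass_two = list(reset_it)  # full again: a fresh iterator each time
```

Before this PR, ValDataLoaderIter behaved like `SelfIterator`; the PR makes it behave like `ResettableIterable`.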

With the changes made in this PR, ValDataLoaderIter should work like torch.utils.data.DataLoader.

Besides, the reason we didn't simply apply these changes to DataLoaderIter is that TrainDataLoaderIter relies on its current behavior. Since TrainDataLoaderIter has to be capable of resetting itself when auto_reset is True, it must return itself when an iterator is acquired through TrainDataLoaderIter.__iter__().
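The auto-reset idea can be sketched as follows (a hypothetical simplification of the PR's approach; the class name and structure here are assumptions, not the actual TrainDataLoaderIter code): when `auto_reset` is True, reaching the end of the underlying loader silently starts a new pass instead of raising StopIteration. This only works if `__iter__` returns the same object, since the wrap-around state must survive across iterations.

```python
class AutoResetIter:
    """Hypothetical sketch: an iterator that restarts itself when exhausted."""
    def __init__(self, data, auto_reset=True):
        self._data = data
        self.auto_reset = auto_reset
        self._it = iter(data)

    def __iter__(self):
        # Must return self so the internal iterator state is preserved.
        return self

    def __next__(self):
        try:
            return next(self._it)
        except StopIteration:
            if not self.auto_reset:
                raise
            # Start a new pass over the data instead of stopping.
            self._it = iter(self._data)
            return next(self._it)


it = AutoResetIter([1, 2], auto_reset=True)
batches = [next(it) for _ in range(5)]  # wraps around past the end
```

Because the training loop pulls batches with `next()` for an arbitrary number of steps, this self-returning design is what TrainDataLoaderIter needs, and why the reset-on-`iter()` behavior is applied only to ValDataLoaderIter.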

NaleRaphael commented 4 years ago

It seems that the "ci-build / ci-build (3.7, 0.4.1)" job is stuck at the "Upload coverage to Codecov" step 😲

NaleRaphael commented 4 years ago

Compared with previous successful jobs, only three build results were uploaded to Codecov.

Expected result (4 coverage files, committed on Aug 8): https://codecov.io/gh/davidtvs/pytorch-lr-finder/commit/fd6f0126086caf0f9dfe766b1efee65ff5cbc000/build

Actual result (3 coverage files, committed today): https://codecov.io/gh/davidtvs/pytorch-lr-finder/commit/af57ea7ae04220dd01b13fb429e484a19df00091/build

It seems that we currently cannot re-run this job through the GitHub Actions UI, see also this post. But I found this post stating that we can push an empty commit to trigger it. If it's necessary to get the coverage reports from all 4 builds, let me know and I'll do that.

davidtvs commented 4 years ago

I re-ran the jobs and now they completed normally.

I'm merging this. Thanks!

NaleRaphael commented 4 years ago

Thanks 👍