davidtvs / pytorch-lr-finder

A learning rate range test implementation in PyTorch
MIT License

Cannot determine `batch_size` from a list of string while running `range_test()` with `val_loader` #57

Closed: NaleRaphael closed this issue 4 years ago

NaleRaphael commented 4 years ago

Hey @davidtvs, I found this issue while writing an example of using this package with huggingface/transformers for #55.

Condition

Error message

---> 10 lr_finder.range_test(train_loader, val_loader=valid_loader, start_lr=1e-5, end_lr=10, num_iter=100, step_mode='linear')

1 frames

/usr/local/lib/python3.6/dist-packages/torch_lr_finder/lr_finder.py in range_test(self, train_loader, val_loader, start_lr, end_lr, num_iter, step_mode, smooth_f, diverge_th, accumulation_steps, non_blocking_transfer)
    288             if val_loader:
    289                 loss = self._validate(
--> 290                     val_iter, non_blocking_transfer=non_blocking_transfer
    291                 )
    292 

/usr/local/lib/python3.6/dist-packages/torch_lr_finder/lr_finder.py in _validate(self, val_iter, non_blocking_transfer)
    398 
    399                 if isinstance(inputs, tuple) or isinstance(inputs, list):
--> 400                     batch_size = inputs[0].size(0)
    401                 else:
    402                     batch_size = inputs.size(0)

AttributeError: 'str' object has no attribute 'size'

Description

In the current implementation (v0.2.0), batch_size is determined dynamically from the shape of inputs in LRFinder._validate(). L399-L402 only work when the given inputs is a torch.Tensor, which is why it fails when inputs is a list of strings.

Maybe it's not a usual case for a Dataset to return non-torch.Tensor values, but I think it would be easier to read the batch size from DataLoader.batch_size, since LRFinder._validate() iterates over a val_loader anyway.

Hence I proposed a fix for this in that notebook: simply add a line batch_size = val_iter.data_loader.batch_size before entering the loop and remove the if-else statement; you can check it out here.

But I'm still undecided about adding a batch_size property to DataLoaderIter, e.g.

class DataLoaderIter(object):
    # ...
    @property
    def batch_size(self):
        return self.data_loader.batch_size

With this property, the proposed fix can be simplified a little:

class LRFinder(object):
    def _validate(self, val_iter, non_blocking_transfer=True):
        # Set model to evaluation mode and disable gradient computation
        running_loss = 0
        self.model.eval()

        with torch.no_grad():
            for inputs, labels in val_iter:
                # Move data to the correct device
                inputs, labels = self._move_to_device(
                    inputs, labels, non_blocking=non_blocking_transfer
                )

                # Forward pass and loss computation
                outputs = self.model(inputs)
                loss = self.criterion(outputs, labels)
                running_loss += loss.item() * val_iter.batch_size

        return running_loss / len(val_iter.dataset)

What do you think of it?

davidtvs commented 4 years ago

The current code is written that way so that it can handle a last batch that doesn't have the full batch size. Your suggestion is cleaner and works if we force drop_last=True for the validation data loader. Ideally, we would not force drop_last=True and would still support datasets that return objects without a size method. I googled a bit and couldn't find a way that isn't overcomplicated.
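
For illustration, here's a toy example (not from the library) of the problem: with drop_last=False, which is the default, the last batch can be smaller than DataLoader.batch_size, so weighting every batch by the loader's nominal batch size over-counts it.

import torch
from torch.utils.data import DataLoader, TensorDataset

# 10 samples with batch_size=4 -> batches of 4, 4, and 2 samples
dataset = TensorDataset(torch.arange(10).float().unsqueeze(1))
loader = DataLoader(dataset, batch_size=4, drop_last=False)

for (inputs,) in loader:
    # loader.batch_size stays 4, but the last batch only contains 2 samples
    print(loader.batch_size, inputs.size(0))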

I'll think about this a bit more and come back tomorrow or in the next few days. But if we can't find a reasonable way of having both, I think we should make this change and document that drop_last=True is required.

davidtvs commented 4 years ago

After some experimenting, replacing size with len works better. Like this:

if isinstance(inputs, tuple) or isinstance(inputs, list):
    batch_size = len(inputs[0])
else:
    batch_size = len(inputs)

but I'm not sure whether this would fail for some other type of dataset.
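
For reference, len covers both the tensor case and the list-of-strings case from the traceback above, since calling len on a tensor returns the size of its first dimension:

import torch

t = torch.zeros(8, 3)
print(len(t))                 # 8, the size of the first (batch) dimension

batch_of_strings = ["hello", "world", "foo", "bar"]
print(len(batch_of_strings))  # 4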

NaleRaphael commented 4 years ago

You are right, I forgot that drop_last is False by default in DataLoader.

I was thinking about making some changes in DataLoaderIter to get a general solution, e.g.

class DataLoaderIter(object):
    def get_current_batch_size(self, batch):
        # Users can override this according to their dataset
        return batch.size(0)

# So that we can get batch size dynamically
class LRFinder(object):
    def _validate(self, val_iter, non_blocking_transfer=True):
        # ...
        batch_size = val_iter.get_current_batch_size(batch)
        # ...
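
If this hook existed, a user whose dataset yields raw strings could override it in their own iterator subclass, e.g. (just a hypothetical sketch):

class StringBatchIter(DataLoaderIter):
    def get_current_batch_size(self, batch):
        # batch is a list of strings here, so len() gives the number of samples
        return len(batch)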

But then it occurred to me that the size of each batch is already accessible from the ~outputs or loss~ (update: it should be labels, because outputs can still be a non-tensor object and the shape of loss is determined by the reduction argument of the loss function).

class LRFinder(object):
    def _validate(self, val_iter, non_blocking_transfer=True):
        # ...

        with torch.no_grad():
            for inputs, labels in val_iter:
                # Move data to the correct device
                inputs, labels = self._move_to_device(
                    inputs, labels, non_blocking=non_blocking_transfer
                )

                # Forward pass and loss computation
                outputs = self.model(inputs)
                loss = self.criterion(outputs, labels)
                running_loss += loss.item() * labels.size(0)

        # ...

It seems like an easier solution for this issue. 🤔

davidtvs commented 4 years ago

Ya, that looks like a nice solution. From what I can see, all loss functions expect the label/target to be a tensor. We could also replace size with len to be even safer.

Either way, you can create a PR with this change.

NaleRaphael commented 4 years ago

Great, I'll create a PR for it later.

davidtvs commented 4 years ago

Merged #58. Thanks @NaleRaphael for raising the issue and fixing it

NaleRaphael commented 4 years ago

It has been my pleasure to help. 😎