davidtvs / pytorch-lr-finder

A learning rate range test implementation in PyTorch
MIT License

Issue with DataLoader with lr_finder.range_test #71

Closed phongvu009 closed 9 months ago

phongvu009 commented 3 years ago

I'm trying to use:

class CustomTrainIter(TrainDataLoaderIter):
    def inputs_labels_from_batch(self, batch_data):
        return batch_data["img"], batch_data["target"]

to make my DataLoader work with lr_finder.range_test(), but I still get the error: TypeError: list indices must be integers or slices, not str

TypeError                                 Traceback (most recent call last)
<ipython-input-60-b2a8b27d6c88> in <module>()
      3 optim = torch.optim.Adam(model_ft.parameters(), lr=1e-7, weight_decay=1e-2)
      4 lr_finder = LRFinder(model_ft,optim, criterion, device='cuda')
----> 5 lr_finder.range_test( custom_train_iter ,end_lr=100,num_iter=100)
      6 lr_finder.plot()
      7 lr_finder.reset()

3 frames
/usr/local/lib/python3.7/dist-packages/torch_lr_finder/lr_finder.py in range_test(self, train_loader, val_loader, start_lr, end_lr, num_iter, step_mode, smooth_f, diverge_th, accumulation_steps, non_blocking_transfer)
    318                 train_iter,
    319                 accumulation_steps,
--> 320                 non_blocking_transfer=non_blocking_transfer,
    321             )
    322             if val_loader:

/usr/local/lib/python3.7/dist-packages/torch_lr_finder/lr_finder.py in _train_batch(self, train_iter, accumulation_steps, non_blocking_transfer)
    369         self.optimizer.zero_grad()
    370         for i in range(accumulation_steps):
--> 371             inputs, labels = next(train_iter)
    372             inputs, labels = self._move_to_device(
    373                 inputs, labels, non_blocking=non_blocking_transfer

/usr/local/lib/python3.7/dist-packages/torch_lr_finder/lr_finder.py in __next__(self)
     57         try:
     58             batch = next(self._iterator)
---> 59             inputs, labels = self.inputs_labels_from_batch(batch)
     60         except StopIteration:
     61             if not self.auto_reset:

<ipython-input-58-f89d28995874> in inputs_labels_from_batch(self, batch_data)
      4 
      5 
----> 6         return batch_data["img"], batch_data["target"]
      7 
      8 custom_train_iter = CustomTrainIter(train_dl)

TypeError: list indices must be integers or slices, not str

Any suggestions? Thanks!

NaleRaphael commented 3 years ago

Hi @doob09, it seems that your batch_data is a list, not a dict. You might need to check the implementation of your dataset (e.g. a torch.Dataset subclass) to see whether the value returned by __getitem__ is actually a dict.
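
For reference, here is a minimal sketch (with hypothetical names) of a dataset whose __getitem__ returns a dict, which is the kind of batch the custom inputs_labels_from_batch above expects; PyTorch's default collate function turns a batch of such dicts into a dict of batched tensors:

import torch
from torch.utils.data import Dataset

class DictDataset(Dataset):
    def __init__(self, images, targets):
        self.images = images
        self.targets = targets

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        # Returning a dict here means each collated batch is also a dict,
        # so batch_data["img"] and batch_data["target"] work in the custom iter.
        return {"img": self.images[idx], "target": self.targets[idx]}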

Besides, you can try inserting import pdb; pdb.set_trace() before the return statement in inputs_labels_from_batch():

class CustomTrainIter(TrainDataLoaderIter):
    def inputs_labels_from_batch(self, batch_data):
        import pdb; pdb.set_trace()    # <- like this
        return batch_data["img"], batch_data["target"]

This should make it easier to investigate the problem.

If the problem is still not resolved, please feel free to update this thread with further information.

NaleRaphael commented 3 years ago

By the way, you can also check out the implementation of the base class DataLoaderIter: https://github.com/davidtvs/pytorch-lr-finder/blob/acc5e7ee7711a460bf3e1cc5c5f05575ba1e1b4b/torch_lr_finder/lr_finder.py#L31-L41

In line 39, batch_data is unpacked into 3 values: inputs, labels, *_. That's because we assume the incoming batch_data is packed as a list or tuple in the form (inputs, labels, ...other required info).
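
In other words, the default behavior is roughly the following simplified sketch (the real method also validates the batch type and raises a ValueError for unsupported types):

def inputs_labels_from_batch(self, batch_data):
    # Default assumption: the batch is a list/tuple like (inputs, labels, ...);
    # anything after the first two elements is ignored.
    inputs, labels, *_ = batch_data
    return inputs, labels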

phongvu009 commented 3 years ago

> Hi @doob09, it seems that your batch_data is a list, not a dict. You might need to check the implementation of your dataset (e.g. a torch.Dataset subclass) to see whether the value returned by __getitem__ is actually a dict.
>
> Besides, you can try inserting import pdb; pdb.set_trace() before the return statement in inputs_labels_from_batch():
>
>     class CustomTrainIter(TrainDataLoaderIter):
>         def inputs_labels_from_batch(self, batch_data):
>             import pdb; pdb.set_trace()    # <- like this
>             return batch_data["img"], batch_data["target"]
>
> This should make it easier to investigate the problem.
>
> If the problem is still not resolved, please feel free to update this thread with further information.

It returns:

(Pdb) type(batch_data)
<class 'list'>

Let me try to convert the list into a dict.

NaleRaphael commented 3 years ago

Sorry for the late reply.

Actually, you don't need to convert batch_data from a list to a dict. You can just unpack batch_data inside inputs_labels_from_batch(), like this:

# assume that `batch_data` is in the form of:
# [
#     [fn_img_01, fn_img_02, ..., fn_img_n],
#     [label_01, label_02, ..., label_n],
# ]
def inputs_labels_from_batch(self, batch_data):
    img, target = batch_data
    return img, target

Or this approach:

# assume that `batch_data` is in the form of:
# [
#     [fn_img_01, label_01],
#     [fn_img_02, label_02],
#     ...
#     [fn_img_n, label_n],
# ]
def inputs_labels_from_batch(self, batch_data):
    img = [v[0] for v in batch_data]
    target = [v[1] for v in batch_data]
    return img, target

inputs_labels_from_batch() was designed so that users don't have to modify their existing dataset/data loader code: you can simply implement your unpacking logic inside it.
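
For example, the usage from the original post would look roughly like this (a sketch reusing the names train_dl, model_ft, optim, and criterion from the snippets above):

from torch_lr_finder import LRFinder, TrainDataLoaderIter

class CustomTrainIter(TrainDataLoaderIter):
    def inputs_labels_from_batch(self, batch_data):
        img, target = batch_data
        return img, target

# Wrap the existing DataLoader and pass the wrapper to range_test()
custom_train_iter = CustomTrainIter(train_dl)
lr_finder = LRFinder(model_ft, optim, criterion, device="cuda")
lr_finder.range_test(custom_train_iter, end_lr=100, num_iter=100)
lr_finder.plot()
lr_finder.reset()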

Just note that you have to make sure inputs_labels_from_batch() returns 2 array-like objects, as line 41 shows: https://github.com/davidtvs/pytorch-lr-finder/blob/acc5e7ee7711a460bf3e1cc5c5f05575ba1e1b4b/torch_lr_finder/lr_finder.py#L31-L41

phongvu009 commented 3 years ago

My dataset is like the first one you mentioned:

# assume that `batch_data` is in the form of:
# [
#     [fn_img_01, fn_img_02, ..., fn_img_n],
#     [label_01, label_02, ..., label_n],
# ]
def inputs_labels_from_batch(self, batch_data):
    img, target = batch_data
    return img, target

It works perfectly now. I appreciate your help. Is this like the abstract class concept? I have only been using Python for a few months, for a deep learning project.

NaleRaphael commented 3 years ago

My pleasure :)

It's simply class inheritance here. The reason we wrap a torch.Dataset or torch.DataLoader with DataLoaderIter is that we need to make it:

  1. flexible for customization under the restriction of a fixed input format: As you can see in the following code, most PyTorch models follow this convention to build a forward pass. https://github.com/davidtvs/pytorch-lr-finder/blob/acc5e7ee7711a460bf3e1cc5c5f05575ba1e1b4b/torch_lr_finder/lr_finder.py#L376-L378 But since it's just a convention rather than a syntax-level design, we want to minimize the effort needed to rewrite code when the inputs of model.forward() or the outputs of dataset.__getitem__() are more complicated. In that case, users can implement their own inputs_labels_from_batch() to make things work with the current implementation of LRFinder without modifying it.

  2. able to iterate over the training dataset indefinitely: Since we cannot guarantee that the dataset is long enough for a learning rate range test with the given settings (related to num_iter and batch_size), we have to keep the dataset accessible until the range test is finished. That's why we made TrainDataLoaderIter like this (see also the simplified sketch after this list): https://github.com/davidtvs/pytorch-lr-finder/blob/acc5e7ee7711a460bf3e1cc5c5f05575ba1e1b4b/torch_lr_finder/lr_finder.py#L51-L67
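
A simplified sketch of that auto-reset idea (not the library's exact code, which also handles an auto_reset flag and re-raises StopIteration when it is disabled):

class InfiniteIterSketch:
    """Restart the underlying DataLoader whenever it runs out of batches."""

    def __init__(self, data_loader):
        self.data_loader = data_loader
        self._iterator = iter(data_loader)

    def __iter__(self):
        return self

    def __next__(self):
        try:
            return next(self._iterator)
        except StopIteration:
            # Start a new pass over the data so the range test can keep going.
            self._iterator = iter(self.data_loader)
            return next(self._iterator)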

Hope this information helps, and I wish you good luck on this learning journey. ;)

gallenaxel commented 1 year ago

Hi! I just found this thread, and I've got the same issue, namely that lr_finder.range_test() doesn't like the batch type. I have tried the posted solutions without any luck, and the issue seems to come back to how my DataLoader is created: train_dataloader = torch.utils.data.DataLoader(train_dataset, batch_size=bs, shuffle=True)

print(type(train_dataloader)) --> <class 'torch.utils.data.dataloader.DataLoader'>

but lr_finder.range_test() gives the error: ValueError: Your batch type is not supported: <class 'torch.Tensor'>. Please inherit from `TrainDataLoaderIter` or `ValDataLoaderIter` and override the `inputs_labels_from_batch` method.

Using the posted solutions, e.g. overriding inputs_labels_from_batch(), now yields two lists, and those aren't supported either. Do you have any suggestions on how to move forward from here? Thanks!

NaleRaphael commented 1 year ago

Hi @gallenaxel, can you provide further information about the dataset? If possible, please post the implementation of dataset.__getitem__() here; it would be helpful, because this issue is usually related to the format of the value returned by that function.

Generally, we expect dataset.__getitem__() to return two tensors, inputs and labels, as shown below (the third value with the asterisk operator * is used to hold any further return values that aren't directly related to model training). https://github.com/davidtvs/pytorch-lr-finder/blob/acc5e7ee7711a460bf3e1cc5c5f05575ba1e1b4b/torch_lr_finder/lr_finder.py#L31-L41
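
For illustration, a minimal sketch (with hypothetical names and shapes) of a dataset that returns such an (inputs, labels) pair, which the default DataLoaderIter unpacking handles without any customization:

import torch
from torch.utils.data import Dataset, DataLoader

class PairDataset(Dataset):
    def __init__(self, inputs, labels):
        self.inputs = inputs
        self.labels = labels

    def __len__(self):
        return len(self.inputs)

    def __getitem__(self, idx):
        # Returning an (input, label) tuple means each collated batch is a
        # [batched_inputs, batched_labels] pair, matching the unpacking in line 39.
        return self.inputs[idx], self.labels[idx]

# e.g. 100 samples with 23 features and a binary label (shapes are made up)
train_dataset = PairDataset(torch.randn(100, 23), torch.randint(0, 2, (100,)))
train_dataloader = DataLoader(train_dataset, batch_size=16, shuffle=True)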

gallenaxel commented 1 year ago

Hi @NaleRaphael! Thanks for the response. Sorry if I didn't quite do what you wanted me to, but I think this is what you want to see.

Doing the following:

[screenshot: the code being run]

yields this:

[screenshot: the resulting output]

Thanks for your patience, and hope this helps :)

NaleRaphael commented 1 year ago

Hi @gallenaxel, thanks for sharing it.

From the screenshots above, this doesn't look like a usual use case to me, because the dataset train_ds is declared as a tensor of the actual training data train_data. You might also be confused by the asterisk operator * used for variable unpacking here: it makes the code above run, but it isn't telling you where the problem is. So let me clarify this first.

Assuming your training data train_data is an array with shape [N, 24], the values returned by the 3rd line you wrote are actually:

# 3rd line
inputs_data, labels_data, *_data = train_data

# The line above is equivalent to this. You can print the length of `_data` to confirm.
inputs_data = train_data[0]   # shape: [1, 24]
labels_data = train_data[1]   # shape: [1, 24]
_data = list(train_data[2:])  # this is a list! length: N-2, shape of each element: [1, 24]

# Or run these assertions to check (requires numpy)
import numpy as np
assert np.allclose(inputs_data, train_data[0])
assert np.allclose(labels_data, train_data[1])
assert np.allclose(_data, [v for v in train_data[2:]])

This is the same reason the other cases appear to work as well (but also don't work the way you think they do). Since the data and labels come from the same array, I'm afraid this isn't what you intend.

If this is actually what you want to do, then you need to override the function (inputs_labels_from_batch) mentioned in my previous comment to make it work with this kind of input. Otherwise, you might need to create a dataset by following this official tutorial.
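
For the first option, a purely illustrative sketch of such an override (the split between inputs and labels below is a made-up assumption about the [batch_size, 24] layout and would need to be adjusted to the real data):

from torch_lr_finder import TrainDataLoaderIter

class TensorBatchTrainIter(TrainDataLoaderIter):
    def inputs_labels_from_batch(self, batch_data):
        # batch_data is a single tensor here; assume (hypothetically) that the
        # last column holds the label and the remaining columns are model inputs.
        inputs = batch_data[:, :-1]
        labels = batch_data[:, -1]
        return inputs, labels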

Please feel free to let me know if you have further questions regarding this issue.

davidtvs commented 9 months ago

Closing due to inactivity