Error when attempting to train: RuntimeError: stack expects each tensor to be equal size, but got [4] at entry 0 and [9] at entry 1

aria-1337 commented 8 months ago

Full error output:

➜  pytorch-ocr git:(main) python3 train.py                
Configurations:
processing:
  device: cuda
  image_width: 180
  image_height: 50
training:
  lr: 0.0003
  batch_size: 8
  num_workers: 4
  num_epochs: 100
bools:
  DISPLAY_ONLY_WRONG_PREDICTIONS: true
  VIEW_INFERENCE_WHILE_TRAINING: true
  SAVE_CHECKPOINTS: false
paths:
  dataset_dir: ./dataset
  save_model_as: ./logs/crnn.pth
model:
  use_attention: true
  use_ctc: true
  gray_scale: true
  dims: 256

/home/a/pytorch-ocr/utils/data_loading.py:68: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  targets_encoded = np.array(targets_encoded)
Dataset number of classes: 22
Classes are: ['-' '8' 'E' 'G' 'P' 'R' 'a' 'b' 'c' 'd' 'e' 'f' 'i' 'k' 'm' 'n' 'o' 'p'
 'r' 't' 'u' 'x']
😪 Training... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% -:--:--
Error executing job with overrides: []
Traceback (most recent call last):
  File "/home/a/pytorch-ocr/train.py", line 69, in run_training
    train_loss = engine.train_fn(model, train_loader, optimizer, device)
  File "/home/a/pytorch-ocr/engine.py", line 26, in train_fn
    for data in track(data_loader, description="😪 Training..."):
  File "/home/a/.local/lib/python3.10/site-packages/rich/progress.py", line 168, in track
    yield from progress.track(
  File "/home/a/.local/lib/python3.10/site-packages/rich/progress.py", line 1210, in track
    for value in sequence:
  File "/home/a/.local/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 652, in __next__
    data = self._next_data()
  File "/home/a/.local/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1347, in _next_data
    return self._process_data(data)
  File "/home/a/.local/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1373, in _process_data
    data.reraise()
  File "/home/a/.local/lib/python3.10/site-packages/torch/_utils.py", line 461, in reraise
    raise exception
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/a/.local/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/a/.local/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 52, in fetch
    return self.collate_fn(data)
  File "/home/a/.local/lib/python3.10/site-packages/torch/utils/data/_utils/collate.py", line 160, in default_collate
    return elem_type({key: default_collate([d[key] for d in batch]) for key in elem})
  File "/home/a/.local/lib/python3.10/site-packages/torch/utils/data/_utils/collate.py", line 160, in <dictcomp>
    return elem_type({key: default_collate([d[key] for d in batch]) for key in elem})
  File "/home/a/.local/lib/python3.10/site-packages/torch/utils/data/_utils/collate.py", line 141, in default_collate
    return torch.stack(batch, 0, out=out)
RuntimeError: stack expects each tensor to be equal size, but got [4] at entry 0 and [9] at entry 1

My dataset dir:

➜  dataset git:(main) ls
8-bit.png  Emo.png  Executioner.png  Guardian.png  Punk.png  Rapidfire.png

GabrielDornelles commented 8 months ago

Hello Aria! Unfortunately, if you're using CTC you have to pad your dataset yourself :/

Every label should have the same number of characters. Take the longest one, choose a character to represent empty space, and pad the others to that length. You can do it either by renaming your image files or in the code.

If you choose to use cross-entropy though, the padding is done automatically by the code.
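
For anyone landing here later, the padding described above could look roughly like this if done in code. This is a minimal sketch, not the repo's implementation: `pad_labels` and `PAD_CHAR` are hypothetical names, and it assumes the labels are the file names without the extension, as in the dataset listing above.

```python
# Hypothetical helper, not part of pytorch-ocr: pads label strings to one length.
PAD_CHAR = "_"  # assumption: a character that never appears in a real label


def pad_labels(labels, pad_char=PAD_CHAR):
    """Right-pad every label with pad_char up to the length of the longest one."""
    if not labels:
        return []
    for label in labels:
        if pad_char in label:
            # A pad character that also occurs in real text would be ambiguous.
            raise ValueError(f"pad character {pad_char!r} already used in {label!r}")
    max_len = max(len(label) for label in labels)
    return [label.ljust(max_len, pad_char) for label in labels]


# Labels taken from the file names in the dataset directory above.
labels = ["8-bit", "Emo", "Executioner", "Guardian", "Punk", "Rapidfire"]
padded = pad_labels(labels)

for original, fixed in zip(labels, padded):
    print(f"{original!r} -> {fixed!r}")
```

With every label at the same length, the default collate function should be able to stack the encoded targets. The padding character will presumably show up as one extra class, so pick one that can't be confused with real text.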

aria-1337 commented 8 months ago

@GabrielDornelles Thank you! Makes sense and was able to get it working :)
