How should I configure training when image sizes in the training set vary widely?

weirman commented 9 months ago

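I have a classification dataset whose image sizes vary widely. How should I set up the ResizeImage config? I want to resize the short side of every image to one fixed size and scale the long side proportionally. So far I have found this answer. When I set

        - ResizeImage:
            resize_short: 48

training fails with:

    raise ValueError('all input arrays must have the same shape')
    ValueError: all input arrays must have the same shape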

changdazhou commented 9 months ago

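This is controlled by the resize_short field. I'd suggest taking a look at the source code: https://github.com/PaddlePaddle/PaddleClas/blob/develop/ppcls/data/preprocess/ops/operators.py#L224 . You should also remove the crop-related parts from the yml file, and that should give you the effect you want.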

weirman commented 9 months ago

The yml file I am using for training is as follows:

DataLoader:
  Train:
    dataset:
      name: ImageNetDataset
      image_root: /
      cls_label_path:  /mnt/cls_train/train_new.txt
      transform_ops:
        - DecodeImage:
            to_rgb: True
            channel_first: False
        - ResizeImage:
            resize_short: 48
        - RandFlipImage:
            flip_code: 1
        - TimmAutoAugment:
            prob: 1.0
            config_str: rand-m9-mstd0.5-inc1
            interpolation: bicubic
            img_size: [320, 48]
        - NormalizeImage:
            scale: 1.0/255.0
            mean: [0.485, 0.456, 0.406]
            std: [0.229, 0.224, 0.225]
            order: ''
        - RandomErasing:
            EPSILON: 1.0
            sl: 0.02
            sh: 1.0/3.0
            r1: 0.3
            attempt: 10
            use_log_aspect: True
            mode: pixel
    sampler:
      name: DistributedBatchSampler
      batch_size: 512
      drop_last: False
      shuffle: True
    loader:
      num_workers: 8
      use_shared_memory: True

But I get this error:

Traceback (most recent call last):
  File "/root/miniconda3/envs/myconda/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/root/miniconda3/envs/myconda/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/root/miniconda3/envs/myconda/lib/python3.8/site-packages/paddle/io/dataloader/dataloader_iter.py", line 604, in _thread_loop
    batch = self._get_data()
  File "/root/miniconda3/envs/myconda/lib/python3.8/site-packages/paddle/io/dataloader/dataloader_iter.py", line 752, in _get_data
    batch.reraise()
  File "/root/miniconda3/envs/myconda/lib/python3.8/site-packages/paddle/io/dataloader/worker.py", line 178, in reraise
    raise self.exc_type(msg)
ValueError: DataLoader worker(6) caught ValueError with message:
Traceback (most recent call last):
  File "/root/miniconda3/envs/myconda/lib/python3.8/site-packages/paddle/io/dataloader/worker.py", line 363, in _worker_loop
    batch = fetcher.fetch(indices)
  File "/root/miniconda3/envs/myconda/lib/python3.8/site-packages/paddle/io/dataloader/fetcher.py", line 86, in fetch
    data = self.collate_fn(data)
  File "/root/miniconda3/envs/myconda/lib/python3.8/site-packages/paddle/io/dataloader/collate.py", line 75, in default_collate_fn
    return [default_collate_fn(fields) for fields in zip(*batch)]
  File "/root/miniconda3/envs/myconda/lib/python3.8/site-packages/paddle/io/dataloader/collate.py", line 75, in <listcomp>
    return [default_collate_fn(fields) for fields in zip(*batch)]
  File "/root/miniconda3/envs/myconda/lib/python3.8/site-packages/paddle/io/dataloader/collate.py", line 56, in default_collate_fn
    batch = np.stack(batch, axis=0)
  File "<__array_function__ internals>", line 180, in stack
  File "/root/miniconda3/envs/myconda/lib/python3.8/site-packages/numpy/core/shape_base.py", line 426, in stack
    raise ValueError('all input arrays must have the same shape')
ValueError: all input arrays must have the same shape

Traceback (most recent call last):
  File "tools/train.py", line 32, in <module>
    engine.train()
  File "/mnt/PaddleClas-release-2.5_240124/PaddleClas-release-2.5/ppcls/engine/engine.py", line 356, in train
    self.train_epoch_func(self, epoch_id, print_batch_step)
  File "/mnt/PaddleClas-release-2.5_240124/PaddleClas-release-2.5/ppcls/engine/train/train.py", line 24, in train_epoch
    for iter_id, batch in enumerate(engine.train_dataloader):
  File "/root/miniconda3/envs/myconda/lib/python3.8/site-packages/paddle/io/dataloader/dataloader_iter.py", line 825, in __next__
    self._reader.read_next_list()[0]
SystemError: (Fatal) Blocking queue is killed because the data reader raises an exception.
  [Hint: Expected killed_ != true, but received killed_:1 == true:1.] (at ../paddle/fluid/operators/reader/blocking_queue.h:175)

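I assume this happens because the images are scaled according to resize_short but not padded, so the input images end up with different widths. Is that right?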
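The hypothesis above can be checked with a small sketch. It assumes that short-side resizing scales both sides by `resize_short / min(h, w)` and that the default collate function stacks samples with `np.stack`, as the traceback shows. The helper names `short_side_shape` and `pad_to_width`, the two source sizes, and the target width of 320 are illustrative assumptions, not part of PaddleClas.

```python
import numpy as np

def short_side_shape(h, w, resize_short=48):
    """Return the (h, w) obtained by scaling the short side to resize_short."""
    scale = resize_short / min(h, w)
    return int(round(h * scale)), int(round(w * scale))

def pad_to_width(img, target_w, pad_value=0):
    """Right-pad (or truncate) an HWC image so it has exactly target_w columns."""
    h, w, c = img.shape
    if w >= target_w:
        return img[:, :target_w, :]
    out = np.full((h, target_w, c), pad_value, dtype=img.dtype)
    out[:, :w, :] = img
    return out

# Two hypothetical source images with different aspect ratios, given as (h, w).
shapes = [short_side_shape(32, 100), short_side_shape(60, 400)]
imgs = [np.zeros((h, w, 3), dtype=np.uint8) for h, w in shapes]

# Same height, different widths: stacking into one batch fails.
try:
    np.stack(imgs, axis=0)
    stack_failed = False
except ValueError:
    stack_failed = True

# After padding every image to a common width, stacking succeeds.
batch = np.stack([pad_to_width(im, 320) for im in imgs], axis=0)
print(shapes, stack_failed, batch.shape)
```

So resizing by the short side alone is not enough for batching: some later step (padding, cropping, or resizing to a fixed size) has to bring every sample to the same shape before the samples are collated.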

changdazhou commented 9 months ago

Then try adding the crop back and see what happens. Could you also tell us which model you are training?

weirman commented 9 months ago

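I am training the text-language classification model. By crop, do you mean adding size under CropImage? Looking at the code, the only place I can find CropWithPadding is alongside RandomResizedCrop.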

changdazhou commented 9 months ago

I'd suggest modifying your config based on this config file and trying again: https://github.com/PaddlePaddle/PaddleClas/blob/4092cabf77fcbb066823560fb117ed8bca60c924/ppcls/configs/metric_learning/adaface_ir18.yaml#L61

weirman commented 9 months ago

OK, I will test it. This model seems hard to train, especially since the aspect ratio varies so much.

weirman commented 8 months ago

Regarding the earlier suggestion to modify my config based on this config file:

https://github.com/PaddlePaddle/PaddleClas/blob/4092cabf77fcbb066823560fb117ed8bca60c924/ppcls/configs/metric_learning/adaface_ir18.yaml#L61

Has this file been deprecated?

My config file is as follows:

DataLoader:
  Train:
    dataset:
      name: ImageNetDataset
      image_root: /
      cls_label_path:  /mnt/cls_train/train.txt
      transform_ops:
        - DecodeImage:
            to_rgb: True
            channel_first: False

        - CropWithPadding:
            prob: 0.2
            padding_num: 0
            size: [112, 112]
            scale: [0.2, 1.0]
            ratio: [0.75, 1.3333333333333333]
        - RandFlipImage:
            flip_code: 1
        - TimmAutoAugment:
            prob: 1.0
            config_str: rand-m9-mstd0.5-inc1
            interpolation: bicubic
            img_size: [320, 48]
        - NormalizeImage:
            scale: 1.0/255.0
            mean: [0.485, 0.456, 0.406]
            std: [0.229, 0.224, 0.225]
            order: ''
        - RandomErasing:
            EPSILON: 1.0
            sl: 0.02
            sh: 1.0/3.0
            r1: 0.3
            attempt: 10
            use_log_aspect: True
            mode: pixel
    sampler:
      name: DistributedBatchSampler
      batch_size: 512
      drop_last: False
      shuffle: True
    loader:
      num_workers: 8
      use_shared_memory: True

Error message: with msg: 'CropWithPadding' object has no attribute '_get_param'. I also noticed that this file uses transform, while all the other files use transform_ops.

Cause: _get_param is referenced but never defined. This looks like a bug.

cuicheng01 commented 8 months ago

Please share the complete config and we will help reproduce the problem.

weirman commented 8 months ago

I am using the default parameters of PULC language and only changed the following:

DataLoader:
        - CropWithPadding:
            prob: 0.2
            padding_num: 0
            size: [112, 112]
            scale: [0.2, 1.0]
            ratio: [0.75, 1.3333333333333333]

Nothing else was changed. The problem is in _get_param(): the CropWithPadding class calls self._get_param, but _get_param is never actually defined.
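The failure mode described above can be reproduced in isolation. The sketch below uses two hypothetical classes, `BaseCrop` and `PaddedCrop`, which are stand-ins and not the PaddleClas implementation; it only shows how calling a helper that neither the class nor its parent defines produces the same kind of AttributeError at call time rather than at construction time.

```python
class BaseCrop:
    """Hypothetical parent class that does NOT define _get_param."""

    def __init__(self, size):
        self.size = size


class PaddedCrop(BaseCrop):
    """Hypothetical subclass that calls a helper nobody defined."""

    def __call__(self, img):
        # Raises AttributeError: the name is looked up only when this line runs.
        return self._get_param(img)


op = PaddedCrop(size=(112, 112))  # construction succeeds

try:
    op(None)
    message = ""
except AttributeError as exc:
    message = str(exc)

# A quick check that would flag the missing helper before training starts.
has_helper = hasattr(op, "_get_param")
print(message, has_helper)
```

Because the lookup happens only when the transform is applied to a sample, the error surfaces inside a DataLoader worker instead of when the config is parsed.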

changdazhou commented 6 months ago

OK, we have noted this and will test it later.

PaddlePaddle / PaddleClas

How should I configure training when image sizes in the training set vary widely? #3083