dorarad / gansformer

Generative Adversarial Transformers
MIT License

AssertionError with prepare_data.py #29

Closed: andrew-alm closed this issue 2 years ago

andrew-alm commented 2 years ago

When I use prepare_data.py to create a custom dataset for training, I keep hitting an AssertionError. I've looked at the code, but I can't tell what is causing the shape mismatch. I've also tried several environments to rule out an environment-specific cause.

Environments:

Data:

Command: python prepare_data.py --task covidx --images-dir /data/2A_images --format png --ratio 0.7 --shards-num 20 --max-images 194922

Error:

Preparing the covidx dataset...
Loading images from /data/2A_images
  8%|██▋                               | 15340/194922 [09:52<1:55:31, 25.91it/s]
Traceback (most recent call last):
  File "prepare_data.py", line 217, in <module>
    run_cmdline(sys.argv)
  File "prepare_data.py", line 214, in run_cmdline
    prepare(**vars(args))
  File "prepare_data.py", line 185, in prepare
    shards_num = shards_num, max_imgs = max_images)
  File "prepare_data.py", line 78, in <lambda>
    "png": lambda tfdir, imgdir, **kwargs: dataset_tool.create_from_imgs(tfdir, imgdir, format = "png", **kwargs),
  File "/home/dev/gansformer/dataset_tool.py", line 696, in create_from_imgs
    tfr.add_img(img)
  File "/home/dev/gansformer/dataset_tool.py", line 84, in add_img
    assert img.shape == self.shape
AssertionError

The process always fails at exactly 15340. I've removed image 15340 (and repeated this several times as the next image took its place), but the error keeps happening.

dorarad commented 2 years ago

Hi! I believe it's an issue with the files in the data directory you're using. I suggest printing self.shape and img.shape, along with the filename of the image being processed when the exception is thrown. Then you can track down the problematic image.
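One way to get that information is to replace the bare assert with a check that reports both shapes and the filename. The sketch below is not the code in dataset_tool.py: `check_shape` and its `filename` parameter are hypothetical names, and since the traceback suggests add_img receives only the image, the filename would probably have to be supplied from the loop that calls it.

```python
def check_shape(img_shape, expected_shape, filename):
    """Raise an informative error instead of a bare AssertionError.

    Hypothetical helper: `filename` is assumed to be available at the call site.
    """
    if tuple(img_shape) != tuple(expected_shape):
        raise ValueError(
            f"{filename}: image shape {tuple(img_shape)} does not match "
            f"expected shape {tuple(expected_shape)}"
        )


# Illustrative values only; these shapes are not taken from the actual dataset.
try:
    check_shape((3, 512, 384), (3, 512, 512), "example.png")
except ValueError as err:
    print(err)
```

With a message like this, the failing file is named directly, so there is no need to guess which image sits at a given position in the progress bar.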

The image files are sorted using Python's sort(), which orders strings lexicographically. So if they are named e.g. 1.png, 2.png, ..., 11.png, they get sorted into 1.png, 11.png, 2.png, and it might be that the image you removed wasn't the problematic one. Also note that once you do find the bad image, you may just need to crop, pad, or resize it to the right shape rather than delete it. Good luck and let me know how it goes!
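The ordering effect is easy to reproduce with the standard library. The snippet below is a small illustration with made-up filenames; `natural_key` is a hypothetical helper shown only for contrast and is not something prepare_data.py is known to use.

```python
import re

# Made-up filenames 1.png .. 12.png, for illustration only.
names = [f"{i}.png" for i in range(1, 13)]

# Default string sort: compares character by character.
lexicographic = sorted(names)


def natural_key(name):
    """Split into digit and non-digit runs so numbers compare as integers."""
    return [int(part) if part.isdigit() else part for part in re.split(r"(\d+)", name)]


natural = sorted(names, key=natural_key)

print(lexicographic[:5])  # ['1.png', '10.png', '11.png', '12.png', '2.png']
print(natural[:5])        # ['1.png', '2.png', '3.png', '4.png', '5.png']
```

Under the default ordering, the file at a given position in the progress bar is generally not the file whose name contains that number, which is why indexing into the sorted list is the reliable way to identify it.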

andrew-alm commented 2 years ago

The sorting was throwing me off. I modified the assert statement, and it turns out the directory contains images of different sizes. I hadn't noticed because my own tf.data pipelines map a resize function over the images.

Thanks for the help, this can be closed.
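For anyone hitting the same problem, a directory with mixed image sizes can be detected before running the conversion. The following is a rough sketch under stated assumptions: it reads only the width and height from each PNG header (it does not check channel count or bit depth, which could also cause a shape mismatch), and `png_size` and `size_histogram` are hypothetical helpers, not part of gansformer.

```python
import os
import struct
from collections import Counter

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(path):
    """Return (width, height) read from the IHDR chunk at the start of a PNG file."""
    with open(path, "rb") as f:
        header = f.read(24)
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError(f"not a valid PNG header: {path}")
    # Width and height are two big-endian 32-bit integers at bytes 16..23.
    return struct.unpack(">II", header[16:24])


def size_histogram(images_dir):
    """Count how many PNG files in images_dir have each (width, height)."""
    counts = Counter()
    for name in sorted(os.listdir(images_dir)):
        if name.lower().endswith(".png"):
            counts[png_size(os.path.join(images_dir, name))] += 1
    return counts
```

Calling `size_histogram` on the images directory and finding more than one key in the result indicates that some files need to be cropped, padded, or resized before the dataset is built.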