aleju / imgaug-doc

Code and assets to generate the documentation of imgaug
http://imgaug.readthedocs.io
MIT License

Missing support for arbitrary dtypes and frames dimension #25

Open pokecheater opened 2 years ago

pokecheater commented 2 years ago

Hey ho imgaug team :),

When I use your library, it does not seem possible to use a dtype other than uint8. Is that correct, and why? I see no reason for this restriction, since most augmenters, like skewing or rotating, should work just fine with other dtypes.

Next question: how can I augment multi-frame images with shape [frames, height, width, channels]? If I pass such an array to the augmenters directly, each frame gets its own augmentation applied, so the frames are not all augmented in the same manner.

Greetings and thanks in advance :)

pokecheater commented 2 years ago

By the way, the information I referenced above comes from this page: https://imgaug.readthedocs.io/en/latest/source/examples_basics.html

pokecheater commented 2 years ago

To overcome problem number 2, I found a non-intuitive workaround. The solution is to interpret the frames as additional channels. To do so, I move the frame dimension into the channel dimension.

This is done by moveaxis followed by a reshape (whereby the moved frame dimension is merged with the existing channel dimension):

# (frames, H, W, C) -> (H, W, C, frames)
image = np.moveaxis(image_origin, 0, -1)
# (H, W, C, frames) -> (H, W, C * frames)
image = image.reshape(
    image.shape[0],
    image.shape[1],
    image.shape[2] * image.shape[3]
)

For example: if my image had the shape [3, 1024, 1024, 1] (3 frames, height and width both 1024 pixels, and 1 channel), this leads first to [1024, 1024, 1, 3] and afterwards to [1024, 1024, 3].
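To make the fold concrete, here is a minimal self-contained sketch of those two steps on a dummy array (the small shape is mine, chosen only for illustration):

```python
import numpy as np

# Dummy multi-frame image: 3 frames of 4x5 pixels with 1 channel
image_origin = np.zeros((3, 4, 5, 1), dtype=np.uint16)

# Move the frame axis to the back: (frames, H, W, C) -> (H, W, C, frames)
image = np.moveaxis(image_origin, 0, -1)

# Merge channel and frame axes: (H, W, C, frames) -> (H, W, C * frames)
image = image.reshape(
    image.shape[0],
    image.shape[1],
    image.shape[2] * image.shape[3]
)

print(image.shape)  # (4, 5, 3)
```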

Since my frames are now inside the channel dimension, the augmentation is applied with the same values across all channels, so all of my frames are transformed in the same way:

image_aug, polygons_aug = augmentation_flow(images=[image], polygons=polygons)

Afterwards, all I have to do is revert that frame transformation (luckily, I had stored the PIL image information in an img_meta dictionary).

  # (H, W, C * frames) -> (H, W, C, frames)
  image_aug = image_aug[0].reshape(
      img_meta["size"][0],
      img_meta["size"][1],
      img_meta["channels"],
      img_meta["frames"],
  )

  # (H, W, C, frames) -> (frames, H, W, C)
  image_aug = np.moveaxis(image_aug, -1, 0)

Works like a charm so far. :)
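The fold and its inverse are exact mirror operations, so the round trip can be checked on its own, without any augmenter in between. A minimal sketch (shapes and variable names are mine for illustration; `augmentation_flow` from above is omitted):

```python
import numpy as np

rng = np.random.default_rng(0)
frames, height, width, channels = 3, 8, 8, 1
image_origin = rng.integers(
    0, 256, size=(frames, height, width, channels)
).astype(np.uint8)

# Forward: fold frames into the channel axis
folded = np.moveaxis(image_origin, 0, -1).reshape(
    height, width, channels * frames
)

# ... an augmenter would run on `folded` here ...

# Inverse: split the channel axis back apart, then restore the frame axis
restored = folded.reshape(height, width, channels, frames)
restored = np.moveaxis(restored, -1, 0)

print(np.array_equal(restored, image_origin))  # True
```

Because reshape splits the merged axis in the same C-order it was flattened in, every pixel returns to its original frame, so the comparison holds exactly.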

But the first problem with the different dtypes still persists. I am using uint16 images, and the requirement that all images must have numpy's dtype uint8 rules that out. So why is uint8 necessary? I cannot see any logical reason for it.

Thx in Advance :)