google-research / big_vision

Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more.

Implementation of contrast() seems wrong #109

Open EIFY opened 1 month ago

EIFY commented 1 month ago

I have created https://github.com/google-research/big_vision/pull/108 for demonstration purposes. In short: the mean here

https://github.com/google-research/big_vision/blob/01edb81a4716f93a48be43b3a4af14e29cdb3a7f/big_vision/pp/autoaugment.py#L209-L213

is supposed to be the mean pixel value, but as written it just sums over the histogram counts (and is therefore equal to height × width) and divides by 256. For the standard decode_jpeg_and_inception_crop(224), I have verified that this mean is always 224 × 224 / 256 = 196, regardless of the image content.
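For reference, here is a minimal sketch of the computation at the linked lines next to a corrected version. The helper names are mine, not from the codebase, and the fixed version is just one way to compute the true mean; see the PR for the actual diff.

```python
import tensorflow as tf

def mean_as_implemented(degenerate):  # hypothetical name, for illustration
    # Summing the histogram counts just counts the pixels, so this returns
    # (height * width) / 256 no matter what the pixel values are.
    hist = tf.histogram_fixed_width(degenerate, [0, 255], nbins=256)
    return tf.reduce_sum(tf.cast(hist, tf.float32)) / 256.0

def mean_fixed(degenerate):  # hypothetical name, for illustration
    # Weight each bin by its pixel value and divide by the pixel count,
    # i.e. the actual mean pixel value (equivalent to tf.reduce_mean here).
    hist = tf.cast(tf.histogram_fixed_width(degenerate, [0, 255], nbins=256),
                   tf.float32)
    return tf.reduce_sum(hist * tf.range(256, dtype=tf.float32)) / tf.reduce_sum(hist)
```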

I have also created the following calibration grid to double-check the transform's behavior, with RGB values (192, 64, 64) for the reddish squares and (64, 192, 192) for the bluish squares:

[calibration grid image]
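For anyone who wants to reproduce this, a tile along those lines can be built as below. The 2×2 layout and 112×112 square size are assumptions for illustration; only the RGB values are the ones from the issue.

```python
import numpy as np
import tensorflow as tf
import torch

# 224x224 tile of 112x112 squares in a 2x2 checkerboard, half reddish and
# half bluish, so the true mean grayscale value is about (102 + 154) / 2 = 128.
square = np.ones((112, 112, 3), dtype=np.uint8)
reddish = square * np.array([192, 64, 64], dtype=np.uint8)
bluish = square * np.array([64, 192, 192], dtype=np.uint8)
top = np.concatenate([reddish, bluish], axis=1)
bottom = np.concatenate([bluish, reddish], axis=1)
tile = np.concatenate([top, bottom], axis=0)

tf_color_tile = tf.constant(tile)                           # HWC uint8 for TF
torch_color_tile = torch.from_numpy(tile).permute(2, 0, 1)  # CHW uint8 for torchvision
```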

As it is, contrast(tf_color_tile, 1.9) returns a tile with RGB values (188, 0, 0) and (0, 188, 188):

[image: output before the fix]

After the fix, contrast(tf_color_tile, 1.9) returns a tile with RGB values (249, 6, 6) and (6, 249, 249):

[image: output after the fix]

which is more in line with other implementations. E.g., the approximate torchvision equivalent

```python
from torchvision.transforms.v2 import functional as F

F.adjust_contrast(torch_color_tile, contrast_factor=1.9)
```

returns RGB values (250, 6, 6) and (6, 250, 250).
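These numbers are consistent with the blend step (result = mean + factor * (pixel - mean), clipped to [0, 255]). A quick sanity check, assuming the half-reddish, half-bluish tile above and ITU-R 601 grayscale weights (so the reddish squares sit at grayscale ≈ 102 and the bluish at ≈ 154):

```python
factor = 1.9
buggy_mean = 224 * 224 / 256         # 196.0, regardless of image content
true_mean = (102.272 + 153.728) / 2  # 128.0, mean of the two grayscale levels

for mean in (buggy_mean, true_mean):
    out_192 = min(max(mean + factor * (192 - mean), 0), 255)  # the 192 channels
    out_64 = min(max(mean + factor * (64 - mean), 0), 255)    # the 64 channels
    print(f"mean={mean:.1f}: 192 -> {out_192:.1f}, 64 -> {out_64:.1f}")

# mean=196.0: 192 -> 188.4, 64 -> 0.0   (casts to 188 / 0, as observed)
# mean=128.0: 192 -> 249.6, 64 -> 6.4   (casts to 249 / 6; torchvision's
#                                        250 / 6 presumably comes from its
#                                        own grayscale rounding details)
```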