google-research / big_vision

Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more.
Apache License 2.0

Behavior of `solarize()` depends on integer overflow #110

Open EIFY opened 1 month ago

EIFY commented 1 month ago

I am not 100% sure about the intended behavior, but I do want to raise the alarm. The `solarize()` transform here

https://github.com/google-research/big_vision/blob/01edb81a4716f93a48be43b3a4af14e29cdb3a7f/big_vision/pp/autoaugment.py#L180-L184

inverts a pixel when its value is greater than or equal to the threshold, so one would expect a higher augmentation magnitude to need a lower threshold. However, the threshold increases linearly with magnitude:

https://github.com/google-research/big_vision/blob/01edb81a4716f93a48be43b3a4af14e29cdb3a7f/big_vision/pp/autoaugment.py#L513
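As a rough illustration of that mapping, here is a minimal pure-Python sketch of the formula `int((level/_MAX_LEVEL) * 256)` quoted in this report. The helper name `solarize_threshold` is hypothetical, and `_MAX_LEVEL = 10` is an assumption inferred from the `int((9/10) * 256)` arithmetic in this report, not copied from the linked file.

```python
# Assumption: _MAX_LEVEL is 10, inferred from the int((9/10) * 256) example in this report.
_MAX_LEVEL = 10


def solarize_threshold(level):
    """Hypothetical helper mirroring int((level/_MAX_LEVEL) * 256)."""
    return int((level / _MAX_LEVEL) * 256)


# Threshold for every integer magnitude from 0 to _MAX_LEVEL.
thresholds = [solarize_threshold(level) for level in range(_MAX_LEVEL + 1)]

for level, threshold in enumerate(thresholds):
    print(f"magnitude={level:2d} -> threshold={threshold}")
```

Under these assumptions the threshold rises monotonically with magnitude and reaches 256, one past the largest `uint8` value, at the maximum magnitude.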

Counterintuitively, it still works as expected with magnitude=_MAX_LEVEL because of integer overflow. Given

t = tf.constant([[[0,0,0]]], dtype=tf.uint8)

`t < i` evaluates to `tf.Tensor([[[False False False]]], shape=(1, 1, 3), dtype=bool)` iff `not (i % 256)`, i.e. iff `i` is a multiple of 256. In other words, `magnitude=_MAX_LEVEL` means `int((level/_MAX_LEVEL) * 256) = 256`, which wraps around to 0 in `tf.uint8`. Given the following `tf_gradient`, which goes from (0, 0, 0) to (255, 255, 255) in alternating directions:

(image: the `tf_gradient` test image)

Both solarize(tf_gradient, 256) and solarize(tf_gradient, 0) indeed fully invert the image:

(image: the fully inverted output)

But if magnitude is 9, int((9/10) * 256) = 230, and solarize(tf_gradient, 230) "abruptly" only inverts a small portion of the image:

(image: the output with only a small portion inverted)
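The effect can be reproduced without TensorFlow. The sketch below is a pure-Python stand-in, not the library code: `solarize_u8` and `inverted_fraction` are hypothetical names, and it assumes that the comparison is `image < threshold` with the threshold cast to `uint8`, so that 256 wraps to 0 as described above. Whether a given TensorFlow version performs that wrap silently is not checked here.

```python
def solarize_u8(pixels, threshold):
    """Hypothetical stand-in for solarize() on a flat list of uint8 values."""
    # Assumption: casting the threshold to uint8 wraps it modulo 256 (256 -> 0).
    wrapped = threshold % 256
    # Keep pixels below the threshold, invert the rest.
    return [p if p < wrapped else 255 - p for p in pixels]


# One pixel per possible uint8 value, standing in for the gradient image.
gradient = list(range(256))


def inverted_fraction(threshold):
    """Fraction of gradient pixels changed by solarize_u8 at this threshold."""
    out = solarize_u8(gradient, threshold)
    # 255 - p never equals p, so every inverted pixel counts as changed.
    changed = sum(1 for before, after in zip(gradient, out) if before != after)
    return changed / len(gradient)


for threshold in (0, 25, 128, 230, 256):
    print(f"threshold={threshold:3d} -> inverted {inverted_fraction(threshold):.1%}")
```

In this sketch, thresholds 0 and 256 both invert every pixel, while 230 inverts only the 26 brightest values out of 256, so the inverted share shrinks as magnitude grows and then jumps back to full inversion at the maximum.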