aleju / imgaug

Image augmentation for machine learning experiments.
http://imgaug.readthedocs.io
MIT License

Worrying discrepancy between PIL Resize and Imgaug Resize #681

Open rmcavoy opened 4 years ago

rmcavoy commented 4 years ago

I am resizing a 1920x1080 image to 1333x750 pixels using bilinear interpolation. On this simple task, PIL's resize and imgaug's Resize (master) show very worrying differences.

import numpy as np
from PIL import Image
import imgaug.augmenters as iaa

img_fpath = "img.png"
with Image.open(img_fpath) as f:
    in_image = f.convert('RGB')
img_np = np.asarray(in_image)

# Resize with PIL (bilinear)
pil_image = Image.fromarray(img_np)
pil_image = pil_image.resize((1333, 750), Image.BILINEAR)
image = np.asarray(pil_image)

# Resize with imgaug (bilinear)
aug = iaa.Resize({"height": 750, "width": 1333}, interpolation="linear")
img_augmented = aug(image=img_np)

print("img, ", np.mean(img_np))
print("pil, ", np.mean(image))
print("iaa, ", np.mean(img_augmented))

The results I get back are img: 96.09632989326131, pil: 96.1052009669084, iaa: 95.98408402100524. The PIL and imgaug resizes are clearly different, and the PIL result seems to preserve the original's average color values more accurately.

It's not clear to me why they should behave differently when both use bilinear interpolation on the same data (I could actually see a difference in performance on a downstream detection task with a model originally trained on the PIL resizing). The image used here is the test image "img.png" from https://github.com/zylo117/Yet-Another-EfficientDet-Pytorch/tree/master/test
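A minimal cross-check, added here only as an illustrative sketch: it uses a synthetic image (img.png lives in the repo linked above) and compares the imgaug output against a direct cv2.resize call, which, per the reply below, is what imgaug uses internally.

import cv2
import numpy as np
import imgaug.augmenters as iaa

# Synthetic 1920x1080 stand-in for img.png.
img = np.random.randint(0, 256, (1080, 1920, 3), dtype=np.uint8)

aug = iaa.Resize({"height": 750, "width": 1333}, interpolation="linear")
img_iaa = aug(image=img)

# cv2.resize takes (width, height).
img_cv2 = cv2.resize(img, (1333, 750), interpolation=cv2.INTER_LINEAR)

# Expected to print True if imgaug's "linear" maps to cv2.INTER_LINEAR (an assumption here).
print("imgaug matches cv2:", np.array_equal(img_iaa, img_cv2))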

aleju commented 4 years ago

imgaug uses cv2.resize(), so this is really just the difference between PIL and cv2. I could imagine minor deviations if one of them casts float values to int and the other rounds them before casting. Though I just tried a small test script on a few example images and do not see one of the methods performing clearly more accurately than the other, unless you use area interpolation. Small deviations on the order of 0.1 are imho expected; as information is removed, the average pixel values will never match perfectly.

If such small deviations are already enough to significantly affect a model, I would worry more about that model's robustness than about the underlying resize method. There is of course the possibility that one of the methods introduces recurring patterns in the image that are imperceptible to the human eye (similar to JPEG compression artifacts) while the other method does not, but detecting these would need much more than a simple average.
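To illustrate the cast-vs-round point, here is a minimal sketch (separate from the test script below, and not taken from either library's actual implementation; the uniform values are just a stand-in for interpolated float pixel values):

import numpy as np

rng = np.random.default_rng(0)
vals = rng.uniform(0, 255, size=1_000_000)  # stand-in for interpolated float pixel values

print(vals.mean())                             # ~127.5
print(vals.astype(np.uint8).mean())            # ~127.0: truncation lowers the mean by ~0.5
print(np.rint(vals).astype(np.uint8).mean())   # ~127.5: rounding preserves the mean

A full truncation would shift the mean by roughly 0.5, which is larger than the ~0.1 deviations observed here, so at most a partial effect of this kind would be in play.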

import imageio
import numpy as np
import imgaug as ia
import PIL.Image

def main():
    image_urls = [
        "https://upload.wikimedia.org/wikipedia/commons/8/8c/South-western_black_rhinoceros_%28Diceros_bicornis_occidentalis%29_female.jpg",
        "https://upload.wikimedia.org/wikipedia/commons/thumb/1/1c/Squirrel_posing.jpg/919px-Squirrel_posing.jpg",
        "https://upload.wikimedia.org/wikipedia/commons/2/2e/MC_Drei-Finger-Faultier.jpg",
        "https://upload.wikimedia.org/wikipedia/commons/thumb/7/78/Church_Heart_of_the_Andes.jpg/1280px-Church_Heart_of_the_Andes.jpg"
    ]
    images = [imageio.imread(url) for url in image_urls]

    for inter in ["linear", "cubic", "area"]:
        inter_pil = {
            "linear": PIL.Image.BILINEAR,
            "cubic": PIL.Image.BICUBIC,
            "area": PIL.Image.BOX
        }[inter]
        for dt in ["uint8"]:
            print("-------------")
            print(f"{inter} {dt}")
            print("-------------")
            for image in images:
                image = image.astype(np.dtype(dt))

                height, width = int(image.shape[0] * 0.5), int(image.shape[1] * 0.5)
                image_ia = ia.imresize_single_image(
                    image, (height, width), interpolation=inter
                )
                image_pil = np.asarray(
                    PIL.Image.fromarray(image).resize(
                        (width, height), resample=inter_pil
                    )
                )

                assert image_ia.shape == image_pil.shape

                # Order matches the printed labels: orig, pil, ia.
                avgs = [np.average(im) for im in [image, image_pil, image_ia]]
                diffs = [avgs[0] - avgs[1], avgs[0] - avgs[2]]
                print(
                    "averages orig[%7.3f] pil[%7.3f] ia[%7.3f] "
                    "| diffs pil[%7.3f] ia[%7.3f]" % (
                        *avgs,
                        *diffs
                    )
                )

if __name__ == "__main__":
    main()

Output:

-------------
linear uint8
-------------
averages orig[110.143] pil[110.269] ia[110.269] | diffs pil[ -0.126] ia[ -0.127]
averages orig[ 78.866] pil[ 78.808] ia[ 78.931] | diffs pil[  0.059] ia[ -0.065]
averages orig[110.512] pil[110.637] ia[110.636] | diffs pil[ -0.125] ia[ -0.124]
averages orig[ 98.146] pil[ 98.011] ia[ 98.208] | diffs pil[  0.134] ia[ -0.062]
-------------
cubic uint8
-------------
averages orig[110.143] pil[110.142] ia[110.147] | diffs pil[  0.000] ia[ -0.005]
averages orig[ 78.866] pil[ 78.889] ia[ 78.874] | diffs pil[ -0.022] ia[ -0.008]
averages orig[110.512] pil[110.511] ia[110.518] | diffs pil[  0.001] ia[ -0.006]
averages orig[ 98.146] pil[ 98.196] ia[ 98.155] | diffs pil[ -0.051] ia[ -0.010]
-------------
area uint8
-------------
averages orig[110.143] pil[110.269] ia[110.642] | diffs pil[ -0.126] ia[ -0.499]
averages orig[ 78.866] pil[ 78.866] ia[ 79.361] | diffs pil[  0.000] ia[ -0.495]
averages orig[110.512] pil[110.637] ia[110.994] | diffs pil[ -0.125] ia[ -0.482]
averages orig[ 98.146] pil[ 98.146] ia[ 98.690] | diffs pil[ -0.000] ia[ -0.544]
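As a side note (a sketch added for illustration, not part of the script above): to go one step beyond comparing averages, one can look at the per-pixel difference between the PIL and cv2 results directly; a strong recurring pattern would show up there rather than in the mean.

import cv2
import numpy as np
import PIL.Image

# Synthetic stand-in image; substitute any real uint8 RGB array.
img = np.random.randint(0, 256, (1080, 1920, 3), dtype=np.uint8)

res_cv2 = cv2.resize(img, (1333, 750), interpolation=cv2.INTER_LINEAR)
res_pil = np.asarray(
    PIL.Image.fromarray(img).resize((1333, 750), resample=PIL.Image.BILINEAR)
)

diff = res_cv2.astype(np.int16) - res_pil.astype(np.int16)
print("mean diff:        ", diff.mean())
print("mean abs diff:    ", np.abs(diff).mean())
print("max abs diff:     ", np.abs(diff).max())
print("identical pixels: ", (diff == 0).mean())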
rmcavoy commented 4 years ago

Good to know. Thanks!