GaParmar / clean-fid

PyTorch - FID calculation with proper image resizing and quantization steps [CVPR 2022]
https://www.cs.cmu.edu/~clean-fid/
MIT License

About the resize function used by different libraries #1

Closed wiseaidev closed 3 years ago

wiseaidev commented 3 years ago

Recently, I came across a LinkedIn post explaining that the resize function has to be chosen carefully, because different libraries and frameworks produce different results. So I decided to test it myself. That post is what inspired this issue.

The following snippet, which I edited from this Colab notebook, shows what I believe is the correct way to use the resize methods of the different frameworks.

import numpy as np
import torch
import torchvision.transforms.functional as F
from torchvision import transforms
from torchvision.transforms.functional import InterpolationMode
from PIL import Image
import tensorflow as tf
import cv2
import matplotlib.pyplot as plt
from skimage import draw

image = np.ones((128, 128), dtype=np.float64)
rr, cc = draw.circle_perimeter(64, 64, radius=45, shape=image.shape)
image[rr, cc] = 0
plt.imshow(image, cmap='gray')
print(f"Unique values of image: {np.unique(image)}")  # fixed: `arr` was undefined
print(image.dtype)
output_size = 17
def inspect_img(*, img):
    plt.imshow(img, cmap='gray')
    print(f"Value of pixel with coordinates (14,9): {img[14, 9]}")

def resize_PIL(*, img, output_size):
    img = Image.fromarray(img)  # use the argument, not the global `image`
    img = img.resize((output_size, output_size), resample=Image.BICUBIC)
    img = np.asarray(img,dtype=np.float64)
    inspect_img(img=img)
    return img
def resize_pytorch(*, img, output_size):
    img = F.resize(Image.fromarray(np.float64(img)), # Provide a PIL image rather than a Tensor.
                   size=output_size, 
                   interpolation=InterpolationMode.BICUBIC)
    img = np.asarray(img, dtype=np.float64) 
    inspect_img(img=img)
    return img
def resize_tensorflow(*, img, output_size):
    img = img[tf.newaxis, ..., tf.newaxis]
    img = tf.image.resize(img, size = [output_size] * 2, method="bicubic", antialias=True)
    img = img[0, ..., 0].numpy()
    inspect_img(img=img)
    return img
image_PIL = resize_PIL(img=image, output_size=output_size)
image_pytorch = resize_pytorch(img=image, output_size=output_size)
image_tensorflow = resize_tensorflow(img=image, output_size=output_size)
assert np.array_equal(image_PIL, image_pytorch) == True, 'Not Identical!'
# assert np.array_equal(image_PIL, image_tensorflow) == True, 'Not Identical!'  --> fails
assert np.allclose(image_PIL, image_tensorflow) == True, 'Not Close!'
# assert np.array_equal(image_tensorflow, image_pytorch) == True, 'Not Identical!'  --> fails 
assert np.allclose(image_tensorflow, image_pytorch) == True, 'Not Close!'
# TensorFlow gives slightly different values than PyTorch and PIL.

which gives us the following results:

[Image: result]

Therefore, TensorFlow, PyTorch, and PIL give similar results when the resize method is used properly, as in the snippet above.

You can read my comments on LinkedIn to find out how I came to this solution.

The only remaining library is OpenCV which I'll test in the future.

Have a great day/night!

GaParmar commented 3 years ago

I took a look at your Colab notebook and your results. However, the function you are using for resizing, torchvision.transforms.functional.resize, is just a wrapper around the PIL library and cannot be used with PyTorch tensors. It is not the PyTorch resizing operation that we study and that is commonly used (as in legacy-pytorch-fid). If you replace your F with torch.nn.functional, the above method will not work.
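To make the distinction concrete, here is a minimal 1-D sketch in plain Python. It is not code from PIL, torchvision, or PyTorch: `cubic`, `resize_plain`, and `resize_antialiased` are hypothetical helpers, and the values of `a` are assumptions chosen for illustration. The first resizer always reads four neighbouring samples, as a bicubic interpolation without antialiasing does. The second stretches the kernel by the downscale factor, which is the general idea behind antialiased resizers such as PIL's.

```python
import math

def cubic(x, a):
    """Cubic convolution kernel with free parameter a."""
    x = abs(x)
    if x <= 1.0:
        return (a + 2.0) * x**3 - (a + 3.0) * x**2 + 1.0
    if x < 2.0:
        return a * x**3 - 5.0 * a * x**2 + 8.0 * a * x - 4.0 * a
    return 0.0

def resize_plain(signal, out_size, a=-0.75):
    """4-tap bicubic resize with half-pixel centers and no antialiasing."""
    n = len(signal)
    scale = n / out_size
    out = []
    for j in range(out_size):
        center = (j + 0.5) * scale - 0.5      # source coordinate of output j
        base = math.floor(center)
        acc = 0.0
        for k in range(base - 1, base + 3):   # always exactly 4 taps
            idx = min(max(k, 0), n - 1)       # clamp at the borders
            acc += cubic(center - k, a) * signal[idx]
        out.append(acc)
    return out

def resize_antialiased(signal, out_size, a=-0.5):
    """Bicubic resize whose kernel is stretched by the downscale factor."""
    n = len(signal)
    scale = n / out_size
    stretch = max(scale, 1.0)                 # only widen when downsampling
    support = 2.0 * stretch
    out = []
    for j in range(out_size):
        center = (j + 0.5) * scale
        lo = max(0, int(center - support + 0.5))
        hi = min(n, int(center + support + 0.5))
        weights = [cubic((k + 0.5 - center) / stretch, a) for k in range(lo, hi)]
        total = sum(weights)                  # normalize so weights sum to 1
        acc = sum(w * signal[k] for w, k in zip(weights, range(lo, hi)))
        out.append(acc / total)
    return out

signal = [1.0] * 128
signal[37] = 0.0                              # a one-pixel-wide dark line

plain = resize_plain(signal, 17)
smooth = resize_antialiased(signal, 17)
print(min(plain), min(smooth))
```

With this input, none of the four-tap windows of `resize_plain` lands on index 37, so the dark line vanishes from its output, while `resize_antialiased` still shows a dip below 1 at that position.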

wiseaidev commented 3 years ago

Hey @GaParmar, I've just looked at the bicubic interpolation implementations in OpenCV and PIL. OpenCV uses the bicubic convolution algorithm, which depends on a constant a that can be set to either -0.5 or -0.75; in OpenCV it is set to -0.75. For PIL, I couldn't work out which algorithm is used. All I found is a constant called `BICUBIC`, which is set to 3. So I assume PIL either uses a different interpolation algorithm or sets a to -0.5 somewhere in the code. Regardless, we can apply a preprocessing step (such as blurring the image) before resizing, as follows:

def resize_opencv(*, img, output_size):
    #print(help(cv2.resize))
    img = np.asarray(img, dtype=np.float64)
    img = cv2.blur(img, ksize = (8, 8)) # preprocessing step
    img = cv2.resize(img, dsize=(output_size, ) * 2, interpolation=cv2.INTER_CUBIC)
    img = np.asarray(img, dtype=np.float64)
    inspect_img(img=img)
    return img
image_opencv = resize_opencv(img=image, output_size=output_size)

This produces a result similar to PIL's: [Image: Screenshot from 2021-04-29 03-25-20]

I think this is just a rule of thumb and not a standard way of resizing. That's all I wanted to add. I hope you find it useful in one way or another. Peace out!
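As a rough illustration of how the constant a changes the bicubic convolution kernel, the sketch below evaluates the standard cubic convolution weights for a sample halfway between two pixels. `cubic_weights` is a hypothetical helper written for this comparison, not a function from OpenCV or PIL.

```python
def cubic_weights(t, a):
    """Weights of the 4 taps around a sample at fractional offset t in [0, 1)."""
    def w(x):
        x = abs(x)
        if x <= 1.0:
            return (a + 2.0) * x**3 - (a + 3.0) * x**2 + 1.0
        if x < 2.0:
            return a * x**3 - 5.0 * a * x**2 + 8.0 * a * x - 4.0 * a
        return 0.0
    # Taps sit at distances 1 + t, t, 1 - t and 2 - t from the sample point.
    return [w(1.0 + t), w(t), w(1.0 - t), w(2.0 - t)]

for a in (-0.5, -0.75):
    print(a, cubic_weights(0.5, a))
# a = -0.5  gives [-0.0625, 0.5625, 0.5625, -0.0625]
# a = -0.75 gives [-0.09375, 0.59375, 0.59375, -0.09375]
```

Both sets of weights sum to 1, but a = -0.75 has larger negative lobes, so it sharpens slightly more. That alone is enough to make two bicubic implementations disagree on the same input.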

GaParmar commented 3 years ago

Thanks for these observations. The resizing ratio is an important factor in determining which blur kernel to apply. See Fig. 8 in our paper for a comparison.
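One simple way to tie the blur to the resizing ratio is to match a box-blur width to the downscale factor. This is only a heuristic sketch, not the method from the paper, and `box_blur_size` is a hypothetical helper.

```python
def box_blur_size(in_size, out_size):
    """Hypothetical helper: box-blur width matched to the downscale factor."""
    scale = in_size / out_size
    # No pre-blur is needed when the image is not being shrunk.
    return max(1, round(scale))

print(box_blur_size(128, 17))   # 128 / 17 is about 7.53, so this prints 8
print(box_blur_size(128, 64))   # factor 2, prints 2
print(box_blur_size(128, 256))  # upsampling, prints 1
```

For the 128 to 17 case this happens to give the 8x8 kernel used in the OpenCV snippet above. A different ratio would call for a different kernel size, which is why a fixed blur is not a general fix.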

I will close this issue. Feel free to re-open it if you have any additional questions.