samedii closed 2 years ago
We should be able to closely match cleanfid with my code if the model were changed, I think. I tried it once and got really weird results, which is why I've been procrastinating on it. My pseudo-FID uses resizeright, a good resizing algorithm, to produce inputs to the feature extractor, so it should be close to cleanfid once the model is swapped.
Parallel drawing of the samples for FID is super important to have. Diffusion models sample rather slowly compared to GANs, and having parallel sampling justifies doing FID evaluations often enough to produce nice plots.
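For reference, the statistic both implementations are trying to agree on is the Fréchet distance between Gaussian fits of two feature sets. The following is a minimal NumPy sketch of that formula, not the code from either repository; the function name `frechet_distance` and the toy features are made up for illustration, and real feature extraction is left out entirely.

```python
import numpy as np


def frechet_distance(feats1: np.ndarray, feats2: np.ndarray) -> float:
    """Frechet distance between Gaussian fits of two (n_samples, dim) feature arrays."""
    mu1, mu2 = feats1.mean(axis=0), feats2.mean(axis=0)
    sigma1 = np.cov(feats1, rowvar=False)
    sigma2 = np.cov(feats2, rowvar=False)
    # Tr(sqrt(sigma1 @ sigma2)) equals the sum of the square roots of the
    # eigenvalues of the product, which avoids needing a matrix square root.
    eigenvalues = np.linalg.eigvals(sigma1 @ sigma2)
    trace_sqrt = np.sqrt(np.clip(eigenvalues.real, 0.0, None)).sum()
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * trace_sqrt)


# Toy stand-in for extracted features (an assumption, not real Inception features).
rng = np.random.default_rng(0)
real = rng.normal(size=(500, 4))
shift = np.array([1.0, 0.0, 2.0, 0.0])
fake = real + shift  # same covariance, mean moved by `shift`
distance = frechet_distance(real, fake)
```

Because the two toy sets share a covariance, the distance reduces to the squared length of `shift`, which makes the sketch easy to sanity-check.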
Alright, thanks for the input! I will have a go at it then :)
Update: it passes now if I use their resizing code, but it's really weird. Resize right isn't doing the same thing:
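import numpy as np
from PIL import Image

def pil_resize(image, output_size):
    s1, s2 = output_size  # PIL sizes are (width, height)

    def resize_single_channel(x_np):
        # Resize one channel as a 32-bit float image (PIL mode "F")
        img = Image.fromarray(x_np.astype(np.float32), mode="F")
        img = img.resize(output_size, resample=Image.BICUBIC)
        # The resulting array is (height, width), hence (s2, s1)
        return np.asarray(img).clip(0, 255).reshape(s2, s1, 1)

    def func(x):
        # Resize the three channels independently, then stack them again
        x = [resize_single_channel(x[:, :, idx]) for idx in range(3)]
        x = np.concatenate(x, axis=2).astype(np.float32)
        return x

    return func(image)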
Is it the scaling? It looks like you're converting to the 0-255 range when you use PIL to resize. Are you doing this for resizeright too, or just leaving it at [-1, 1]?
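To make the scaling question concrete, here is a small sketch of the mapping between the two ranges mentioned above. It assumes the model-side images live in [-1, 1]; the helper names `to_255_range` and `to_signed_unit_range` are hypothetical and appear in neither repository.

```python
import numpy as np


def to_255_range(x: np.ndarray) -> np.ndarray:
    """Map values from [-1, 1] to [0, 255] (hypothetical helper)."""
    return (x + 1.0) * 127.5


def to_signed_unit_range(x: np.ndarray) -> np.ndarray:
    """Inverse mapping from [0, 255] back to [-1, 1] (hypothetical helper)."""
    return x / 127.5 - 1.0


pixels = np.array([-1.0, 0.0, 1.0], dtype=np.float32)
scaled = to_255_range(pixels)             # values on the 0-255 scale
restored = to_signed_unit_range(scaled)   # back on the [-1, 1] scale
```

One thing to keep in mind when comparing outputs: an absolute tolerance is tied to the scale, so a given `atol` on 0-255 values is a much tighter check than the same `atol` on [-1, 1] values.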
The remaining issue was fixed by doing the same as you do in this repo: setting pad_mode="reflect". I'm only getting atol=1e-3 on resize, but I'll see how much it affects reproducing FID before I spend more time on it.
from typing import Tuple

import numpy as np
import torch
import torchvision.transforms.functional as TF
from resize_right import resize as resize_right

from . import settings


def resize(image: torch.Tensor, output_size: Tuple[int, int] = settings.RESIZE_SHAPE):
    return resize_right(image, out_shape=output_size, pad_mode="reflect")


def test_resize_same():
    from PIL import Image
    from cleanfid.resize import build_resizer

    image = Image.open("tests/pixelart/dataset_a/out_00003.png")
    reference_resize = build_resizer("clean")
    resized = resize(TF.to_tensor(image)).clamp(0, 1)
    assert np.allclose(
        reference_resize(np.array(image)),
        resized.permute(1, 2, 0).mul(255).numpy(),
        atol=1e-3,
    )
Both the FID and KID calculations also had significant differences, especially the KID implementation, since it is stochastic in cleanfid. I have something decent working now at least, so I'll try to create a PR today or tomorrow.
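As an illustration of why a stochastic KID is hard to reproduce exactly, here is a minimal sketch that assumes the randomness comes from averaging an unbiased MMD estimate over randomly drawn feature subsets, which is how KID is commonly implemented. The function names, subset counts, and toy features below are assumptions for illustration and are not taken from cleanfid.

```python
import numpy as np


def polynomial_kernel(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Cubic polynomial kernel (x . y / dim + 1)^3, a common choice for KID."""
    return (x @ y.T / x.shape[1] + 1.0) ** 3


def mmd2_unbiased(x: np.ndarray, y: np.ndarray) -> float:
    """Unbiased squared MMD estimate; assumes x and y have the same number of rows."""
    m = x.shape[0]
    k_xx, k_yy, k_xy = polynomial_kernel(x, x), polynomial_kernel(y, y), polynomial_kernel(x, y)
    # Drop the diagonal of the within-set kernels so self-similarity is not counted.
    term_xx = (k_xx.sum() - np.trace(k_xx)) / (m * (m - 1))
    term_yy = (k_yy.sum() - np.trace(k_yy)) / (m * (m - 1))
    return float(term_xx + term_yy - 2.0 * k_xy.mean())


def kid(feats1, feats2, num_subsets=10, subset_size=50, seed=None) -> float:
    """Average the MMD estimate over random subsets (hypothetical parameters)."""
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(num_subsets):
        x = feats1[rng.choice(len(feats1), subset_size, replace=False)]
        y = feats2[rng.choice(len(feats2), subset_size, replace=False)]
        estimates.append(mmd2_unbiased(x, y))
    return float(np.mean(estimates))


# Toy features standing in for extractor outputs.
data_rng = np.random.default_rng(123)
feats_a = data_rng.normal(size=(200, 8))
feats_b = data_rng.normal(loc=0.5, size=(200, 8))

same_seed_1 = kid(feats_a, feats_b, seed=0)
same_seed_2 = kid(feats_a, feats_b, seed=0)
other_seed = kid(feats_a, feats_b, seed=1)
```

With a fixed seed the estimate repeats, while a different seed draws different subsets and gives a slightly different value, so two implementations can only be expected to agree within that sampling noise unless the subsets are pinned.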
Ohh, is the remaining difference because you are not clamping the result of resizeright? It doesn't clamp by default to maintain differentiability but for FID we want to clamp.
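The reason clamping matters is that cubic interpolation can overshoot the input range near sharp edges. The sketch below shows this in one dimension with a Keys cubic kernel; the coefficient a = -0.5 and the edge replication are assumptions chosen for illustration, and neither resizeright nor PIL is called here.

```python
import math


def cubic_kernel(t: float, a: float = -0.5) -> float:
    """Keys cubic convolution kernel (a = -0.5 is an assumed, common coefficient)."""
    t = abs(t)
    if t <= 1.0:
        return (a + 2.0) * t**3 - (a + 3.0) * t**2 + 1.0
    if t < 2.0:
        return a * t**3 - 5.0 * a * t**2 + 8.0 * a * t - 4.0 * a
    return 0.0


def interpolate(signal: list, position: float) -> float:
    """Evaluate the cubic interpolant at a fractional position, replicating edges."""
    base = math.floor(position)
    total = 0.0
    for offset in range(-1, 3):
        index = min(max(base + offset, 0), len(signal) - 1)
        total += signal[index] * cubic_kernel(position - (base + offset))
    return total


# A hard edge from 0 to 1, like a sharp boundary in pixel art.
step_edge = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
positions = [i * 0.25 for i in range(21)]  # 0.0, 0.25, ..., 5.0
upsampled = [interpolate(step_edge, p) for p in positions]

# Clamping brings the overshoot back into the valid range, as wanted for FID.
clamped = [min(max(v, 0.0), 1.0) for v in upsampled]
```

Before clamping, the interpolated values dip slightly below 0 on one side of the edge and rise slightly above 1 on the other, which is exactly what a reference implementation that clips to 0-255 removes.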
Implemented. :)
I created a nicer wrapper around cleanfid that works like your implementation, intending to open a PR for this repo, but it doesn't work with your accelerator/multiprocessing setup.
I'm considering using your code and switching the model to the one cleanfid uses to try to reproduce their results. Before I spend more time on this, I thought I should ask: do you think it's possible to reproduce, or will I run into issues?
Your implementation looks a lot nicer than cleanfid's, so I expect it will be easy to work with at least.