crowsonkb / k-diffusion

Karras et al. (2022) diffusion models for PyTorch
MIT License
2.26k stars, 372 forks

Standard FID #7

Closed: samedii closed this issue 2 years ago

samedii commented 2 years ago

I created a nicer wrapper around cleanfid that works like your implementation, intending to open a PR for this repo, but it doesn't work with your accelerator/multiprocessing setup.

I'm considering using your code instead and switching the model to the one cleanfid uses, to try to reproduce its results. Before I spend more time on this, I thought I should ask: do you think the results can be reproduced, or will I run into issues?

Your implementation looks a lot nicer than cleanfid, so I expect it will at least be easy to work with.

crowsonkb commented 2 years ago

We should be able to closely match cleanfid with my code if the model were changed, I think. I tried it once and got really weird results, which is why I've been procrastinating on it. My pseudo-FID uses resizeright, a good resizing algorithm, to produce the inputs to the feature extractor, so it should be close to cleanfid once the model is changed.
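For context on why only the feature extractor needs to change: the Fréchet distance itself is computed from feature statistics and does not depend on which model produced them. Below is a minimal NumPy sketch of that computation, not the code from this repo or from cleanfid. The function name `frechet_distance` and the random stand-in features are assumptions made for illustration.

```python
import numpy as np


def frechet_distance(feats_a, feats_b):
    """Fréchet distance between Gaussians fitted to two (n, d) feature arrays."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # tr(sqrt(cov_a @ cov_b)) equals the sum of the square roots of the
    # product's eigenvalues, which are real and non-negative for PSD
    # covariances (up to rounding error, hence the clip).
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2 * tr_sqrt)


# Hypothetical stand-in features; real ones would come from the feature extractor.
rng = np.random.default_rng(0)
feats = rng.normal(size=(500, 8))
shifted = feats + 0.5  # same covariance, mean moved by 0.5 in each of 8 dims

print(frechet_distance(feats, feats))
print(frechet_distance(feats, shifted))
```

Comparing a set with itself gives a distance of zero up to floating-point error, and a pure mean shift contributes only the squared norm of the shift (8 × 0.25 = 2 here).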

crowsonkb commented 2 years ago

Parallel drawing of the samples for FID is super important to have. Diffusion models sample slowly compared to GANs, and parallel sampling is what justifies running FID evaluations often enough to produce nice plots.
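One way to picture the parallel sampling is as a split of the requested sample count across processes, with each process drawing its share in batches. The sketch below shows only that arithmetic and is an assumption about the general approach, not this repo's code. The helper name `per_process_batches` is hypothetical.

```python
import math


def per_process_batches(n_samples, num_processes, batch_size):
    """Return the batch sizes that each process draws.

    Every process draws ceil(n_samples / num_processes) samples, so the
    gathered total can slightly exceed n_samples and is truncated afterwards.
    """
    n_per_proc = math.ceil(n_samples / num_processes)
    return [
        min(batch_size, n_per_proc - start)
        for start in range(0, n_per_proc, batch_size)
    ]


sizes = per_process_batches(n_samples=50000, num_processes=8, batch_size=1000)
print(sizes)           # [1000, 1000, 1000, 1000, 1000, 1000, 250]
print(sum(sizes) * 8)  # 50000
```

When the sample count does not divide evenly, the gathered features would need to be cut back to `n_samples` before computing the metric.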

samedii commented 2 years ago

Alright, thanks for the input! I will have a go at it then :)

samedii commented 2 years ago

Update: I got it to pass now if I use their resizing code, but it's really weird. Resize right isn't doing the same thing.

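import numpy as np
from PIL import Image

def pil_resize(image, output_size):
    # PIL sizes are (width, height); NumPy shapes are (height, width).
    width, height = output_size

    def resize_single_channel(x_np):
        # Resize one channel as a 32-bit float image (mode "F") so nothing is quantized.
        img = Image.fromarray(x_np.astype(np.float32), mode="F")
        img = img.resize(output_size, resample=Image.BICUBIC)
        return np.asarray(img).clip(0, 255).reshape(height, width, 1)

    def func(x):
        # Resize the three channels independently, then restack them as HWC.
        x = [resize_single_channel(x[:, :, idx]) for idx in range(3)]
        x = np.concatenate(x, axis=2).astype(np.float32)
        return x

    return func(image)

crowsonkb commented 2 years ago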

Is it the scaling? It looks like you're converting to the range 0-255 when you use PIL to resize. Are you doing this for resizeright too, or just leaving it at [-1, 1]?
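To make the scaling question concrete, here is a small sketch of converting between the two value ranges mentioned above. The helper names are hypothetical and this is not code from either library. The point is that two resizers can only be compared after both outputs are in the same range.

```python
import numpy as np


def signed_to_byte_range(x):
    """Map floats in [-1, 1] to floats in [0, 255], clamping out-of-range values."""
    x = np.asarray(x, dtype=np.float32)
    return np.clip((x + 1) * 127.5, 0, 255)


def byte_to_signed_range(x):
    """Map floats in [0, 255] back to [-1, 1]."""
    return np.asarray(x, dtype=np.float32) / 127.5 - 1


pixels = np.array([-1.0, 0.0, 1.0], dtype=np.float32)
print(signed_to_byte_range(pixels))                        # [  0.  127.5 255. ]
print(byte_to_signed_range(signed_to_byte_range(pixels)))  # [-1.  0.  1.]
```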

samedii commented 2 years ago

The remaining issue was fixed by doing the same as you do in this repo: setting pad_mode="reflect".
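As an illustration of why the padding mode matters near image borders, the sketch below compares zero padding with reflect padding on a single row of pixels, using `np.pad`. It shows the padding modes only and makes no claim about how resize_right applies them internally.

```python
import numpy as np

row = np.array([1, 2, 3, 4])

# Zero ("constant") padding invents dark pixels beyond the border.
zero_padded = np.pad(row, 2, mode="constant")

# Reflect padding mirrors the interior around the edge pixel instead.
reflect_padded = np.pad(row, 2, mode="reflect")

print(zero_padded)     # [0 0 1 2 3 4 0 0]
print(reflect_padded)  # [3 2 1 2 3 4 3 2]
```

A resampling kernel that reaches past the border sees very different neighbours in the two cases, so output pixels near the edge differ even when the interior matches.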

Only getting atol=1e-3 on resize, but I'll see how much it affects reproducing FID before I spend more time on it.

from typing import Tuple
import numpy as np
import torch
import torchvision.transforms.functional as TF
from resize_right import resize as resize_right

from . import settings

def resize(image: torch.Tensor, output_size: Tuple[int, int] = settings.RESIZE_SHAPE):
    return resize_right(image, out_shape=output_size, pad_mode="reflect")

def test_resize_same():
    from PIL import Image
    from cleanfid.resize import build_resizer

    image = Image.open("tests/pixelart/dataset_a/out_00003.png")
    reference_resize = build_resizer("clean")

    resized = resize(TF.to_tensor(image)).clamp(0, 1)
    assert np.allclose(
        reference_resize(np.array(image)),
        resized.permute(1, 2, 0).mul(255).numpy(),
        atol=1e-3,
    )

samedii commented 2 years ago

Both the FID and KID calculations also had significant differences, especially the KID implementation, since it is stochastic in cleanfid. I have something decent working now at least, so I'll try to create a PR today or tomorrow.
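To show where the stochasticity in a KID estimate can come from, here is a sketch of a subset-based estimator: it averages an unbiased MMD² estimate with a cubic polynomial kernel over randomly drawn subsets. This is one common formulation and is not claimed to match cleanfid line for line. The function names, default arguments and random stand-in features are assumptions.

```python
import numpy as np


def polynomial_kernel(x, y):
    """Cubic polynomial kernel (x . y / d + 1) ** 3 between rows of x and y."""
    d = x.shape[1]
    return (x @ y.T / d + 1) ** 3


def kid_estimate(feats_a, feats_b, num_subsets=100, max_subset_size=1000, seed=None):
    """Average an unbiased MMD^2 estimate over random subsets of the features."""
    rng = np.random.default_rng(seed)
    m = min(len(feats_a), len(feats_b), max_subset_size)
    total = 0.0
    for _ in range(num_subsets):
        # The random subset choice is what makes repeated runs disagree.
        x = feats_a[rng.choice(len(feats_a), m, replace=False)]
        y = feats_b[rng.choice(len(feats_b), m, replace=False)]
        k_xx, k_yy = polynomial_kernel(x, x), polynomial_kernel(y, y)
        k_xy = polynomial_kernel(x, y)
        # Within-set terms exclude the diagonal to keep the estimate unbiased.
        within = (k_xx.sum() - np.trace(k_xx) + k_yy.sum() - np.trace(k_yy)) / (m * (m - 1))
        cross = 2 * k_xy.sum() / (m * m)
        total += within - cross
    return total / num_subsets


# Hypothetical stand-in features for two different distributions.
rng = np.random.default_rng(0)
real = rng.normal(size=(200, 4))
fake = rng.normal(loc=1.0, size=(200, 4))

run_a = kid_estimate(real, fake, num_subsets=10, max_subset_size=50, seed=1)
run_b = kid_estimate(real, fake, num_subsets=10, max_subset_size=50, seed=2)
print(run_a, run_b)  # two different estimates of the same quantity
```

Fixing the seed, or using every sample in a single pass, removes the run-to-run variation, which is one way to make such a value reproducible in a test.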

crowsonkb commented 2 years ago

Ohh, is the remaining difference because you are not clamping the result of resizeright? It doesn't clamp by default, to maintain differentiability, but for FID we want to clamp.
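The reason clamping matters is that cubic interpolation can overshoot the input range near sharp edges. The sketch below demonstrates this in one dimension with a Keys cubic kernel. The kernel parameter a = -0.5 and the helper names are assumptions for illustration, and this is not resize_right's code.

```python
import math


def keys_cubic(x, a=-0.5):
    """Keys cubic convolution kernel; a = -0.5 is an assumed, commonly used value."""
    x = abs(x)
    if x <= 1:
        return (a + 2) * x**3 - (a + 3) * x**2 + 1
    if x < 2:
        return a * x**3 - 5 * a * x**2 + 8 * a * x - 4 * a
    return 0.0


def interpolate(samples, position):
    """Cubic interpolation at a fractional position, replicating edge samples."""
    base = math.floor(position)
    total = 0.0
    for i in range(base - 1, base + 3):
        j = min(max(i, 0), len(samples) - 1)
        total += samples[j] * keys_cubic(position - i)
    return total


def clamp(value, low=0.0, high=1.0):
    return max(low, min(high, value))


edge = [0.0, 1.0, 1.0, 1.0]     # a sharp dark-to-bright edge with values in [0, 1]
raw = interpolate(edge, 1.5)    # sample halfway between the second and third pixels
print(raw)         # 1.0625, above the valid range
print(clamp(raw))  # 1.0
```

The negative lobes of the kernel push the result past 1 next to the edge, so an unclamped output can sit outside the valid pixel range, while a resizer that clips to the valid range will not.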

crowsonkb commented 2 years ago

Implemented. :)