HaozheLiu-ST / T-GATE

T-GATE: Temporally Gating Attention to Accelerate Diffusion Model for Free!
MIT License

How to reproduce FID from paper? #12

Closed AlexSidDev closed 3 months ago

AlexSidDev commented 4 months ago

Hi! I'm trying to reproduce the T-GATE results (FID metric) described in your technical report, using the SDXL model and the DPM scheduler with 25 inference steps and a gate step of 10. I'm using the MS-COCO 256x256 benchmark from the https://github.com/Nota-NetsPresso/BK-SDM.git repository and get a very large FID instead of the 22.738 reported in your arXiv paper. The other metrics I measure, such as Inception Score and CLIP score, look normal. Could you please provide more information about the hyperparameters (guidance scale, for example) and the image resolution? Which captions were used for generation (the full validation set from MSCOCO-2014 or MSCOCO-2017, or a subset of one of them), and which real images were used to measure FID between real and generated samples?

HaozheLiu-ST commented 4 months ago

Hello, thanks for your attention to our work.

Dataset: We use the MS-COCO-2017 validation data for evaluation. The dataset is directly sourced from here. We evaluate on 10k images, obtained by setting shuffle=False and taking the first 10k samples.
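A minimal sketch of the selection described above, assuming the standard COCO captions annotation layout (an `images` list plus an `annotations` list with `image_id` and `caption` fields). The function name `first_n_pairs` and the choice of the first caption per image are illustrative assumptions; the thread does not say which of an image's captions is used.

```python
from typing import Dict, List, Tuple


def first_n_pairs(coco: Dict, n: int) -> List[Tuple[int, str]]:
    """Return (image_id, caption) for the first n images, in file order."""
    # Keep only the first caption seen for each image (an assumption).
    first_caption: Dict[int, str] = {}
    for ann in coco["annotations"]:
        first_caption.setdefault(ann["image_id"], ann["caption"].strip())

    # Walk the images in the order of the annotation file (no shuffling).
    pairs = [
        (img["id"], first_caption[img["id"]])
        for img in coco["images"]
        if img["id"] in first_caption
    ]
    return pairs[:n]


# Tiny stand-in for a COCO captions file, for illustration only.
toy = {
    "images": [{"id": 3}, {"id": 1}, {"id": 2}],
    "annotations": [
        {"image_id": 1, "caption": "a dog"},
        {"image_id": 3, "caption": "a cat "},
        {"image_id": 3, "caption": "another cat caption"},
        {"image_id": 2, "caption": "a bird"},
    ],
}
print(first_n_pairs(toy, 2))
```

With a real annotation file, `coco` would be the result of `json.load` on the captions JSON and `n` would be 10000.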

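Pre-Processing: PyTorch-like code is listed below:

from torchvision import transforms

resolution = 256
# Inside the dataset class: resize the shorter side to 256, center-crop to
# 256x256, then map pixel values from [0, 1] to [-1, 1].
self.transforms = transforms.Compose([
    transforms.Resize(resolution),
    transforms.CenterCrop(resolution),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
])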

Image Generation: The CFG guidance scale is set to 8.5, and images are generated at a resolution of 1024 and then resized to 256.

It is advisable to reimplement TGATE using your own benchmark, since variations in the evaluation metrics or in the subset of the COCO validation set can yield significantly different results. To this end, we will also report our performance on DPG-Bench and MJHQ-10k in the next version. Please stay tuned!

Below, we provide visualizations based on COCO prompts. The first row shows images generated by SDXL, and the second row shows images generated by SDXL (TGATE, gate_step=10).

[Image: baseline]

[Image: tgate]

AlexSidDev commented 4 months ago

Thanks for your answer! Do you use the prompts corresponding to the first 10k real samples from COCO to generate images, and then measure FID between those 10k real samples and the generated images? Do I understand correctly?

HaozheLiu-ST commented 4 months ago

Yes. The index is generated by this dataloader, and FID is computed with pytorch_fid. If you find this project useful, please consider giving it a star and citing it.
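For reference, the score that pytorch_fid reports is the Fréchet distance between two Gaussians fitted to Inception features of the real and generated images. Here $\mu_r, \Sigma_r$ and $\mu_g, \Sigma_g$ denote the feature mean and covariance of the real and generated sets; these symbols are introduced for illustration and do not appear elsewhere in this thread.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\left( \Sigma_r \Sigma_g \right)^{1/2} \right)
```

Since both the mean and the covariance are estimated from the sampled images, a different 10k subset of real images changes the reference statistics and therefore the score.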

HaozheLiu-ST commented 4 months ago

Hello,

After re-checking the code, we found that the data preparation includes some random operations.

To reproduce our results, we will upload the captions and image indices later. @AlexSidDev @WentianZhang-ML

Best, Haozhe

WentianZhang-ML commented 3 months ago

Hi @AlexSidDev, sorry for the late reply.

We re-checked the code and re-generated the images to reproduce FID. To compute FID, you can use the attached file (idx_caption.txt), which contains 10,000 randomly selected image IDs and their corresponding captions.
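The layout of idx_caption.txt is not shown in this thread. As a rough sketch only, the snippet below assumes one `image_id<TAB>caption` pair per line; both that format and the helper name `parse_idx_captions` are hypothetical and should be adjusted to the actual file.

```python
from typing import List, Tuple


def parse_idx_captions(text: str) -> List[Tuple[int, str]]:
    """Parse lines of the ASSUMED form '<image_id>\\t<caption>'."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        image_id, caption = line.split("\t", 1)
        pairs.append((int(image_id), caption.strip()))
    return pairs


# Made-up sample content, not taken from the real file.
sample = "42\ta cat on a sofa\n\n7\ta red bus on a street\n"
print(parse_idx_captions(sample))
```

The resulting image IDs would select the real images for the reference set, and the captions would serve as generation prompts, so that both sides of the FID computation use the same 10,000 samples.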

Here are the reproduced FID results:

Model         FID
SDXL          24.164
SDXL (m=5)    22.109
SDXL (m=10)   22.917

AlexSidDev commented 3 months ago

Hi, @WentianZhang-ML. Thank you very much.