essunny310 / FreestyleNet

[CVPR 2023 Highlight] Freestyle Layout-to-Image Synthesis
https://essunny310.github.io/FreestyleNet/
MIT License

A few questions about in-distribution evaluation #3

Closed AC27MJ closed 1 year ago

AC27MJ commented 1 year ago

Hi, thanks for your great work! I met some problems while reproducing your quantitative results. May I ask a few questions about in-distribution evaluation (ADE20K dataset)?

  1. Which code did you use to calculate the FID? Is it this one? https://github.com/mseitzer/pytorch-fid
  2. What resolution did you evaluate the FID metric at, 256x256 or 512x512? Which interpolation method did you use to resize the ground-truth images and synthesized images?
  3. Did you use this code to calculate the mIoU of ADE20K? https://github.com/CSAILVision/semantic-segmentation-pytorch And is this model that you used to predict the semantic label of synthesized images? http://sceneparsing.csail.mit.edu/model/pytorch/ade20k-resnet101-upernet/
  4. If so, did you change the config file during your mIoU evaluation? https://github.com/CSAILVision/semantic-segmentation-pytorch/blob/8f27c9b97d2ca7c6e05333d5766d144bf7d8c31b/config/ade20k-resnet101-upernet.yaml#L6 Besides, what is the resolution of your ground-truth labels? Hoping to hear from you, and thanks again for your great work.
essunny310 commented 1 year ago

Hi,

Thanks for your interest in our paper.

  1. Yes.
  2. Since the images generated by other baselines are 256x256, we compute FID at 256x256. Bicubic interpolation is adopted to resize images.
  3. Yes, you are correct.
  4. We do not change this config. The ground-truth labels remain untouched during mIoU evaluation, and we resize the generated images to the same resolution as the ground-truth labels.
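The resize-for-FID protocol described in (2) might look like the following Pillow sketch (the function name and directory layout are my own, not from the repo):

```python
from pathlib import Path

from PIL import Image  # Pillow


def resize_for_fid(src_dir: str, dst_dir: str, size: int = 256) -> int:
    """Resize every .jpg in src_dir to size x size with bicubic
    interpolation and save it under dst_dir. Returns the image count."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    count = 0
    for path in sorted(Path(src_dir).glob("*.jpg")):
        img = Image.open(path).convert("RGB")
        img.resize((size, size), Image.BICUBIC).save(dst / path.name)
        count += 1
    return count
```

Both the synthesized and the ground-truth folders would be passed through this before running pytorch-fid, so that any interpolation artifacts affect both distributions equally.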
AC27MJ commented 1 year ago

Thank you for your reply! I followed your instructions but still cannot get the correct results. Here are my evaluation steps:

  1. Create the conda environment "freestyle" from environment.yaml
  2. Prepare the data and data list (I noticed the initial noise is different for each image, so here is my list file, in numerical order): (screenshot)
  3. Run 'sample_ADE20K.sh'
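The numerically ordered list file in step 2 can be built with a small sketch like this (the filename pattern and helper are illustrative, not part of the repo):

```python
import re
from pathlib import Path


def numeric_sort(names):
    """Sort filenames like 'ADE_val_00000035.jpg' by the embedded number
    rather than lexicographically, so the list order is deterministic."""
    def key(name):
        m = re.search(r"(\d+)", Path(name).stem)
        return int(m.group(1)) if m else -1
    return sorted(names, key=key)
```

With zero-padded names the lexicographic and numeric orders coincide, but sorting on the parsed number is robust if the padding ever differs.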

For FID metric:

  1. Resize the synthesized images and ground-truth images with: Image.open('input.jpg').resize((256, 256), Image.BICUBIC).save('out.jpg')
  2. Run python -m pytorch_fid resized_synthesized_filepath resized_groundtruth_filepath, but I got: (screenshot)
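For reference, what pytorch-fid reports is the Fréchet distance between two Gaussians fitted to Inception features. A minimal numpy sketch of that distance (the feature extraction itself is not shown; this is not a replacement for the pytorch-fid tool):

```python
import numpy as np


def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Frechet distance between Gaussians (mu1, sigma1) and (mu2, sigma2):
    ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2*(sigma1 sigma2)^{1/2})."""
    def psd_sqrt(mat):
        # Matrix square root of a symmetric PSD matrix via eigendecomposition.
        vals, vecs = np.linalg.eigh(mat)
        return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T

    s1_half = psd_sqrt(sigma1)
    # Tr((s1 s2)^{1/2}) equals Tr((s1^{1/2} s2 s1^{1/2})^{1/2}),
    # and the latter matrix is symmetric PSD, so eigh applies.
    covmean = psd_sqrt(s1_half @ sigma2 @ s1_half)
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))
```

Identical statistics give a distance of zero; with equal identity covariances the distance reduces to the squared mean difference, which is a quick sanity check on any FID pipeline.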

For mIoU metric:

  1. Resize the synthesized images back to the original resolution
  2. Run python3 eval_multipro.py --gpus 0 --cfg config/ade20k-resnet101-upernet.yaml, but I got: (screenshot)
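As a sanity check on step 2, the metric itself is just per-class intersection-over-union averaged over classes. A toy numpy sketch (the real ADE20K evaluation accumulates intersection/union counts over the whole validation set across all 150 classes, rather than per image):

```python
import numpy as np


def mean_iou(pred, gt, num_classes):
    """Mean IoU over classes: for each class c, IoU = |pred==c AND gt==c|
    / |pred==c OR gt==c|; classes absent from both maps are skipped."""
    pred = np.asarray(pred).ravel()
    gt = np.asarray(gt).ravel()
    ious = []
    for c in range(num_classes):
        inter = np.sum((pred == c) & (gt == c))
        union = np.sum((pred == c) | (gt == c))
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))
```

Running a known prediction/label pair through such a helper can quickly tell whether a surprising score comes from the metric code or from the generated images themselves.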

which is different from the results reported in your paper. May I ask if I did something wrong during the evaluation process?

As an example, this is the synthesized result for "ADE_val_00000035.jpg": (screenshot) Is this the same as yours? If not, could you please share your synthesized results? Thank you!

essunny310 commented 1 year ago

Hi, you can find our results here. Please conduct the evaluation again and feel free to reach out to me if there are still problems.

AC27MJ commented 1 year ago

Thank you for your reply! I have reproduced the same results from your synthesized images. BTW, I am trying to fine-tune the model on the ADE dataset myself, and I noticed that you trained on the ADE dataset for 2 days on a single A100 40G GPU. Since I have no A100 GPU, could you tell me how many iterations you fine-tuned the model for on the ADE dataset? Thank you~

essunny310 commented 1 year ago

It takes ~300K steps.

AC27MJ commented 1 year ago

Thank you.

SnowdenLee commented 1 year ago

Hi @essunny310 ,

thanks for the great work. I have some questions about the Cityscapes evaluation. Which segmentation network from which repo did you use for the Cityscapes evaluation? And how many images did you generate for comparison?

Thanks a lot in advance!

essunny310 commented 1 year ago

Hi, we only present some visual results on Cityscapes in the appendix (Figure S10) to showcase the validity of FreestyleNet on rectangular datasets. For the quantitative evaluation, you can refer to OASIS (I think they use the following repo https://github.com/fyu/drn).

SnowdenLee commented 1 year ago

Hi @essunny310, could you maybe share the pretrained UperNet101 model trained on ADE20K? The link in https://github.com/CSAILVision/semantic-segmentation-pytorch is unfortunately dead... Thanks a lot!