LituRout / PSLD

Posterior Sampling using Latent Diffusion

Questions about the details of the box inpainting task (+quantitative results) #5

Closed Min-Jaewon closed 6 months ago

Min-Jaewon commented 6 months ago

Thank you for your interesting research. I have a few questions regarding the details of the box inpainting task mentioned in the paper, as it seems to behave differently in the actual code.

  1. In the code, box inpainting uses randomly generated masks based on a seed, which is not the same as the centered box mask mentioned in the paper. I'm wondering where in the code the mask is fixed at the center. Even when the seed is fixed, the randomly generated box should be fixed for each sample, yet the mask is not fixed at the center. Can you clarify this? [code]

  2. The paper mentions that a drop probability chosen uniformly between 20% and 80% is applied to every pixel. In the actual code, however, it seems to work differently: instead of assigning each pixel its own uniformly drawn drop probability, the code draws a single probability from that range and uses it to select the pixels to be masked (see the sketch after this list). Which method was used in the paper? [code]

  3. In addition to the previous two questions, I'm curious whether the seed was fixed when measuring the quantitative results, and if so, which seed was used. Furthermore, the paper states omega=1.0 and gamma=0.1 for all tasks, but omega=0.1 and gamma=0.01 are the defaults for box inpainting in the code. Which parameters were used for the quantitative and qualitative results?
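
For concreteness, here is a minimal NumPy sketch of the two schemes described in question 2. This is an illustration, not code from the PSLD repository; `H` and `W` are assumed image dimensions:

```python
import numpy as np

H, W = 256, 256
rng = np.random.default_rng(42)

# Scheme described in the paper: every pixel gets its own drop probability,
# drawn uniformly from [0.2, 0.8], and is then dropped with that probability.
per_pixel_prob = rng.uniform(0.2, 0.8, size=(H, W))
mask_paper = rng.uniform(size=(H, W)) < per_pixel_prob   # True = pixel dropped

# Scheme in the code (DPS-style operator): a single drop probability is drawn
# uniformly from [0.2, 0.8] per image and applied to every pixel.
p = rng.uniform(0.2, 0.8)
mask_code = rng.uniform(size=(H, W)) < p                 # True = pixel dropped
```

Under the first scheme every pixel is dropped with marginal probability 0.5; under the second, the realized drop rate of a single image can sit anywhere in [0.2, 0.8], which may matter for per-image metrics even if averages over many images look similar.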

LituRout commented 6 months ago

Hi Min-Jaewon,

Thanks for your comment.

  1. We experimented with different box inpainting operators: (a) centered boxes ([64:192,64:192] for 256x256 images), (b) random boxes as in DPS (code), and (c) free-form boxes (Figure 1 in the paper). The quantitative results obtained with (b) were not consistent with the baseline DPS results reported in the DPS paper. The DPS authors acknowledged this difference and clarified that the inconsistency arose from the randomness of the box location. Since DPS was our strongest baseline, we followed the same protocol (b) as suggested by the DPS authors. However, this should not drastically change the quantitative results, because the final numbers are averages over 1000 box-inpainting results. (A sketch of operator (a) appears after this list.)

  2. To be consistent with the baseline, we used the same random inpainting operator as in DPS (code). We also experimented with uniformly dropped pixels, and the results did not differ much due to averaging over 1000 images. However, it is important to run both the DPS and PSLD experiments with the same inpainting operator (either of the two methods you mentioned should be fine).

  3. The seed was fixed to help reproduce the quantitative results. We used the default seed provided in the StableDiffusion codebase, i.e., 42 (code).
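
For reference, here is a minimal sketch of operator (a) and of the fixed seed mentioned in point 3. This is illustrative rather than the exact repository code; `seed_everything` is assumed to be the PyTorch Lightning helper used in the Stable Diffusion scripts:

```python
import numpy as np
from pytorch_lightning import seed_everything

seed_everything(42)  # default seed in the StableDiffusion codebase

# Operator (a): centered box [64:192, 64:192] on a 256x256 image;
# mask == 0 marks the region to be inpainted.
mask = np.ones((256, 256), dtype=np.float32)
mask[64:192, 64:192] = 0.0
```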

Finally, as long as you run all the baselines in the same experimental setup, the results should be conclusive. Please check our follow-up paper STSL for more details.

I'd be happy to clarify if you have any more questions.

Min-Jaewon commented 6 months ago

Thank you for your detailed response!

  1. When I fix the seed, the position of the masked box is fixed too. Is there any code that generates a different random box mask per sample while keeping the seed fixed? (One possible approach is sketched after this list.)
  2. What about Q3, which I just added? I ask because the latter parameters (omega=0.1, gamma=0.01) gave better results for most of the samples I tried.
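
For what it's worth, one way to get a different box per sample while keeping the whole run reproducible is to draw all boxes from a single seeded generator, so the RNG stream advances across samples. This is a sketch of the idea, not part of the PSLD or DPS codebases:

```python
import numpy as np

def random_box_mask(rng, size=256, box=128):
    """Random box mask in DPS style: mask == 0 inside the dropped box."""
    top = rng.integers(0, size - box + 1)
    left = rng.integers(0, size - box + 1)
    mask = np.ones((size, size), dtype=np.float32)
    mask[top:top + box, left:left + box] = 0.0
    return mask

rng = np.random.default_rng(42)  # one fixed seed for the whole run
# Boxes differ per sample, but rerunning the script reproduces all of them.
masks = [random_box_mask(rng) for _ in range(1000)]
```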

LituRout commented 6 months ago

  1. As I said earlier, we didn't want to propose a new measurement recipe. By using the same operator provided by the DPS codebase, we tried to make the comparison as fair as possible.

  2. The omega parameter controls how well the reconstructed sample satisfies the measurements. Estimating the exact hyper-parameters is difficult. If you find better results with omega=0.1 and gamma=0.01, then you should use those hyper-parameters. The earlier version of the paper did not include this many inverse tasks, so the parameters were fixed across all of them. Later, we added more experiments as suggested by anonymous reviewers, where the parameters were slightly different. The rule of thumb is that the gluing parameter gamma should not be larger than the measurement parameter omega (see the sketch below).
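
To make the roles of the two weights concrete, below is a heavily simplified sketch of how omega and gamma could enter a PSLD-style latent correction. Here `decode`, `encode`, `A` (the measurement operator), and `At` (its transpose) are hypothetical stand-ins; this is not the repository's actual implementation:

```python
import torch

def psld_style_step(z0_hat, y, decode, encode, A, At, omega=1.0, gamma=0.1):
    """Illustrative latent correction: omega weights measurement fidelity,
    gamma weights the 'gluing' term that keeps the latent self-consistent.
    Rule of thumb from the discussion above: gamma should not exceed omega."""
    z0_hat = z0_hat.detach().requires_grad_(True)
    x0_hat = decode(z0_hat)

    # Measurement term: the decoded sample should match the observation y.
    meas = torch.linalg.norm(y - A(x0_hat))

    # Gluing term: re-encoding the measurement-consistent image should give
    # back (approximately) the same latent.
    glue = torch.linalg.norm(z0_hat - encode(At(y) + x0_hat - At(A(x0_hat))))

    grad = torch.autograd.grad(omega * meas + gamma * glue, z0_hat)[0]
    return z0_hat - grad
```

Since the gluing term only keeps the latent self-consistent while the measurement term enforces the observation y, weighting gluing more heavily than the measurements would let the correction drift away from y, which is one way to read the gamma <= omega rule of thumb.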

Min-Jaewon commented 6 months ago

Thank you!