Mingzhen-Huang / D-TIIL

Exposing Text-Image Inconsistency Using Diffusion Models (ICLR 2024)


Some problems with the edited image #2

Closed DanMerry closed 5 months ago

DanMerry commented 5 months ago

Excuse me, we ran into some problems while reproducing the results: the edited image is not as realistic as the ones shown in the paper, even with model_fine_tuning_optimization_steps set to 500.

prompt

prompt = 'Wolf shown recovering will be adopted by a veterinary technician'

original image

example

edited image

first_edit_img

The edited image is far from the original. Are there any parameters that need to be adjusted?

final mask

final_mask_img

The final mask does not match the ground-truth region.

jialingYK commented 5 months ago

Hi, thank you for bringing this to our attention. Our method is quite sensitive to hyperparameters. I recommend trying the following settings:

diffusion_model_learning_rate=4e-6, embedding_learning_rate=0.001, model_fine_tuning_optimization_steps=200, text_embedding_optimization_steps=500, threshold=0.17
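As a minimal sketch (not the repository's actual interface), the suggested values could be collected in a plain Python dictionary and compared with the one setting stated in the original report. The names `RECOMMENDED`, `REPORTED`, and `changed_settings` are hypothetical, and how D-TIIL actually consumes these values is not shown in this thread.

```python
# Settings recommended in the reply above, collected in one place.
# The dictionary layout is an assumption for illustration only.
RECOMMENDED = {
    "diffusion_model_learning_rate": 4e-6,
    "embedding_learning_rate": 0.001,
    "model_fine_tuning_optimization_steps": 200,
    "text_embedding_optimization_steps": 500,
    "threshold": 0.17,
}

# The only value explicitly stated in the original report.
REPORTED = {"model_fine_tuning_optimization_steps": 500}


def changed_settings(old, new):
    """Return {name: (old_value, new_value)} for names present in both
    dictionaries whose values differ."""
    return {k: (old[k], new[k]) for k in old if k in new and old[k] != new[k]}


if __name__ == "__main__":
    for name, (before, after) in changed_settings(REPORTED, RECOMMENDED).items():
        print(f"{name}: {before} -> {after}")
```

Run as a script, this prints `model_fine_tuning_optimization_steps: 500 -> 200`, which makes it clear that the recommendation lowers the number of fine-tuning steps rather than raising it.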

Prompt

prompt = 'A wolf shown recovering will be adopted by a veterinary technician'

Original image

example

Edited image

edited_image

Final mask

blended_image_with_max_mask

DanMerry commented 5 months ago

Thanks for your response, the effect is awesome!!!