OSU-NLP-Group / MagicBrush

[NeurIPS'23] "MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image Editing".
https://osu-nlp-group.github.io/MagicBrush/

Reproduce Evaluation Results (Table 2) #12

Closed: BennoKrojer closed this issue 7 months ago

BennoKrojer commented 7 months ago

Hi!

I am currently reproducing your evaluation and have a question about Table 2 "single turn" setting. Does single turn mean you only evaluate the first turn or does it mean you evaluate all turns but with ground truth input?

Example:

[
  {
    "input": "242679-input.png",
    "mask": "242679-mask1.png",
    "output": "242679-output1.png",
    "instruction": "Put a cat on the seat."
  },
  {
    "input": "368667-input.png",
    "mask": "368667-mask1.png",
    "output": "368667-output1.png",
    "instruction": "Have there be a stream running through the field"
  },
  {
    "input": "368667-output1.png",
    "mask": "368667-mask2.png",
    "output": "368667-output2.png",
    "instruction": "Add a giraffe in the field"
  },

Would you ignore the last entry in your evaluation?

When I run the evaluation script, it says:

Final turn CLIP-I: 0.910077734751122
All turn CLIP-I: 0.9074493353409872

But it doesn't mention single turn.

Do you have the exact script that would lead to the single turn numbers of the MagicBrush model?

Thanks a lot, Benno

BennoKrojer commented 7 months ago

Since I don't care about the "iter" setting, I only generated the "inde" examples and ran your script on these, expecting to see the numbers from the paper somewhere in the output metrics, but they were slightly off.

drogozhang commented 7 months ago

Hi, thanks for your question.

Does single turn mean you only evaluate the first turn or does it mean you evaluate all turns but with ground truth input?

-> All turns, but with ground-truth input.

Do you have the exact script that would lead to the single turn numbers of the MagicBrush model?

-> In eval_script, we have the single turn numbers.

But it doesn't mention single turn.

-> "All turn CLIP-I" actually means all the single turns with ground-truth input. Sorry about the mismatch :) Usually, you would expect the all-turn (single-turn) numbers to be higher than the final-turn (multi-turn) numbers.
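To make the distinction concrete, here is a minimal Python sketch of which (input, ground-truth output) pairs each setting would cover. It assumes the metadata layout from the example earlier in this thread, with the image id as the prefix of each file name. The helper names `group_by_image`, `all_turn_pairs`, and `final_turn_pairs` are hypothetical and are not taken from the repository's evaluation script, and the reading of "final turn" as "only the last turn of each session, starting from the original source image" is an assumption.

```python
from collections import defaultdict

# Entries copied from the example earlier in this thread (masks omitted).
entries = [
    {"input": "242679-input.png", "output": "242679-output1.png",
     "instruction": "Put a cat on the seat."},
    {"input": "368667-input.png", "output": "368667-output1.png",
     "instruction": "Have there be a stream running through the field"},
    {"input": "368667-output1.png", "output": "368667-output2.png",
     "instruction": "Add a giraffe in the field"},
]


def group_by_image(entries):
    """Group turns by the image id that prefixes each file name (assumed layout)."""
    sessions = defaultdict(list)
    for entry in entries:
        image_id = entry["input"].split("-")[0]
        sessions[image_id].append(entry)
    return dict(sessions)


def all_turn_pairs(entries):
    """'All turn' (single-turn) setting: every turn is scored, and each turn
    starts from its ground-truth input image."""
    return [(e["input"], e["output"]) for e in entries]


def final_turn_pairs(entries):
    """'Final turn' (multi-turn) setting, as assumed here: only the last turn
    of each session is scored, starting from the original source image."""
    pairs = []
    for turns in group_by_image(entries).values():
        pairs.append((turns[0]["input"], turns[-1]["output"]))
    return pairs


print(all_turn_pairs(entries))
print(final_turn_pairs(entries))
```

Under this reading, the last entry in the example is not ignored in the single-turn numbers: it appears in the all-turn list with `368667-output1.png` as its ground-truth input.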

BennoKrojer commented 7 months ago

Thank you! That should work

betterze commented 7 months ago

@drogozhang Could you share the generated images from the other methods in Table 2 (Open-Edit, VQGAN-CLIP, and so on)? We want to calculate the CLIP direction loss. Or would you mind adding the CLIP direction loss to Table 2? Thanks a lot.
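For readers unfamiliar with the metric, a CLIP directional score is commonly computed as the cosine similarity between the change in image embeddings and the change in text embeddings. The sketch below only illustrates that formula on plain Python lists. It assumes the four embeddings (source image, edited image, source caption, target caption) have already been computed by some CLIP model, and the function names `cosine` and `directional_similarity` are hypothetical rather than part of the MagicBrush code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (non-zero norms assumed)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def directional_similarity(img_in, img_out, txt_in, txt_out):
    """Cosine between the image-embedding change and the text-embedding change.

    All four arguments are assumed to be precomputed CLIP embeddings.
    """
    image_delta = [o - i for i, o in zip(img_in, img_out)]
    text_delta = [o - i for i, o in zip(txt_in, txt_out)]
    return cosine(image_delta, text_delta)
```

A corresponding loss is often taken as one minus this similarity. Computing it for the baselines would require the generated images plus a source and target caption for every edit.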

The EmuEdit authors also use the MagicBrush test set in their Table 2. However, their scores are very different from yours. For example, InstructPix2Pix has a DINO score of 0.767 in their paper, whereas its score is 0.6463 in your paper. Could you tell me why this happens? Thanks a lot.

drogozhang commented 7 months ago

Hi, I didn't save these images, but I think you can generate them with the provided checkpoint.

For EmuEdit, I don't know many details, but I guess they re-trained the models on our data with better training hyper-parameters, or they used SDXL-InstructPix2Pix.

betterze commented 7 months ago

Got it. Thanks a lot.