[Open] hanzhn opened this issue 2 months ago
I appreciate your excellent work on instruction-based editing. Thanks for your efforts!
I have some questions for you about the Emu Edit benchmark metrics. Or can you point out where I can find authoritative code for these calculations? That would be helpful.
Hi,
Thank you for bringing these issues to our attention.
We've noticed that the original Emu Edit paper and dataset do not specify the versions of CLIP and DINO used. To align with other benchmarks, we adopted the settings used by MagicBrush (GitHub Repository): specifically, "ViT-B/32" for the CLIP embeddings and "dino_vits16" for the DINO embeddings. To ensure consistency, we reran all Emu Edit benchmark results in our paper with these settings.
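For reference, here is a minimal sketch of how these two backbones can be loaded. It assumes the OpenAI `clip` package and the official DINO `torch.hub` entry point; it is not our exact evaluation code.

```python
# Minimal sketch: load the CLIP and DINO backbones named above.
# Assumes `pip install git+https://github.com/openai/CLIP.git` plus torch/torchvision.
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"

# CLIP ViT-B/32 returns the model and its image preprocessing transform.
clip_model, clip_preprocess = clip.load("ViT-B/32", device=device)
clip_model.eval()

# DINO ViT-S/16 from the official facebookresearch/dino hub entry point.
dino_model = torch.hub.load("facebookresearch/dino:main", "dino_vits16")
dino_model.eval().to(device)
```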
Regarding the dataset split issue: we used the test split of emu_edit_test_set for our evaluations. However, due to a mistakenly swapped dataset, our reported results were based on the validation split from emu_edit_test_set_generations.
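For example, the splits can be selected with the Hugging Face `datasets` library as sketched below; the repository ids and split names are assumptions based on the dataset names in this thread.

```python
from datasets import load_dataset

# Benchmark inputs (images, captions, instructions): use the *test* split.
emu_test = load_dataset("facebook/emu_edit_test_set", split="test")

# Emu Edit's released generations: take the matching *test* split here as well,
# rather than the validation split that was accidentally swapped in.
emu_generations = load_dataset("facebook/emu_edit_test_set_generations", split="test")

print(len(emu_test), len(emu_generations))
```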
Also, there are known issues with the benchmark quality, as discussed in this discussion thread: some image-caption pairs appear incorrect, with placeholder captions (e.g., 'a train station in city') or identical source and target captions.
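If you want to screen such pairs out before scoring, a simple filter along the following lines works; the field names `input_caption` and `output_caption` are assumptions about the dataset schema.

```python
# Drop examples with identical source/target captions or known placeholder captions.
PLACEHOLDER_CAPTIONS = {"a train station in city"}

def is_suspect(example):
    src = example["input_caption"].strip().lower()
    tgt = example["output_caption"].strip().lower()
    return src == tgt or src in PLACEHOLDER_CAPTIONS or tgt in PLACEHOLDER_CAPTIONS

clean_test = emu_test.filter(lambda ex: not is_suspect(ex))
print(f"kept {len(clean_test)} of {len(emu_test)} examples")
```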
For the metrics evaluation, we adhered closely to the MagicBrush evaluation script (GitHub Link) for both benchmarks, with no major modifications. We plan to share our refined evaluation code soon; in the meantime, you can refer to the MagicBrush script directly.
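To give an idea of what that script computes, below is a hedged sketch of the per-example similarity metrics in the spirit of the MagicBrush evaluation (CLIP-I, DINO, and CLIP-T); please treat the MagicBrush script as the authoritative reference. It reuses the models from the earlier snippet, and the preprocessing details are assumptions.

```python
# Sketch of CLIP-I, DINO, and CLIP-T cosine similarities for one example.
# clip_model, clip_preprocess, dino_model, and device come from the earlier snippet.
import torch
import torch.nn.functional as F
import torchvision.transforms as T
import clip
from PIL import Image

# Standard ImageNet-style preprocessing for DINO (exact transform is an assumption).
dino_transform = T.Compose([
    T.Resize(256, interpolation=T.InterpolationMode.BICUBIC),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
])

@torch.no_grad()
def clip_image_embed(image: Image.Image) -> torch.Tensor:
    x = clip_preprocess(image).unsqueeze(0).to(device)
    return F.normalize(clip_model.encode_image(x).float(), dim=-1)

@torch.no_grad()
def clip_text_embed(caption: str) -> torch.Tensor:
    tokens = clip.tokenize([caption]).to(device)
    return F.normalize(clip_model.encode_text(tokens).float(), dim=-1)

@torch.no_grad()
def dino_embed(image: Image.Image) -> torch.Tensor:
    x = dino_transform(image.convert("RGB")).unsqueeze(0).to(device)
    return F.normalize(dino_model(x).float(), dim=-1)

def example_metrics(edited: Image.Image, target: Image.Image, target_caption: str) -> dict:
    e_clip, t_clip = clip_image_embed(edited), clip_image_embed(target)
    e_dino, t_dino = dino_embed(edited), dino_embed(target)
    txt = clip_text_embed(target_caption)
    return {
        "clip_i": (e_clip @ t_clip.T).item(),  # CLIP image-image cosine similarity
        "dino":   (e_dino @ t_dino.T).item(),  # DINO image-image cosine similarity
        "clip_t": (e_clip @ txt.T).item(),     # CLIP image-text cosine similarity
    }
```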