3DTopia / GPTEval3D

[ CVPR 2024 ] Implementation for "GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation"

Release request for image prompts for image-to-3D methods in GPTEval3D #2

Closed victor-thu closed 4 months ago

victor-thu commented 4 months ago

Hi authors of GPTEval3D,

GPTEval3D is wonderful work that I'm planning to follow up on. I noticed that for image-to-3D methods like Wonder3D, you used Stable Diffusion XL to generate images conditioned on the text prompts and used those images as input to the models. It would be a great help if those image prompts were released, just as the text prompts are in data/tournament-v0/prompts.json. I'm afraid that generating the images from the text prompts myself would introduce some deviation into the evaluation.

Thanks for your great contribution. I look forward to your reply and would really appreciate it if those image prompts were released.

stevenygd commented 4 months ago

Thanks for your interest in our metrics! Yes, we can add these prompts to our next release. @wutong16 @Lizb6626 can follow up.

Lizb6626 commented 4 months ago

We have released 110 image prompts. Download the gallery at https://drive.google.com/file/d/1YUeQB3hKAT8D1uiQQdC3Ti_O7fpCo0Ud/view?usp=sharing.

victor-thu commented 4 months ago

I have downloaded your image prompt gallery through the link you shared. It really helps! After checking the gallery, I found that some image prompts are too complex or abstract for generating other views. Since the quality of the input image greatly affects the image-to-3D generation process, I think the rule used to filter image prompts matters. I'm interested in how you chose your image prompts to guarantee a fair comparison between the text-based and image-based methods. Thanks again for the gallery. I look forward to hearing your thoughts.

Lizb6626 commented 4 months ago

The generated images depend heavily on the input text prompt. For instance, if the text prompt is intricate, like "A small, solid, radially symmetrical, iridescent abalone shell, with jagged contours, hosting a miniature, tranquil Zen garden complete with tiny, raked sand and micro bonsai," the resulting image will inherently be complex. This complexity can make it hard for image-to-3D methods to generate novel views, as most of these methods currently focus on object-level representations. We believe this doesn't introduce unfairness into the comparison between text-to-3D and image-to-3D methods, as these prompts are equally challenging for text-based methods. Nevertheless, we aim to conduct a comprehensive evaluation and hope that it can be applied to more advanced methods in the future.

In selecting the image prompts, we prioritize those where the image is correctly aligned with the text prompt and the depicted object is complete (e.g., a whole cat instead of just a cat head). We also avoid images with inaccurate shapes, such as a frog with extra legs. While it may be difficult to satisfy all the constraints for a small number of text prompts, we strive to get as close as possible to fulfilling those conditions.
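For readers who want to reproduce a similar selection on their own generated images, the criteria above could be written down roughly as follows. This is only a minimal sketch, not the authors' actual tooling: the `Candidate` class, its three boolean fields, and `pick_best` are hypothetical names, and the flags are assumed to come from manual inspection of each image.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """One generated image for a text prompt (hypothetical structure)."""
    prompt: str
    image_path: str
    aligned_with_text: bool   # image matches the text prompt
    object_complete: bool     # whole object visible, e.g. a whole cat, not just its head
    shape_accurate: bool      # no shape errors, e.g. no extra legs on a frog


def criteria_met(c: Candidate) -> int:
    """Count how many of the three selection criteria the candidate satisfies."""
    return sum([c.aligned_with_text, c.object_complete, c.shape_accurate])


def pick_best(candidates: List[Candidate]) -> Candidate:
    """Return the candidate that satisfies the most criteria.

    When no image meets all three criteria, this falls back to the closest
    one; ties go to the candidate that appears first in the list.
    """
    if not candidates:
        raise ValueError("need at least one candidate image")
    return max(candidates, key=criteria_met)
```

For example, given several images generated for the same text prompt, `pick_best(images_for_prompt)` returns the one to keep as the image prompt.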

We acknowledge that there is still much to explore in evaluating image-to-3D techniques, and our work is far from perfect. Please stay tuned for our upcoming research in this area.

victor-thu commented 4 months ago

Got it! Thank you for the explanation; it resolves my doubts. No further questions for now ^_^