MaverickRen / PixelLM

PixelLM is an effective and efficient LMM for pixel-level reasoning and understanding. PixelLM is accepted by CVPR 2024.
Apache License 2.0
177 stars 5 forks source link

About data generation pipeline #8

Open liuheng92 opened 8 months ago

liuheng92 commented 8 months ago

Hi, I am a little confused about the data generation pipeline. In your article, Fig 4 right panel, does all the caption generate by gpt-4v, for example, "Earphone: xxxxxx"? What are the inputs for gpt-4v?

MaverickRen commented 7 months ago

Yes, all question and answer pairs and object descriptions are generated by GPT. The specific prompts are given in appendix of the paper.

xushilin1 commented 5 months ago

Hi, @MaverickRen Can you provide some basic examples of how to use an API to generate datasets? The API needs input from both the system and the user; I don't know what information you send to the system and user. Thanks.

Norman-Ou commented 5 months ago

Hi, @MaverickRen

As you mentioned, is the description of each object in the image in the upper section of the right panel in Figure 4 generated by GPT4v? Or does the caption of each object come with the dataset?

I quote ur paper, 4.1 MUSE Dataset

RefCOCO guide ssegmentation with explicit target object names, e.g. "orange", lacking more complicated instructions, e.g. "the fruit high in Vitamin-C".

RefCOCO should contain only the class name information and not the description for each object. How do you generate the description information for each object?

Sincerely Norman