ShareGPT4Omni / ShareGPT4V

[ECCV 2024] ShareGPT4V: Improving Large Multi-modal Models with Better Captions
https://sharegpt4v.github.io/

Clarification on Data Source Specific Prompt usage in training vs. dataset #9

yunfeixie233 closed this issue 2 months ago

yunfeixie233 commented 2 months ago

Hi authors,

Thanks for this excellent work.

I've noticed a discrepancy between the paper and the released dataset regarding the use of Data Source Specific Prompts.

The paper mentions using Data Source Specific Prompts to instruct GPT4-Vision for generating detailed descriptions. However, in the released dataset, while the "from": "gpt", "value" entries contain detailed descriptions, the "from": "human", "value" entries only have short, general prompts.

This inconsistency raises questions about how these short "human" prompts were generated and why they were used instead of the longer Data Source Specific Prompts described in the paper. I'm curious if there were any advantages to using these shorter prompts during training.

Could you please clarify this methodology? Understanding the exact approach used in preparing the training data would be valuable for those looking to build upon or replicate your work.

Thank you for your time and for sharing your research.

xiaoachen98 commented 2 months ago

The detailed-caption queries in the training file are sampled from the llava-mix-665k dataset to keep the comparison fair. We have not run any experiments that use the data-source-specific prompts for training. If you run experiments under this setting and find something, please contact us with your findings.
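
For anyone who wants to check this against the released file, here is a minimal sketch that tallies the distinct "human" prompts. It assumes the LLaVA-style layout in which each record holds a `conversations` list of `{"from": ..., "value": ...}` turns and the prompt may carry an `<image>` placeholder. The function names and the sample records are made up for illustration and are not taken from the dataset.

```python
import json
from collections import Counter


def load_records(path):
    """Load a JSON file that holds a list of conversation records."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def count_human_prompts(records):
    """Tally the distinct "human" prompts across conversation records.

    Assumes each record has a "conversations" list of turns with
    "from" and "value" keys (LLaVA-style layout; an assumption here).
    """
    counts = Counter()
    for record in records:
        for turn in record.get("conversations", []):
            if turn.get("from") == "human":
                # Drop the image placeholder so identical prompts group together.
                prompt = turn.get("value", "").replace("<image>", "").strip()
                counts[prompt] += 1
    return counts


# Made-up sample records for illustration only, not real dataset entries.
sample = [
    {"conversations": [
        {"from": "human", "value": "<image>\nDescribe this image in detail."},
        {"from": "gpt", "value": "A long, detailed caption."},
    ]},
    {"conversations": [
        {"from": "human", "value": "Describe this image in detail.\n<image>"},
        {"from": "gpt", "value": "Another long, detailed caption."},
    ]},
    {"conversations": [
        {"from": "human", "value": "<image>\nWhat is this photo about?"},
        {"from": "gpt", "value": "A third detailed caption."},
    ]},
]

prompt_counts = count_human_prompts(sample)
for prompt, n in prompt_counts.most_common():
    print(n, repr(prompt))
```

To run it on the real data, pass the output of `load_records(...)` with the path to your local copy of the training JSON. A small number of short prompts, each repeated many times, would be consistent with the queries being drawn from a fixed pool rather than from the data-source-specific prompts.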