Are your datasets purely handmade, or were they generated using a large model?

Zplusdragon / UFineBench

[CVPR2024] UFineBench: Towards Text-based Person Retrieval with Ultra-fine Granularity

Are your datasets purely handmade, or were they generated using a large model? #4

Open shams2023 opened 5 months ago

Zplusdragon commented 5 months ago

When constructing the evaluation set UFine3C, we utilized LLMs to increase the style diversity of the texts.

shams2023 commented 5 months ago

I am asking because I also need to create a dataset at the moment. I have found that when I use existing large models to generate captions, images of the same pedestrian taken by different cameras receive the same text description, so the style is fixed and lacks diversity. How can I obtain captions with diverse text styles? I look forward to your answer. Thank you!

Zplusdragon commented 5 months ago

You can try using multiple large models rather than a fixed one.
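A minimal sketch of this idea, not the procedure used for UFine3C: rotate several captioning models over the camera views of each pedestrian, so that two views of the same person are described by different models. The function name `caption_with_rotation` and the `Captioner` type are hypothetical, and each captioner is assumed to be a plain callable that maps an image path to a caption string (for example, a thin wrapper around whichever model you run).

```python
import random
from typing import Callable, Dict, List, Sequence

# Hypothetical interface: any callable that maps an image path to a caption.
Captioner = Callable[[str], str]


def caption_with_rotation(
    views_by_person: Dict[str, List[str]],
    captioners: Sequence[Captioner],
    seed: int = 0,
) -> Dict[str, str]:
    """Caption every image, cycling through the captioners per pedestrian.

    Consecutive views of the same person are sent to different models
    (as long as there are at least two captioners), which avoids one
    model imposing a single fixed style on all views.
    """
    if not captioners:
        raise ValueError("at least one captioner is required")

    rng = random.Random(seed)  # seeded so the assignment is reproducible
    captions: Dict[str, str] = {}
    for person_id, image_paths in views_by_person.items():
        # Shuffle the model order per person so no model always goes first.
        order = list(range(len(captioners)))
        rng.shuffle(order)
        for i, path in enumerate(image_paths):
            model = captioners[order[i % len(captioners)]]
            captions[path] = model(path)
    return captions
```

With only one captioner the function still runs, but every view gets the same model, so the diversity benefit disappears.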

shams2023 commented 5 months ago

Thank you for your reply. I will try that.

shams2023 commented 5 months ago

I'm very sorry to bother you again. It just occurred to me that my images were taken in the evening or in the dark, so their resolution is low and they are generally blurry. For these images I have tried BLIP, Qwen, and InternLM (书生·浦语). What direction should I take when large models cannot achieve good results? I hope to get some inspiration from you. Thank you very much!

Zplusdragon commented 5 months ago

It is best to annotate the dark images manually to obtain higher-quality texts. If you are unable to do so, you can try pre-processing the images with a nighttime image enhancement algorithm before passing them to large multimodal models.
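As a rough illustration of the pre-processing step, here is a sketch that brightens a dark image with plain gamma correction before captioning. Gamma correction is only a simple stand-in chosen for this example, not a dedicated nighttime enhancement algorithm and not a method named in this thread; the function name `gamma_brighten` and the default `gamma=0.5` are assumptions, and the image is assumed to be a `uint8` NumPy array.

```python
import numpy as np


def gamma_brighten(image: np.ndarray, gamma: float = 0.5) -> np.ndarray:
    """Brighten a uint8 image with gamma correction.

    A gamma below 1 lifts dark regions more than bright ones, which can
    make low-light pedestrian images easier for a captioning model to read.
    """
    if image.dtype != np.uint8:
        raise ValueError("expected a uint8 image")
    if gamma <= 0:
        raise ValueError("gamma must be positive")

    # Precompute the output value for each of the 256 intensity levels.
    levels = np.arange(256) / 255.0
    lut = np.round(255.0 * levels ** gamma).astype(np.uint8)

    # Index the lookup table with the image to map every pixel at once.
    return lut[image]
```

The enhanced array can then be saved or passed to the multimodal model in place of the original image. Gamma correction does not remove noise or blur, so very poor images may still need manual annotation.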

shams2023 commented 5 months ago

Thank you for your reply.