AIDC-AI / Ovis

A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.
https://huggingface.co/AIDC-AI/Ovis1.5-Llama3-8B
Apache License 2.0

Prompts used to build In-house data #2

Closed. zjy-ucas closed this issue 1 month ago

zjy-ucas commented 1 month ago

Could you please release the prompts used to build the in-house data?

liyang-7 commented 1 month ago

Initially, we used a fixed, detailed prompt: " Write a detailed description of this image, do not forget about the texts on it if they exist. Also, do not forget to mention the type / style of the image. No bullet points. When writing descriptions, prioritize clarity and direct observation over embellishment or interpretation. Don't forget these rules:

  1. Be Direct and Concise: Provide straightforward descriptions without adding interpretative or speculative elements.
  2. Use Segmented Details: Break down details about different elements of an image into distinct sentences, focusing on one aspect at a time.
  3. Maintain a Descriptive Focus: Prioritize purely visible elements of the image, avoiding conclusions or inferences.
  4. Follow a Logical Structure: Begin with the central figure or subject and expand outward, detailing its appearance before addressing the surrounding setting.
  5. Avoid Juxtaposition: Do not use comparison or contrast language; keep the description purely factual.
  6. Incorporate Specificity: Mention age, gender, race, and specific brands or notable features when present, and clearly identify the medium if it's discernible. "

Later, we adopted the ALLaVA method: for a given image, GPT-4V generates five candidate questions, and one of them is randomly selected as the prompt. This increases the diversity of prompts.
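
For readers who want to reproduce the selection step, here is a minimal sketch of picking one question at random from a set of candidates. It is not the code used for Ovis: the function name `select_prompt` and the example questions are hypothetical, and the GPT-4V call that would produce the candidates is left out entirely.

```python
import random
from typing import Optional, Sequence


def select_prompt(candidates: Sequence[str],
                  rng: Optional[random.Random] = None) -> str:
    """Pick one candidate question uniformly at random.

    `candidates` stands in for the questions GPT-4V generated for one image.
    """
    if not candidates:
        raise ValueError("expected at least one candidate question")
    if rng is None:
        rng = random.Random()
    return rng.choice(list(candidates))


# Hypothetical candidate questions for a single image (illustration only).
candidates = [
    "What objects are visible in this image?",
    "Describe the text that appears in this image.",
    "What is the style or medium of this image?",
    "Where is the main subject positioned in the frame?",
    "What colors dominate this image?",
]

# A seeded generator makes the choice reproducible across runs.
prompt = select_prompt(candidates, random.Random(0))
```

Passing a seeded `random.Random` keeps the selection reproducible when rebuilding a dataset; omitting it gives a fresh random choice each time.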

zjy-ucas commented 1 month ago

I greatly appreciate you sharing the prompts and methods used for building in-house data.