PKU-YuanGroup / LanguageBind

【ICLR 2024🔥】 Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
https://arxiv.org/abs/2310.01852
MIT License

Hashtags and prompts? #21

Closed Kamino666 closed 5 months ago

Kamino666 commented 5 months ago

Thank you for your excellent work!

Will you release the hashtags of the videos and the prompt used by mPLUG-owl and ChatGPT?

LinB203 commented 5 months ago

The hashtags are separated from the caption by # in the raw text. For example, "raw": "raw_caption#hashtags". Refer to here
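A minimal sketch of that split, assuming the "raw" field has the form shown above (caption text first, then zero or more #-prefixed tags). The function name split_raw_caption is hypothetical and is not taken from the LanguageBind code base.

```python
from typing import List, Tuple


def split_raw_caption(raw: str) -> Tuple[str, List[str]]:
    """Split a raw title into (caption, hashtags) on the '#' character.

    Hypothetical helper for illustration; assumes everything before the
    first '#' is the caption and every later piece is one hashtag.
    """
    caption, *tags = raw.split("#")
    # Strip surrounding whitespace and drop empty pieces (e.g. a trailing '#').
    hashtags = [t.strip() for t in tags if t.strip()]
    return caption.strip(), hashtags


if __name__ == "__main__":
    raw = "How To Wear A Floral Theme Ear Style - For The Perfect Party Look!  #shorts"
    caption, hashtags = split_raw_caption(raw)
    print(caption)
    print(hashtags)
```

A title without any # yields an empty hashtag list, which is one way such entries could be detected.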

The mPLUG-owl prompt can be found in Fig. 6 in the appendix of the paper.

ChatGPT's prompt

You are a captioner. Rewrite the caption without subjective language, describing objectively, directly, and concisely. The returned sentence starts with 'In the video,'. The caption: {}. 
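
As a rough illustration of how the {} placeholder in this prompt could be filled, here is a sketch using Python's str.format. The names CHATGPT_PROMPT and build_prompt are hypothetical, and which caption is actually passed in is not stated in this thread, so the example caption is only an assumption. No API call is made.

```python
# Prompt text copied from the comment above; '{}' marks where the caption goes.
CHATGPT_PROMPT = (
    "You are a captioner. Rewrite the caption without subjective language, "
    "describing objectively, directly, and concisely. "
    "The returned sentence starts with 'In the video,'. The caption: {}."
)


def build_prompt(caption: str) -> str:
    """Insert a caption into the prompt template (hypothetical helper)."""
    return CHATGPT_PROMPT.format(caption.strip())


if __name__ == "__main__":
    # Example caption chosen for illustration only.
    print(build_prompt("a woman is putting a ring on her ear"))
```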

Kamino666 commented 5 months ago

Thank you for your prompt response. However, I still have some questions.

You mentioned in Section 4.2 of your paper that you filtered out videos without hashtags and some irrelevant words. But in the sample data you provided, many videos do not seem to meet this condition. Here is an example:

  "zki3YQY8uBs": {
    "folder": "coco_vat_test",
    "mplug": "This video demonstrates the step-by-step process of wearing a floral theme ear style for a party look. The model shows how to put on the earrings, which include a floral design, and combines them with other accessories, such as a necklace and bracelet.",
    "polish_mplug": "a model demonstrates the process of wearing a floral theme ear style for a party look. The earrings, featuring a floral design, are shown being put on along with other accessories like a necklace and bracelet.",
    "ofa": [
      " a close up of a person wearing a ring",
      " a woman is putting a ring on her ear",
      " a woman is putting a ring on her ear",
      " a close up of a person with a earring in their ear",
      " a close up of a person with a ear piercing",
      " a woman is holding a ring on her finger",
      " a woman holding a pair of scissors in her hand",
      " a close up of a womans ear with a cluster of diamonds on it"
    ],
    "sound_mplug": "the event is a combination of gentle murmurs and soft clicks as the model carefully puts on the floral theme earrings, elegantly accessorizing with a necklace and bracelet for the party look.",
    "raw": "How To Wear A Floral Theme Ear Style - For The Perfect Party Look!  #shorts"
  },

https://www.youtube.com/watch?v=zki3YQY8uBs


When I visited this video, I noticed that the author included more hashtags in the description below the video that do not appear in the title. So, my question is: are you using the hashtags in the title or the hashtags in the description?

BinZhu-ece commented 5 months ago

Not all video descriptions contain this type of hashtag, so we are using the hashtags in the title!

Kamino666 commented 5 months ago

I see, thanks for your quick reply!