VJatla opened 2 months ago
Hello,
After going through the paper, I understood that only 40% of the video-text pairs from the WebVid-10M dataset are used. Could you please explain the rationale, or point me to something that explains how this 40% of the videos was selected?