Filter YFCC data - Githubissues

Sense-GVT / DeCLIP

Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm


Filter YFCC data #13

Open Hxyou opened 2 years ago

Hxyou commented 2 years ago

Hi, thanks for the great work. After downloading the provided YFCC15M label file, I can see that each label has three keys: caption, filename, and url. How should we find the corresponding YFCC image from your label? That is, which key should we use to align with the YFCC data?

SlotherCui commented 2 years ago

You can use the url as the key, and the filename as a check.
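A minimal sketch of that suggestion, assuming the label file has already been loaded into a list of dicts with the keys caption, filename, and url. The function names and the way the filename check is done (comparing base names with a filename from your own YFCC metadata) are illustrative assumptions, not code from the DeCLIP repo.

```python
import os


def build_label_index(labels):
    """Index the label records by their url so lookups are O(1)."""
    return {rec["url"]: rec for rec in labels}


def match_record(index, yfcc_url, local_filename=None):
    """Return the label record for yfcc_url, or None if there is no match.

    If local_filename is given, it is compared (by base name) with the
    label's filename as a sanity check; a mismatch is treated as no match.
    """
    rec = index.get(yfcc_url)
    if rec is None:
        return None
    if local_filename is not None:
        # Assumption: comparing base names is a sufficient check.
        if os.path.basename(rec["filename"]) != os.path.basename(local_filename):
            return None
    return rec
```

For example, `match_record(build_label_index(labels), url, name)` returns the label only when the url is present and the filenames agree.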

raytrun commented 2 years ago

The image names in the YFCC data seem to be MD5 hashes. I'm also a little confused about how to connect them to the labels.
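If the local image names really are MD5 hashes, one thing worth testing is whether they are the MD5 hex digest of the label's url. This is an unconfirmed assumption, not something stated in this thread; the helper name and the ".jpg" extension below are hypothetical.

```python
import hashlib


def url_to_md5_name(url, ext=".jpg"):
    """Return the MD5 hex digest of the url string plus a file extension.

    Assumption: the local YFCC copy names each image after the MD5 of its
    url. Verify this on a few known images before relying on it.
    """
    return hashlib.md5(url.encode("utf-8")).hexdigest() + ext
```

Comparing `url_to_md5_name(label["url"])` with a handful of file names in the downloaded dataset should quickly show whether the assumption holds.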

DonkeyShot21 commented 1 year ago

I am also trying to filter YFCC and I have the same issue. The dataset I downloaded has a very different structure, and I don't know how to find the images based on the filename that you provide. I am also not sure what you mean by "Prepare the YFCC15M subset metadata pickle by the label".

My version of YFCC100M looks exactly the same as the one they have in the SLIP repo. Do you organise the data in a different way?