EB-Dodo / C-MS-Celeb

A clean version (wash list) of MS-Celeb-1M face dataset, containing 6,464,018 face images of 94,682 celebrities
GNU General Public License v3.0

Can’t download the original MS-Celeb-1M dataset? #1

Open LiuJoffrey opened 5 years ago

LiuJoffrey commented 5 years ago

Hi, I would like to download the original dataset, but I can’t find the download link on the website. Could you please provide the original dataset file or a download link?

Thank you so much

EB-Dodo commented 5 years ago

It is a pity that we lost the original image data through careless data preservation over the last two years. The cleaned file list here is all we have now. We did not actually know that Microsoft Research had taken down the original data until we saw these issues.

jjsjunior commented 5 years ago

Hi Joffrey, did you manage to find a download link for MS-Celeb-1M? Thanks in advance.

youthM commented 5 years ago

Hi, did you find a download link for MS-Celeb-1M?

ha1990-12 commented 4 years ago

https://academictorrents.com/details/9e67eb7cc23c9417f39778a8e06cca5e26196a97/tech&hit=1&filelist=1

ibarrond commented 3 years ago

Hi, how do you process the tsv you get from this torrent? I'm not sure what each column contains or how to process it.

ketan-b commented 3 years ago

This should extract the images from the .tsv file:

import argparse
import base64
import csv
import os
# import magic  # Detect image type from buffer contents (disabled, all are jpg)

parser = argparse.ArgumentParser()
parser.add_argument('--croppedTSV', type=str, required=True)
parser.add_argument('--outputDir', type=str, default='raw')
args = parser.parse_args()

# newline='' is what the csv module expects for file objects
with open(args.croppedTSV, 'r', newline='') as tsvF:
    reader = csv.reader(tsvF, delimiter='\t')
    # Count from 1 so that i is the number of images written so far
    for i, row in enumerate(reader, start=1):
        # Columns used: 0 = MID, 1 = image search rank, 4 = face ID, last = base64 image
        MID, imgSearchRank, faceID = row[0], row[1], row[4]
        data = base64.b64decode(row[-1])

        # One directory per MID; each file is named <rank>-<faceID>.jpg
        saveDir = os.path.join(args.outputDir, MID)
        savePath = os.path.join(saveDir, "{}-{}.jpg".format(imgSearchRank, faceID))

        # assert(magic.from_buffer(data) == 'JPEG image data, JFIF standard 1.01')

        os.makedirs(saveDir, exist_ok=True)
        with open(savePath, 'wb') as f:
            f.write(data)

        if i % 1000 == 0:
            print("Extracted {} images.".format(i))
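On the earlier question of what each column contains: the script above only relies on columns 0, 1, 4 and the last one, and the meaning of the remaining columns is not stated in this thread. A minimal sketch for inspecting a row yourself, assuming the same tab-separated layout. `preview_columns` and `looks_like_jpeg` are hypothetical helper names, and the sample row is synthetic, not real dataset content.

```python
import base64
import csv
import io


def preview_columns(row, width=40):
    """Return one summary line per column: index, length, truncated preview."""
    lines = []
    for index, field in enumerate(row):
        preview = field if len(field) <= width else field[:width] + "..."
        lines.append("col {} (len {}): {}".format(index, len(field), preview))
    return lines


def looks_like_jpeg(b64_field):
    """True if the field is valid base64 and decodes to bytes starting with the JPEG marker."""
    try:
        data = base64.b64decode(b64_field, validate=True)
    except (ValueError, TypeError):
        # Not valid base64 (e.g. a URL or an ID), so not an encoded image
        return False
    return data[:2] == b"\xff\xd8"


if __name__ == "__main__":
    # Synthetic 7-column row (assumption: the real file has a similar shape)
    fake_jpeg = base64.b64encode(b"\xff\xd8\xff\xe0" + b"\x00" * 16).decode("ascii")
    sample = "\t".join(["m.hypothetical", "0", "url", "page", "FaceId-0", "rect", fake_jpeg])
    row = next(csv.reader(io.StringIO(sample), delimiter="\t"))
    for line in preview_columns(row):
        print(line)
    print("last column is a JPEG:", looks_like_jpeg(row[-1]))
```

To try it on the real file, read the first row with `csv.reader(open(path, newline=''), delimiter='\t')` and pass it to `preview_columns`. Long base64 fields are truncated in the preview, so the output stays readable.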