deepinsight / insightface

State-of-the-art 2D and 3D Face Analysis Project
https://insightface.ai

can you share the ms-1m clean list? #19

Closed HaoLiuHust closed 6 years ago

HaoLiuHust commented 6 years ago

Thank you for your wonderful work. Do you plan to release the clean list? I want to use it with another alignment method.

nttstar commented 6 years ago

https://pan.baidu.com/s/1eTn6O62

HaoLiuHust commented 6 years ago

thank you very much

HaoLiuHust commented 6 years ago

Will the cleaned megaface be released?

nttstar commented 6 years ago

Yes

vzvzx commented 6 years ago

Do your image paths follow a different scheme? The format in your ms1m_clean_list.txt file is "./m.0933t2/0.jpg", but the source image paths look like "./MS-Celeb-1M/img/m.0933t2/0-FaceId-0.jpg".

nttstar commented 6 years ago

check MS1M tsv file and use the 2nd column as image id.

cleardusk commented 6 years ago

use the 2nd column as image id

I am still confused. How do you map the original images, named like "m.0933t2/0-FaceId-0.jpg", to "m.0933t2/0.jpg"? (This mapping may not be right.)

nttstar commented 6 years ago

the second column of ms1m tsv file is a single unique integer for each identity. Do not use the filename like "m.0933t2/0-FaceId-0.jpg" in our case.

cleardusk commented 6 years ago

the second column of ms1m tsv file is a single unique integer for each identity.

In this case, where is the image data?

Take one mid m.0933t2 as example

m.0933t2    "Misty Lee"@en
m.0933t2    "Misty Lee"@pl
  1. Do these two mid entries refer to one person?
  2. How do I parse the original images?

nttstar commented 6 years ago

The downloaded file is FaceImageCroppedWithOutAlignment.tsv. Its first line is:

m.0107_f 0 http://getbeatmadrid.files.wordpress.com/2013/01/magic-alex.jpg .......

The images are base64 encoded inside this large text file.
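
The extraction described here (first column as the mid, second column as the image id, image bytes stored as base64) could be sketched roughly as below. This is only an illustration, not the authors' extractor: the function name `extract_images` and the `data_col` parameter are hypothetical, and the sketch assumes the base64 image data sits in the last tab-separated column.

```python
import base64
from pathlib import Path


def extract_images(tsv_path, out_dir, data_col=-1):
    """Write each row of an MS1M-style tsv to out_dir/<mid>/<image id>.jpg.

    Column 0 is the mid and column 1 is the integer used as the image id,
    as described in the thread. data_col is an ASSUMPTION: it should point
    at the base64-encoded image column (taken here to be the last one).
    """
    out_dir = Path(out_dir)
    written = 0
    # Read as bytes so unusual characters in the URL columns cannot break decoding.
    with open(tsv_path, "rb") as f:
        for line in f:
            fields = line.rstrip(b"\r\n").split(b"\t")
            if len(fields) < 3:
                continue  # skip blank or malformed rows
            mid = fields[0].decode("ascii")
            image_id = fields[1].decode("ascii")
            target = out_dir / mid / f"{image_id}.jpg"
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(base64.b64decode(fields[data_col]))
            written += 1
    return written
```

Calling `extract_images("FaceImageCroppedWithOutAlignment.tsv", "ms1m_images")` would then produce paths such as `ms1m_images/m.0107_f/0.jpg`. Note that rows sharing the same mid and second-column value would overwrite each other in this sketch.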

cleardusk commented 6 years ago

Many thanks, I got it. But the FaceImageCroppedWithOutAlignment.tsv link at https://msceleb.blob.core.windows.net/ms-celeb-v1-split?restype=container&comp=list is no longer valid. Could you please provide a download link?

HaoLiuHust commented 6 years ago

@cleardusk It seems the file is split into 8 parts. You can find them on the msceleb website.

cleardusk commented 6 years ago

@HaoLiuHust I explored http://www.msceleb.org/ and https://www.microsoft.com/en-us/research/project/ms-celeb-1m-challenge-recognizing-one-million-celebrities-real-world/ but I didn't find the raw tsv file in 8 parts.

HaoLiuHust commented 6 years ago

https://msceleb.blob.core.windows.net/ms-celeb-v1-cropped-split?restype=container&comp=list You can find them here. However, the files seem to contain less data than the author said.

cleardusk commented 6 years ago

Will you report the performance only with your cleaned ms-1m list? I think it is more reasonable and comparable in the academic community (it's fine to report the performance using any data on MegaFace Website).

nttstar commented 6 years ago

@cleardusk Different clean lists are also not comparable. We can fix the dataset and compare the relative performance for different networks/losses.

cleardusk commented 6 years ago

Different clean lists are also not comparable

That's right, I just wonder how high the performance will be without using extra data.

Additionally, the InsightFace paper states:

As is well known that the above mentioned three attributes, data, network and loss, have a high-to-low influence on the performance of face recognition models.

The reviewers may want to see some experiments supporting this conclusion, such as the performance difference between the uncleaned and cleaned ms-1m lists.

nttstar commented 6 years ago

@cleardusk Right, and we're doing such experiments :) Thank you for your good advice.

ghost commented 6 years ago

Could you please share your private data with me? I need it for my research, to study the impact of additional data on MS. Thanks.

vzvzx commented 6 years ago

After extracting images with your logic, I found that ./m.0933t2/16.jpg does not exist in ms-1m. Is it in your private data, or is there extra data in ms-1m?

cleardusk commented 6 years ago

@vzvzx How many images do not exist in ms-1m? I also plan to extract images according to this cleaned list.

cleardusk commented 6 years ago

@nttstar I checked the id m.0107_f; the image id is taken from the second column of the ms1m tsv file. But the three images below from your cleaned list are not in the tsv file.

m.0107_f/23.jpg
m.0107_f/47.jpg
m.0107_f/85.jpg

I also checked the other images in id m.0107_f extracted according to your cleaned list, but they are rather dirty. Are you sure that the renaming map is correct? Do you really want to publish your ms-1m cleaned list? It's up to your team, but I really hope your team will publish it together with a detailed specification.

nttstar commented 6 years ago

@cleardusk Raw images will not be published. You can just ignore those lines if they do not exist in the tsv file. I strongly recommend using our framework and the binary dataset I provided, as it will give you better results.
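
Ignoring clean-list lines that have no counterpart in the extracted data could look roughly like the sketch below. It is not part of the insightface code: the function name `split_clean_list` is hypothetical, and it assumes the clean list holds one relative path per line (for example "./m.0933t2/0.jpg") and that the images were already extracted under `image_root` with the same `<mid>/<image id>.jpg` layout.

```python
from pathlib import Path


def split_clean_list(list_path, image_root):
    """Return (present, missing) clean-list entries relative to image_root.

    ASSUMPTION: one relative image path per line, optionally prefixed by "./".
    """
    image_root = Path(image_root)
    present, missing = [], []
    with open(list_path, encoding="utf-8") as f:
        for line in f:
            entry = line.strip()
            if not entry:
                continue
            # Normalise "./m.0933t2/0.jpg" to "m.0933t2/0.jpg".
            rel = entry[2:] if entry.startswith("./") else entry
            if (image_root / rel).is_file():
                present.append(rel)
            else:
                missing.append(rel)
    return present, missing
```

The `missing` list makes it easy to count how many entries are absent, which is the number asked about earlier in the thread.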

zhenglaizhang commented 6 years ago

@nttstar Hi, I also noticed that the images extracted based on the msceleb1m clean list contain some noise. Was the model trained on the msceleb1m dataset with this noise? If so, the performance might improve further with a cleaner dataset. (Of course, insightface's performance is very good currently -:))

Erdos001 commented 6 years ago

@cleardusk Hi, could you tell me how you downloaded the ms1m tsv file? I searched the website https://www.microsoft.com/en-us/research/project/ms-celeb-1m-challenge-recognizing-one-million-celebrities-real-world/ but I cannot find the tsv file whose second column is a unique integer for each entity.

cleardusk commented 6 years ago

The tsv files of raw images.

links

https://msceleb.blob.core.windows.net/ms-celeb-v1-split/MsCelebV1-ImageThumbnails.part.00.tsv
https://msceleb.blob.core.windows.net/ms-celeb-v1-split/MsCelebV1-ImageThumbnails.part.01.tsv
https://msceleb.blob.core.windows.net/ms-celeb-v1-split/MsCelebV1-ImageThumbnails.part.02.tsv
https://msceleb.blob.core.windows.net/ms-celeb-v1-split/MsCelebV1-ImageThumbnails.part.03.tsv
https://msceleb.blob.core.windows.net/ms-celeb-v1-split/MsCelebV1-ImageThumbnails.part.04.tsv
https://msceleb.blob.core.windows.net/ms-celeb-v1-split/MsCelebV1-ImageThumbnails.part.05.tsv
https://msceleb.blob.core.windows.net/ms-celeb-v1-split/MsCelebV1-ImageThumbnails.part.06.tsv
https://msceleb.blob.core.windows.net/ms-celeb-v1-split/MsCelebV1-ImageThumbnails.part.07.tsv
https://msceleb.blob.core.windows.net/ms-celeb-v1-split/MsCelebV1-ImageThumbnails.part.08.tsv
https://msceleb.blob.core.windows.net/ms-celeb-v1-split/MsCelebV1-ImageThumbnails.part.09.tsv
https://msceleb.blob.core.windows.net/ms-celeb-v1-split/MsCelebV1-ImageThumbnails.part.10.tsv

md5sum

1fc45da4b7fd7f617673a87f8cab5893 *MsCelebV1-ImageThumbnails.part.00.tsv
8ac422f04132901094e828493af722e5 *MsCelebV1-ImageThumbnails.part.01.tsv
a6fc3e6b25b2168c7a831c6d580f5470 *MsCelebV1-ImageThumbnails.part.02.tsv
b56ae01cc482681e286bb9ab3d16109f *MsCelebV1-ImageThumbnails.part.03.tsv
8469b4a75ecea2e79a07e335e271fa8d *MsCelebV1-ImageThumbnails.part.04.tsv
542898476cfb5906353860fe8493f71b *MsCelebV1-ImageThumbnails.part.05.tsv
b60efefb35c68d8708014bde958f8155 *MsCelebV1-ImageThumbnails.part.06.tsv
327c3db7faff3935fb58e99f234a2d0d *MsCelebV1-ImageThumbnails.part.07.tsv
2af627f24a844d49745f37ee681cf1a5 *MsCelebV1-ImageThumbnails.part.08.tsv
bef1fb4010d1c4a43e57337a58b87f7e *MsCelebV1-ImageThumbnails.part.09.tsv
4b66a8cf3f5585996d2ff8950129bf18 *MsCelebV1-ImageThumbnails.part.10.tsv
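
A downloaded part can be checked against the md5 values listed above with a short script. This is a minimal sketch: the helper names `md5_of` and `verify` are hypothetical, and it assumes the checksum lines are in the same `<hash> *<filename>` form as the listing.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(checksum_lines, directory):
    """Map each listed filename to True (match), False (mismatch) or None (file absent)."""
    results = {}
    for line in checksum_lines:
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        # md5sum marks binary mode with a leading "*" before the filename.
        name = name.strip().lstrip("*")
        path = Path(directory) / name
        if path.is_file():
            results[name] = md5_of(path) == expected.lower()
        else:
            results[name] = None
    return results
```

For example, `verify(open("md5sum.txt"), "downloads")` would report which parts are intact, corrupted, or not yet downloaded; both the file name and the directory name here are placeholders.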

The msceleb extractor code reference

Erdos001 commented 6 years ago

@cleardusk Thank you very much!

zhonghp commented 6 years ago

@nttstar @cleardusk What do you do if there are many faces in one image? Do you take the largest one?

Rigel-1994 commented 6 years ago

@cleardusk How many images are left in your case? And how did you attach the label to each image?

xxllp commented 6 years ago

@cleardusk, opening the link gives an error. Does it require logging in or authorization?

yxchng commented 6 years ago

@cleardusk the link no longer works

yxchng commented 6 years ago

Has anyone here found the dataset in which the second column is unique? Hopefully one of you can post the link here. In the data from http://msceleb.org/ the second column is certainly not unique.

SergeyMilyaev commented 5 years ago

@nttstar , could you upload this MSCeleb1M clean list to Google Drive?

Honzys commented 5 years ago

Hello, can someone please reupload https://pan.baidu.com/s/1eTn6O62 to Google Drive or some other website? Thank you very much!

beszedes commented 5 years ago

Here is the file on Google Drive. https://drive.google.com/file/d/1lc_6SbIh-xipNPGW8gJpvnQy8yDd3m2U/view?usp=sharing

belgraviton commented 5 years ago

Here is the file on Google Drive. https://drive.google.com/file/d/1lc_6SbIh-xipNPGW8gJpvnQy8yDd3m2U/view?usp=sharing

This is the first version of the cleaning (85K ids/3.8M images).

@nttstar Could you share the second version of cleaning (85K ids/5.8M images)?

hjy1312 commented 5 years ago

Could you share the file named as MsCelebV1-Faces-Cropped.tsv? The official download link seems to be broken and I couldn't find anywhere to download the file. Thanks a lot. @cleardusk

flyduck commented 5 years ago

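@cleardusk Hello, the original MS1M data can no longer be downloaded, and the links you shared no longer work either. Do you have another way to download it, or do you have the original data and could upload it to a cloud drive? Thank you very much!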

flyduck commented 5 years ago

@HaoLiuHust Hello, could you share the original MS1M data? Thank you very much!

ingjieye commented 5 years ago

@HaoLiuHust Hello, could you share the original MS1M data? Thank you very much!

Use eMule.

Hzzone commented 4 years ago

The torrent of the raw ms1m dataset can be downloaded from academictorrents. I found the download very fast on a campus network in China.

quangtn266 commented 3 years ago

Hi, can anyone share refined_ms1m.txt again?

Thank you very much.