deepinsight / insightface

State-of-the-art 2D and 3D Face Analysis Project
https://insightface.ai

Ms1m/asian-celeb/deepglint dataset relationship? #789

Closed. SueeH closed this issue 1 year ago.

SueeH commented 5 years ago

I have read some issues and found some discussion about the datasets: ms1m-v1 = ms1m-ibug, which includes 85K IDs/3.8M images; ms1m-v2 = ms1m-arc = emore, which includes 85K IDs/5.8M images. I found that the arc-loss paper uses ms1m-ibug, while the pretrained models in the model zoo use ms1m-arc.

But I still have some confusion:

  1. These two datasets have the same IDs, so why does the second one have 2M more images? Someone said data augmentation is used. If so, which augmentation types?

  2. Also, is DeepGlint (181K IDs/6.75M images in the dataset zoo) the same as the dataset on the original DeepGlint website (http://trillionpairs.deepglint.com/data), or were any other operations applied?

  3. Does Asian-celeb (94K IDs/2.8M images) have any relationship with DeepGlint? @nttstar

Coderx7 commented 4 years ago

They have gone through different cleaning procedures, I guess; i.e., those two datasets are two different versions of the same dataset, cleaned differently. There is another version called C-MS-Celeb which you can also use. It has 94K IDs and 6.4M images.

jetsmith commented 4 years ago

@SueeH DeepGlint includes Asian-celeb, which is listed on the website http://trillionpairs.deepglint.com/data. I plan to merge the Glint and ms1m-arcface datasets, but I cannot map the labels in ms1m-arc to those in Glint's ms1m-v1c part. What should I do? @Coderx7
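
A minimal sketch of the merge described above, assuming each dataset has already been exported to a list of (image path, integer label) pairs and that a mapping from ms1m-arc labels to Glint labels has been obtained somehow. `merge_datasets` and the toy file names are hypothetical, not part of insightface:

```python
from typing import Dict, List, Tuple

# Hypothetical sample format: (image path, integer identity label).
Sample = Tuple[str, int]


def merge_datasets(a: List[Sample], b: List[Sample], a_to_b: Dict[int, int]) -> List[Sample]:
    """Merge two labeled image lists into one contiguous label space.

    b's identities are relabeled 0..K-1 (in sorted order of their original labels).
    a's identities found in a_to_b reuse the matching b label; all other a
    identities get fresh labels starting at K.
    """
    # Relabel b to a contiguous range.
    b_labels = sorted({label for _, label in b})
    b_new = {old: new for new, old in enumerate(b_labels)}
    merged = [(path, b_new[label]) for path, label in b]

    next_label = len(b_labels)
    a_new: Dict[int, int] = {}
    for label in sorted({label for _, label in a}):
        if label in a_to_b and a_to_b[label] in b_new:
            # Same person in both datasets: reuse b's unified label.
            a_new[label] = b_new[a_to_b[label]]
        else:
            # Person only in a: allocate a new label.
            a_new[label] = next_label
            next_label += 1
    merged.extend((path, a_new[label]) for path, label in a)
    return merged


a = [("a/0_1.jpg", 0), ("a/1_1.jpg", 1)]
b = [("b/7_1.jpg", 7), ("b/8_1.jpg", 8)]
print(merge_datasets(a, b, {0: 8}))
# [('b/7_1.jpg', 0), ('b/8_1.jpg', 1), ('a/0_1.jpg', 1), ('a/1_1.jpg', 2)]
```

Here identity 0 of the first list is treated as the same person as label 8 of the second, so both end up with unified label 1. Since both datasets are cleaned versions of MS-Celeb-1M, the same photo may appear twice after merging. This sketch does not deduplicate images.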

SueeH commented 4 years ago

You could use a face recognition model to find matching IDs. But I think using one dataset is fine, since most of the IDs are the same.
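
A rough sketch of that suggestion, assuming you have already extracted face embeddings for every image in both datasets with some pretrained recognition model (the extraction step is not shown). Each identity's normalized embeddings are averaged into a class center, and each source identity is mapped to the most similar target center if the cosine similarity clears a threshold. The function names and the 0.5 threshold are hypothetical choices, not something from insightface:

```python
import math
from typing import Dict, List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def class_centers(embeddings: Dict[int, List[Sequence[float]]]) -> Dict[int, List[float]]:
    """Average each identity's normalized embeddings, then renormalize.

    Assumes every identity has at least one embedding.
    """
    centers = {}
    for label, vecs in embeddings.items():
        normed = [l2_normalize(v) for v in vecs]
        dim = len(normed[0])
        mean = [sum(v[i] for v in normed) / len(normed) for i in range(dim)]
        centers[label] = l2_normalize(mean)
    return centers


def match_identities(src: Dict[int, List[Sequence[float]]],
                     dst: Dict[int, List[Sequence[float]]],
                     threshold: float = 0.5) -> Dict[int, int]:
    """Map each src label to its most similar dst label if cosine >= threshold."""
    src_c = class_centers(src)
    dst_c = class_centers(dst)
    mapping = {}
    for s_label, s_vec in src_c.items():
        best_label, best_sim = None, -1.0
        for d_label, d_vec in dst_c.items():
            sim = sum(x * y for x, y in zip(s_vec, d_vec))
            if sim > best_sim:
                best_label, best_sim = d_label, sim
        if best_label is not None and best_sim >= threshold:
            mapping[s_label] = best_label
    return mapping


# Toy 2-D "embeddings" standing in for real face features.
ms1m_arc = {0: [[1.0, 0.1], [0.9, 0.0]], 1: [[0.0, 1.0]]}
glint = {7: [[0.0, 0.9], [0.1, 1.0]], 8: [[1.0, 0.0]], 9: [[-1.0, 0.0]]}
print(match_identities(ms1m_arc, glint))  # {0: 8, 1: 7}
```

This brute-force loop compares every pair of class centers. That is fine for a toy example but far too slow for ~85K × ~181K identities, so a real run would batch the comparison as a matrix multiplication or use an approximate nearest-neighbor index. The greedy best match can also map two source IDs to the same target ID, so the result is worth spot-checking.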

jetsmith commented 4 years ago

@SueeH ok, thanks

SueeH commented 4 years ago

I found that the face labels are numbers (like 1, 2, 3...), so we only know that some images belong to the same person, but we don't know who the person is. Does anyone know whether there is a name list for the training datasets?

ChengMoumou commented 2 years ago

Hi, do you have any information about the Asian-celeb dataset? (^▽^)

SueeH commented 2 years ago

https://github.com/deepinsight/insightface/tree/master/recognition/_datasets_ Did you check this?

ChengMoumou commented 2 years ago

Thank you. I'll go see now.(^▽^)