Closed HaoLiuHust closed 6 years ago
thank you very much
Will the cleaned MegaFace list be released?
Yes
Do your image paths follow a different scheme? The format in your ms1m_clean_list.txt file is "./m.0933t2/0.jpg", but the source image path looks like "./MS-Celeb-1M/img/m.0933t2/0-FaceId-0.jpg".
Check the MS1M TSV file and use the 2nd column as the image id.
You said to use the 2nd column as the image id, but I am still confused: how do you map the original images named like "m.0933t2/0-FaceId-0.jpg" to "m.0933t2/0.jpg"? (My guess at this mapping may be wrong.)
The second column of the MS1M TSV file is a single unique integer for each identity. Do not use filenames like "m.0933t2/0-FaceId-0.jpg" in our case.
In this case, where is the image data?
Take one MID, m.0933t2, as an example:
m.0933t2 "Misty Lee"@en
m.0933t2 "Misty Lee"@pl
downloaded filename: FaceImageCroppedWithOutAlignment.tsv
the first line: m.0107_f 0 http://getbeatmadrid.files.wordpress.com/2013/01/magic-alex.jpg .......
The images are base64-encoded in this large text file.
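For readers following along, here is a minimal Python sketch of the extraction described above: it reads one TSV row, takes the first column as the MID and the second column as the image id, and writes the decoded bytes to `<mid>/<image_id>.jpg`. The function names `extract_row` and `extract_tsv` are hypothetical, and the code assumes the base64 image data sits in the last tab-separated column, which the thread does not state explicitly.

```python
import base64
import os


def extract_row(line, out_dir):
    """Decode one TSV row and write it to <out_dir>/<mid>/<image_id>.jpg."""
    fields = line.rstrip("\n").split("\t")
    # Column 1 is the MID (e.g. m.0933t2), column 2 is used as the image id.
    mid, image_id = fields[0], fields[1]
    # Assumption: the base64-encoded image bytes are in the last column.
    image_bytes = base64.b64decode(fields[-1])
    target_dir = os.path.join(out_dir, mid)
    os.makedirs(target_dir, exist_ok=True)
    path = os.path.join(target_dir, image_id + ".jpg")
    # If the same (mid, image_id) pair occurs twice, the later row overwrites.
    with open(path, "wb") as f:
        f.write(image_bytes)
    return path


def extract_tsv(tsv_path, out_dir):
    """Stream the TSV line by line so the whole file is never held in memory."""
    count = 0
    with open(tsv_path, "r", encoding="utf-8", errors="replace") as f:
        for line in f:
            if line.strip():
                extract_row(line, out_dir)
                count += 1
    return count
```

Calling `extract_tsv("FaceImageCroppedWithOutAlignment.tsv", "out")` would then produce paths such as `out/m.0933t2/0.jpg`, matching the form used in the clean list.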
Thanks a lot, I got it. However, FaceImageCroppedWithOutAlignment.tsv at https://msceleb.blob.core.windows.net/ms-celeb-v1-split?restype=container&comp=list is no longer available. Could you please provide a download link?
@cleardusk It seems the file is split into 8 parts; you can find them on the MS-Celeb website.
@HaoLiuHust I explored http://www.msceleb.org/ and https://www.microsoft.com/en-us/research/project/ms-celeb-1m-challenge-recognizing-one-million-celebrities-real-world/ but I didn't find the 8-part raw TSV file.
You can find them here: https://msceleb.blob.core.windows.net/ms-celeb-v1-cropped-split?restype=container&comp=list. However, it seems to contain less data than the author described.
Will you report the performance only with your cleaned ms-1m list? I think it is more reasonable and comparable in the academic community (it's fine to report the performance using any data on MegaFace Website).
@cleardusk Different clean lists are also not comparable. We can fix the dataset and compare the relative performance for different networks/losses.
That's right, I just wonder how high the performance will be without using extra data.
Additionally, the InsightFace paper states:
"As is well known that the above mentioned three attributes, data, network and loss, have a high-to-low influence on the performance of face recognition models."
Reviewers may want to see experiments supporting this conclusion, such as the performance difference between the uncleaned and cleaned MS-1M lists.
@cleardusk Right, and we're doing such experiments :) Thank you for your good advice.
Could you please share your private data with me? I need it for my research, to study the impact of additional data on MS. Thanks.
After extracting images with your logic, I found that ./m.0933t2/16.jpg does not exist in MS-1M. Is it from your private data, or is there other extra data in MS-1M?
@vzvzx How many images do not exist in ms-1m? I also plan to extract images according to this cleaned list.
@nttstar I checked the id m.0107_f, with image ids taken from the second column of the MS1M TSV file. However, the three images below from your cleaned list aren't in the TSV file:
m.0107_f/23.jpg
m.0107_f/47.jpg
m.0107_f/85.jpg
I also checked the other images for id m.0107_f, extracted according to your cleaned list, but they are rather dirty.
Are you sure that the renaming map is correct?
Do you really want to publish your ms-1m cleaned list? It's up to your team, but I really hope your team will publish it and attach one detailed specification.
@cleardusk Raw images will not be published. You can just ignore those lines if they don't exist in the TSV file. I strongly recommend using our framework and the binary dataset I provided, as it will give you better results.
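The advice to ignore clean-list lines that are absent from the TSV could be sketched roughly as follows. This is only an illustration: `clean_list_key` and `split_by_availability` are hypothetical names, and the code assumes each clean-list line is a single path of the form `./<mid>/<image_id>.jpg` and that `tsv_keys` is a set of `(mid, image_id)` pairs collected from the first two TSV columns.

```python
def clean_list_key(entry):
    """Turn './m.0933t2/0.jpg' into the pair ('m.0933t2', '0')."""
    parts = entry.strip().split("/")
    mid = parts[-2]
    image_id = parts[-1].rsplit(".", 1)[0]
    return mid, image_id


def split_by_availability(clean_entries, tsv_keys):
    """Separate clean-list entries into those present in the TSV and those not."""
    found, missing = [], []
    for entry in clean_entries:
        if not entry.strip():
            continue  # skip blank lines
        key = clean_list_key(entry)
        if key in tsv_keys:
            found.append(entry.strip())
        else:
            missing.append(entry.strip())
    return found, missing
```

Counting the `missing` list also gives a quick answer to how many clean-list images cannot be recovered from a given TSV download.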
@nttstar Hi, I also noticed that the images extracted with the MS-Celeb-1M clean list contain some noise. Was the model trained on the MS-Celeb-1M dataset with this noise? If so, the performance might improve further with a cleaner dataset. (Of course, InsightFace's performance is already very good :))
@cleardusk Hi, could you tell me how you downloaded the MS1M TSV file? I searched the website https://www.microsoft.com/en-us/research/project/ms-celeb-1m-challenge-recognizing-one-million-celebrities-real-world/ but I cannot find the TSV file whose second column is a unique integer for each entity.
The tsv files of raw images.
https://msceleb.blob.core.windows.net/ms-celeb-v1-split/MsCelebV1-ImageThumbnails.part.00.tsv
https://msceleb.blob.core.windows.net/ms-celeb-v1-split/MsCelebV1-ImageThumbnails.part.01.tsv
https://msceleb.blob.core.windows.net/ms-celeb-v1-split/MsCelebV1-ImageThumbnails.part.02.tsv
https://msceleb.blob.core.windows.net/ms-celeb-v1-split/MsCelebV1-ImageThumbnails.part.03.tsv
https://msceleb.blob.core.windows.net/ms-celeb-v1-split/MsCelebV1-ImageThumbnails.part.04.tsv
https://msceleb.blob.core.windows.net/ms-celeb-v1-split/MsCelebV1-ImageThumbnails.part.05.tsv
https://msceleb.blob.core.windows.net/ms-celeb-v1-split/MsCelebV1-ImageThumbnails.part.06.tsv
https://msceleb.blob.core.windows.net/ms-celeb-v1-split/MsCelebV1-ImageThumbnails.part.07.tsv
https://msceleb.blob.core.windows.net/ms-celeb-v1-split/MsCelebV1-ImageThumbnails.part.08.tsv
https://msceleb.blob.core.windows.net/ms-celeb-v1-split/MsCelebV1-ImageThumbnails.part.09.tsv
https://msceleb.blob.core.windows.net/ms-celeb-v1-split/MsCelebV1-ImageThumbnails.part.10.tsv
1fc45da4b7fd7f617673a87f8cab5893 *MsCelebV1-ImageThumbnails.part.00.tsv
8ac422f04132901094e828493af722e5 *MsCelebV1-ImageThumbnails.part.01.tsv
a6fc3e6b25b2168c7a831c6d580f5470 *MsCelebV1-ImageThumbnails.part.02.tsv
b56ae01cc482681e286bb9ab3d16109f *MsCelebV1-ImageThumbnails.part.03.tsv
8469b4a75ecea2e79a07e335e271fa8d *MsCelebV1-ImageThumbnails.part.04.tsv
542898476cfb5906353860fe8493f71b *MsCelebV1-ImageThumbnails.part.05.tsv
b60efefb35c68d8708014bde958f8155 *MsCelebV1-ImageThumbnails.part.06.tsv
327c3db7faff3935fb58e99f234a2d0d *MsCelebV1-ImageThumbnails.part.07.tsv
2af627f24a844d49745f37ee681cf1a5 *MsCelebV1-ImageThumbnails.part.08.tsv
bef1fb4010d1c4a43e57337a58b87f7e *MsCelebV1-ImageThumbnails.part.09.tsv
4b66a8cf3f5585996d2ff8950129bf18 *MsCelebV1-ImageThumbnails.part.10.tsv
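The checksum lines above are in `md5sum` style (`<hash> *<filename>`). As a rough sketch, downloaded parts could be verified in Python like this; the function names are hypothetical, and the code assumes the checksum lines are saved as text and the parts sit in one directory.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_checksum_line(line):
    """Parse '<hash> *<filename>' and return (filename, hash)."""
    expected, name = line.split(None, 1)
    return name.strip().lstrip("*"), expected.lower()


def verify(checksum_lines, directory="."):
    """Map each listed filename to True if it exists and its MD5 matches."""
    results = {}
    for line in checksum_lines:
        if not line.strip():
            continue
        name, expected = parse_checksum_line(line)
        path = os.path.join(directory, name)
        results[name] = os.path.exists(path) and md5_of_file(path) == expected
    return results
```

Any file mapped to `False` is either missing or corrupted and would need to be downloaded again.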
@cleardusk Thank you very much!
@nttstar @cleardusk What do you do if there are multiple faces in one image? Do you take the largest one?
@cleardusk How many images are left in your case? And how did you attach the label to each image?
@cleardusk Opening the link gives an error. Does it require a login or authorization?
@cleardusk the link no longer works
Has anyone here found the dataset in which the second column is unique? Hopefully someone can post the link here. In the data from http://msceleb.org/ the second column is certainly not unique.
@nttstar , could you upload this MSCeleb1M clean list to Google Drive?
Hello, can someone please reupload https://pan.baidu.com/s/1eTn6O62 to Google Drive or some other website? Thank you very much!
Here is the file on Google Drive. https://drive.google.com/file/d/1lc_6SbIh-xipNPGW8gJpvnQy8yDd3m2U/view?usp=sharing
This is the first version of the cleaning (85K ids/3.8M images).
@nttstar Could you share the second version of cleaning (85K ids/5.8M images)?
Could you share the file named as MsCelebV1-Faces-Cropped.tsv? The official download link seems to be broken and I couldn't find anywhere to download the file. Thanks a lot. @cleardusk
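@cleardusk Hello, the raw MS1M data can no longer be downloaded, and the links you shared no longer work either. Do you have another way to download it, or do you have the raw data and could upload it to a cloud drive? Thank you very much!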
@HaoLiuHust Hello, could you share the raw MS1M data? Thank you very much!
Use eMule.
The torrent for the raw MS1M dataset can be downloaded from Academic Torrents. I found it very fast on a campus network in China.
Hi, can anyone share refined_ms1m.txt again?
Thank you very much.
Thank you for your wonderful work. Do you plan to release the clean list? I want to use it with another alignment method.