deepinsight / insightface

State-of-the-art 2D and 3D Face Analysis Project
https://insightface.ai

Asian training dataset(from glint) discussion. #256

Closed nttstar closed 1 year ago

nttstar commented 6 years ago
  1. Download the dataset from http://trillionpairs.deepglint.com/data (after signup). msra is a cleaned subset of MS1M from glint, while celebrity is the Asian dataset.
  2. Generate the lst file by calling src/data/glint2lst.py. For example:
    python glint2lst.py /data/glint_data msra,celebrity > glint.lst

or generate the Asian dataset only:

python glint2lst.py /data/glint_data celebrity > glint_cn.lst
  3. Call face2rec2.py to generate the .rec file.
  4. Merge the dataset with an existing one by calling src/data/dataset_merge.py without setting the model parameter, which will combine all IDs from the two datasets.

Finally you will get a dataset containing about 180K IDs.
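
Merging without a model amounts to treating every identity in the second dataset as new. A minimal sketch of that label bookkeeping, assuming each dataset is a list of (image_path, label) pairs with labels starting at 0; `merge_identities` and the sample paths are hypothetical and this is not the code in dataset_merge.py:

```python
def merge_identities(datasets):
    """Concatenate datasets, shifting labels so identities never collide."""
    merged = []
    offset = 0
    for samples in datasets:
        if not samples:
            continue
        for path, label in samples:
            merged.append((path, label + offset))
        # The next dataset starts after the highest label used so far.
        offset += max(label for _, label in samples) + 1
    return merged


# Hypothetical toy datasets, two identities each.
glint = [("msra/a.jpg", 0), ("msra/b.jpg", 1)]
asian = [("celebrity/c.jpg", 0), ("celebrity/d.jpg", 0), ("celebrity/e.jpg", 1)]

merged = merge_identities([glint, asian])
num_ids = len({label for _, label in merged})
```

Because no identities are matched in this scheme, a person who appears in both datasets ends up with two different labels.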

Use src/eval/gen_glint.py to prepare the test feature file with a pretrained insightface model.

You can also post your private testing results here.

BUAA-21Li commented 6 years ago

Help needed: acc is only about 0.24. I am fine-tuning a previously trained model on celebrity. During the earlier training acc was around 0.53, but during fine-tuning acc is only 0.22. Results on the three test sets (LFW, AgeDB, CFP) are close to the earlier ones. What should I do? Command:

    CUDA_VISIBLE_DEVICES='1,0' python -u train_softmax.py --network y1 --ckpt 2 --loss-type 4 --lr 0.001 --lr-steps 55000,85000,100000,110000 --wd 0.00004 --fc7-wd-mult 10 --emb-size 128 --per-batch-size 128 --margin-s 128 --data-dir ../datasets/faces_glint_112x112 --pretrained ../models/MobileFaceNet_glint/model-y1-arcface_V2,0042 --prefix ../models/MobileFaceNet_glint/model-y1-arcface_V2

lixiaohui2020 commented 6 years ago

Help needed: LFW accuracy is 98.9%, which is a bit low. I trained from scratch on the celebrity dataset, with mobilenetv2 as the network and arcface as the loss function. Command:

    LRSTEPS='32000,48000,56000' CUDA_VISIBLE_DEVICES='4,6' python -u train_softmax.py --data-dir $DATA_DIR --network "$NETWORK" --loss-type 4 --prefix "$PREFIX" --per-batch-size 256 --lr-steps "$LRSTEPS" --margin-s 64.0 --margin-m 0.5 --ckpt 2 --emb-size 128 --fc7-wd-mult 10.0 --wd 0.00004 > "$LOGFILE" 2>&1 &

@nttstar Is the low LFW accuracy related to the dataset being small and unbalanced, and to the learning-rate settings? How should the hyperparameters be set in this case?

nttstar commented 6 years ago

Use a normal wd. The mobilenetv2 implementation may have problems.

lixiaohui2020 commented 6 years ago

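@nttstar I didn't quite understand, so sorry to bother you again. What is the difference between the "normal wd" you mentioned and wd = 0.00004? Which part of the mobilenetv2 implementation might have problems? I previously trained on the MS dataset (the rec file you provide on GitHub) with the same parameters and reached 99.25% accuracy.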

Edwardmark commented 6 years ago

Does anybody know how to use the dataloader to improve GPU utilization?

bigbao9494 commented 6 years ago

Where can the msra dataset be downloaded?

twmht commented 6 years ago

license?

tinggh commented 6 years ago

@nttstar Which script is used to resize the images in Asian celebrity to 112 by 112? It seems that face2rec2.py already processes them with face_preprocess.preprocess(img,bbox,...). So we do not need to resize these images separately?

nttstar commented 6 years ago

GT of glint-challenge was updated. See http://trillionpairs.deepglint.com/results

cuppersd commented 6 years ago

do you have Face Alignment models?

test4fest commented 6 years ago

@JianbangZ Thanks for reporting these overlaps. Could you share how you found the overlapping and noisy images between the MS1M and Asian datasets? Did you do it manually or automatically?

JianbangZ commented 6 years ago

@test4fest Automatically + manually. We computed the embedding cluster center for each identity in each dataset, and then computed center-to-center similarity/distance. You can then set a threshold to find some overlaps automatically, and use a higher threshold and manually check the uncertain ones.
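
The procedure described above can be sketched roughly as follows. This is only an illustration under assumptions: embeddings are plain lists of floats, similarity is cosine similarity, and the two thresholds (`auto_thresh`, `review_thresh`) with their values are hypothetical. Splitting pairs into "automatic" and "manual review" buckets is one reading of the comment, not the poster's actual code.

```python
import math


def l2_normalize(vec):
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


def identity_centers(embeddings_by_id):
    """Mean embedding per identity, L2-normalized."""
    centers = {}
    for identity, embeddings in embeddings_by_id.items():
        dim = len(embeddings[0])
        mean = [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]
        centers[identity] = l2_normalize(mean)
    return centers


def find_overlaps(centers_a, centers_b, auto_thresh=0.9, review_thresh=0.6):
    """Compare every center in A with every center in B by cosine similarity."""
    auto, review = [], []
    for id_a, ca in centers_a.items():
        for id_b, cb in centers_b.items():
            sim = sum(x * y for x, y in zip(ca, cb))
            if sim >= auto_thresh:
                auto.append((id_a, id_b, sim))      # treated as the same person
            elif sim >= review_thresh:
                review.append((id_a, id_b, sim))    # left for manual checking
    return auto, review


# Hypothetical 2-D embeddings, just to exercise the functions.
ms1m = {"m0": [[1.0, 0.0], [0.9, 0.1]], "m1": [[0.0, 1.0]]}
asian = {"a0": [[0.95, 0.05]], "a1": [[0.5, 1.0]]}

auto, review = find_overlaps(identity_centers(ms1m), identity_centers(asian))
```

The nested loop is quadratic in the number of identities; at the scale of these datasets a batched matrix product over the center matrices would be the practical choice.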

test4fest commented 6 years ago

@JianbangZ Can I use the output of a pre-trained network to compute the embeddings, or do I have to train a new model on the combined datasets (MS1M and Asian)?

huohuai commented 6 years ago

Is there a mapping from each folder in MS1M-refine-v2 to a person name or mid? For example, folder 0 corresponding to m.09zyss.

jetsmith commented 6 years ago

Does the dataset not contain face coordinates (left, top, right, bottom)?

hustzeyu commented 6 years ago

Is there any overlap between MS1M and VGGface2 ?

jiankang1991 commented 6 years ago

Has anyone successfully trained MobileFaceNet from scratch on the DeepGlint dataset? What were the training hyperparameters? Thank you.

jiankang1991 commented 5 years ago

Hi all, I would like to train mobilefacenet from scratch on the DeepGlint dataset. Here is an excerpt of my log:

INFO:root:Epoch[5] Batch [20]   Speed: 590.55 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [40]   Speed: 565.22 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [60]   Speed: 513.27 samples/sec       acc=0.000000
INFO:root:Saved checkpoint to "./models/model_y1_softmax3_glint/model-0044.params"
INFO:root:Epoch[5] Batch [80]   Speed: 82.49 samples/sec        acc=0.000000
INFO:root:Epoch[5] Batch [100]  Speed: 504.97 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [120]  Speed: 522.76 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [140]  Speed: 558.57 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [160]  Speed: 503.59 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [180]  Speed: 545.58 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [200]  Speed: 563.97 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [220]  Speed: 537.71 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [240]  Speed: 561.69 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [260]  Speed: 551.65 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [280]  Speed: 541.85 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [300]  Speed: 513.12 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [320]  Speed: 535.86 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [340]  Speed: 542.13 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [360]  Speed: 525.81 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [380]  Speed: 536.12 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [400]  Speed: 517.77 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [420]  Speed: 512.70 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [440]  Speed: 554.69 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [460]  Speed: 541.19 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [480]  Speed: 499.54 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [500]  Speed: 565.82 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [520]  Speed: 490.50 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [540]  Speed: 517.75 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [560]  Speed: 512.61 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [580]  Speed: 532.84 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [600]  Speed: 547.83 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [620]  Speed: 541.03 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [640]  Speed: 523.97 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [660]  Speed: 566.80 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [680]  Speed: 562.19 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [700]  Speed: 516.88 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [720]  Speed: 544.09 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [740]  Speed: 555.72 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [760]  Speed: 534.52 samples/sec       acc=0.000000

INFO:root:Saved checkpoint to "./models/model_y1_softmax3_glint/model-0049.params"
INFO:root:Epoch[5] Batch [10080]        Speed: 84.99 samples/sec        acc=0.000000
INFO:root:Epoch[5] Batch [10100]        Speed: 522.19 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [10120]        Speed: 509.01 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [10140]        Speed: 540.22 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [10160]        Speed: 520.44 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [10180]        Speed: 529.27 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [10200]        Speed: 540.42 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [10220]        Speed: 559.25 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [10240]        Speed: 538.98 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [10260]        Speed: 507.30 samples/sec       acc=0.065755
INFO:root:Epoch[5] Batch [10280]        Speed: 548.35 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [10300]        Speed: 531.99 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [10320]        Speed: 565.28 samples/sec       acc=0.001563
INFO:root:Epoch[5] Batch [10340]        Speed: 522.87 samples/sec       acc=0.000651
INFO:root:Epoch[5] Batch [10360]        Speed: 561.39 samples/sec       acc=0.079557
INFO:root:Epoch[5] Batch [10380]        Speed: 558.66 samples/sec       acc=0.000911
INFO:root:Epoch[5] Batch [10400]        Speed: 567.39 samples/sec       acc=0.053125
INFO:root:Epoch[5] Batch [10420]        Speed: 525.81 samples/sec       acc=0.007552
INFO:root:Epoch[5] Batch [10440]        Speed: 556.13 samples/sec       acc=0.039453
INFO:root:Epoch[5] Batch [10460]        Speed: 539.47 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [10480]        Speed: 530.92 samples/sec       acc=0.047786
INFO:root:Epoch[5] Batch [10500]        Speed: 543.45 samples/sec       acc=0.000130
INFO:root:Epoch[5] Batch [10520]        Speed: 551.35 samples/sec       acc=0.001172
INFO:root:Epoch[5] Batch [10540]        Speed: 545.21 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [10560]        Speed: 570.32 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [10580]        Speed: 552.34 samples/sec       acc=0.012109
INFO:root:Epoch[5] Batch [10600]        Speed: 551.80 samples/sec       acc=0.004297
INFO:root:Epoch[5] Batch [10620]        Speed: 528.08 samples/sec       acc=0.000130
INFO:root:Epoch[5] Batch [10640]        Speed: 544.59 samples/sec       acc=0.150521
INFO:root:Epoch[5] Batch [10660]        Speed: 527.51 samples/sec       acc=0.029948
INFO:root:Epoch[5] Batch [10680]        Speed: 543.34 samples/sec       acc=0.038932
INFO:root:Epoch[5] Batch [10700]        Speed: 527.42 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [10720]        Speed: 561.84 samples/sec       acc=0.050651
INFO:root:Epoch[5] Batch [10740]        Speed: 543.47 samples/sec       acc=0.007422
INFO:root:Epoch[5] Batch [10760]        Speed: 559.44 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [10780]        Speed: 536.10 samples/sec       acc=0.100391
INFO:root:Epoch[5] Batch [10800]        Speed: 570.65 samples/sec       acc=0.000130
INFO:root:Epoch[5] Batch [10820]        Speed: 561.06 samples/sec       acc=0.073828
INFO:root:Epoch[5] Batch [10840]        Speed: 567.89 samples/sec       acc=0.053646
INFO:root:Epoch[5] Batch [10860]        Speed: 565.93 samples/sec       acc=0.110937
INFO:root:Epoch[5] Batch [10880]        Speed: 529.71 samples/sec       acc=0.012500
INFO:root:Epoch[5] Batch [10900]        Speed: 499.38 samples/sec       acc=0.001823
INFO:root:Epoch[5] Batch [10920]        Speed: 517.37 samples/sec       acc=0.108464
INFO:root:Epoch[5] Batch [10940]        Speed: 563.39 samples/sec       acc=0.056901
INFO:root:Epoch[5] Batch [10960]        Speed: 546.41 samples/sec       acc=0.103385
INFO:root:Epoch[5] Batch [10980]        Speed: 558.78 samples/sec       acc=0.110286

Until around batch 10260 (the first nonzero value in the log above) the acc is always 0, and only after that does it start to take nonzero values. This is strange. Has anyone run into this problem before? Thank you.
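
To pin down where acc first becomes nonzero in a log like this one, a small parser helps. This is a sketch that assumes the line format shown in the excerpt; `first_nonzero_acc` is a hypothetical helper, not part of insightface.

```python
import re

# Matches lines such as:
# INFO:root:Epoch[5] Batch [10260]   Speed: 507.30 samples/sec   acc=0.065755
LOG_LINE = re.compile(
    r"Epoch\[(\d+)\] Batch \[(\d+)\]\s+Speed: [\d.]+ samples/sec\s+acc=([\d.]+)"
)


def first_nonzero_acc(lines):
    """Return (epoch, batch, acc) for the first line with acc > 0, or None."""
    for line in lines:
        match = LOG_LINE.search(line)
        if match and float(match.group(3)) > 0.0:
            return int(match.group(1)), int(match.group(2)), float(match.group(3))
    return None


sample = [
    "INFO:root:Epoch[5] Batch [10240]        Speed: 538.98 samples/sec       acc=0.000000",
    "INFO:root:Epoch[5] Batch [10260]        Speed: 507.30 samples/sec       acc=0.065755",
    "INFO:root:Epoch[5] Batch [10280]        Speed: 548.35 samples/sec       acc=0.000000",
]
result = first_nonzero_acc(sample)
```

Lines that do not match the pattern, such as the "Saved checkpoint" messages, are skipped.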

oukohou commented 5 years ago

@karlTUM Training from scratch~~~ Obviously this means your model finally managed to figure out and learn something. Don't worry, be happy.

sophiazy commented 5 years ago

@goodpp Hi, could you please share your BT torrent or a download of the dataset with me? The file I downloaded cannot be parsed and unzipped successfully. I would appreciate your help.

Thanks! sophia

sophiazy commented 5 years ago

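@all Why does the Asian face dataset I downloaded only extract to 4.1G? How should I handle this 90+G .tar.gz file? I got the following errors during extraction:

    gzip: stdin: invalid compressed data--format violated
    tar: Unexpected EOF in archive
    tar: Unexpected EOF in archive
    tar: Error is not recoverable: exiting now

Could someone advise? Many thanks. I can't tell whether the officially provided data is wrong or whether the file was corrupted during my download.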
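
Errors like these typically point to a truncated or corrupted download. One way to check an archive before a long extraction is to stream through the gzip layer and see whether it ends cleanly. A minimal sketch using only the Python standard library; `gzip_is_intact` is a hypothetical helper, and it only validates the compressed stream, not the tar contents or the official file size.

```python
import gzip
import zlib


def gzip_is_intact(path, chunk_size=1 << 20):
    """Decompress the whole stream and report whether it ends cleanly."""
    try:
        with gzip.open(path, "rb") as stream:
            # Read and discard; we only care whether decoding succeeds.
            while stream.read(chunk_size):
                pass
    except (EOFError, OSError, zlib.error):
        # EOFError: file ends before the end-of-stream marker (truncated).
        # OSError / zlib.error: bad header or corrupted compressed data.
        return False
    return True
```

For a 90+G file this takes a while, but it reads in fixed-size chunks and never holds the decompressed data in memory.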

songjd commented 5 years ago

@karlTUM I also tried to train mobilefacenet from scratch on the DeepGlint dataset, but the acc is only about 0.2. Can you help me?

xmuszq commented 5 years ago

Hi,

Where can I get the ELFW dataset (only ELFW)? The downloaded test dataset already has ELFW and the other Flickr images mixed together. I want the pure ELFW dataset that DeepGlint describes:

1. ELFW: Face images of celebrities in LFW name list. There are 274k images from 5.7k ids.

felixfuu commented 5 years ago

Are margin-s(64) and margin-m(0.5) suitable for glint dataset (18k ids) ? @nttstar

ckybit commented 5 years ago

Why do I get such low results (identification is only 0.01270) on Trillion Pairs from Glint? Maybe I did not generate the result file correctly. I use src/eval/gen_glint.py to produce the bin file for submission, but it seems the code cannot be used directly, so I modified it as follows. The original code in gen_glint.py:

image_path, label, bbox, landmark, aligned = face_preprocess.parse_lst_line(line)
buffer.append( (image_path, landmark) )

The original code in src/common/face_preprocess.py:

def parse_lst_line(line):
  vec = line.strip().split("\t")
  assert len(vec)>=3
  aligned = int(vec[0])
  image_path = vec[1]
  label = int(vec[2])
  bbox = None
  landmark = None
  #print(vec)
  if len(vec)>3:
    bbox = np.zeros( (4,), dtype=np.int32)
    for i in xrange(3,7):
      bbox[i-3] = int(vec[i])
    landmark = None
    if len(vec)>7:
      _l = []
      for i in xrange(7,17):
        _l.append(float(vec[i]))
      landmark = np.array(_l).reshape( (2,5) ).T
  #print(aligned)
  return image_path, label, bbox, landmark, aligned

I modified gen_glint.py to:

    image_path, landmark = face_preprocess.parse_lst_line(line)  
    image_path = "/to/my/path/TrillionPairs/testdata/"+line.split(" ")[0]
    buffer.append( (image_path, landmark) ) 

and modified src/common/face_preprocess.py to:

def parse_lst_line(line):
  vec = line.strip().split(" ")
  assert len(vec)>=2
  image_path = vec[0]
  landmark = None
  #print(vec)
  if len(vec)>2:
    _l = []
    for i in xrange(1,11):
      _l.append(float(vec[i]))
    landmark = np.array(_l).reshape( (2,5) ).T
  #print(aligned)
  return image_path, landmark

My input is:

--input='/to/my/path/TrillionPairs/testdata/testdata_lmk/testdata_lmk.txt'

Because the input testdata_lmk.txt format is:

testdata/00/00/00000d7e95948372025bdaca5a203832.jpg 153.4 180.0 246.6 180.0 196.8 215.8 158.5 278.7 230.6 277.6
testdata/00/00/00000f9f87210c8eb9f5fb488b1171d7.jpg 156.1 180.0 243.9 180.0 207.4 229.2 159.8 262.9 237.4 263.0
testdata/00/00/000010e4c136b77a07eeeea84d84d804.jpg 156.4 180.0 243.6 180.0 201.6 223.0 168.0 264.7 237.7 268.0

So I think my modification is right, and the resulting bin file is about 1.8G.

I don't know what is wrong. Can someone spot my problem or provide working code directly?

Any help would be appreciated! @nttstar

You should convert testdata_lmk.txt as @goodpp said (because the author changed the landmark format). If you don't, the aligned images will be wrong; you can save them and check.
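
The sample lines of testdata_lmk.txt suggest that the ten numbers are interleaved (x, y) pairs: the second and fourth values are both 180.0, which fits two eyes at the same height. Under that assumption, `reshape((2,5)).T`, which expects five x values followed by five y values, would scramble the points; the NumPy equivalent of the parse below would be `reshape((5, 2))`. The helper name is hypothetical and the coordinate order is inferred from the samples, not confirmed by the dataset authors.

```python
def parse_testdata_lmk_line(line):
    """Parse '<path> x1 y1 x2 y2 ... x5 y5' into (path, [(x, y), ...])."""
    fields = line.split()
    if len(fields) != 11:
        raise ValueError("expected a path followed by 10 coordinates")
    values = [float(v) for v in fields[1:]]
    # Even positions are x, odd positions are y (assumed interleaved layout).
    landmark = list(zip(values[0::2], values[1::2]))
    return fields[0], landmark


sample = ("testdata/00/00/00000d7e95948372025bdaca5a203832.jpg "
          "153.4 180.0 246.6 180.0 196.8 215.8 158.5 278.7 230.6 277.6")
path, landmark = parse_testdata_lmk_line(sample)
```

Saving a few aligned crops, as suggested above, is the quickest way to confirm which ordering is right.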

zhouwei5113 commented 5 years ago

@nttstar

  1. Is the DeepGlint dataset introduced in https://github.com/deepinsight/insightface/wiki/Dataset-Zoo already a merged set of the msra and celebrity data mentioned in the Trillion Pairs test?
  2. I got 0.984092 on megaface but only 0.43088 on the Trillion Pairs test (both top-1 identification metric). The training data I used is DeepGlint. When I changed the training data to emore, I could easily get a 0.80+ result on the Trillion Pairs test. I am confused about the low score on the Trillion Pairs test when using DeepGlint as training data. Can anyone help?
  3. I called face2rec2.py to re-generate the glint.rec file based on steps 1, 2 and 3 above. Then I hit the error "s = self.imgrec.read_idx(0) KeyError: 0" when training. What causes this error?

zhouwei5113 commented 5 years ago

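@Edwardmark When I train on the whole dataset (fine-tuning r50), training acc is 50% and the glint test result is only 16%. Many people have run into this problem, which is strange. I am now training and testing with ms1m and celebrity separately.

@Edwardmark @yhw-yhw Have you solved the problem of very low results on the official glint test? I get 0.984092 on megaface but only 0.43088 on the official glint test, which does not seem normal.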

meanmee commented 5 years ago

@nttstar I just noticed that IBM had released a very impressive facial image dataset: https://www.research.ibm.com/artificial-intelligence/trusted-ai/diversity-in-faces/#highlights Will you try it?

Or anyone else want to give it a try?

mlourencoeb commented 5 years ago

Hello @nttstar, thanks for the great job.

I want to merge emore with glint asia. Should we follow the same procedure (i.e. blindly merge the two datasets by not setting the model during the dataset_merge invocation)?

Thanks.

Talgin commented 5 years ago

Hi @mlourencoeb, have you managed to merge these two datasets? We are running:

    python dataset_merge.py --include /home/ti/Downloads/DATASETS/faces_emore,/home/ti/Downloads/DATASETS/faces_glint --output /home/ti/Downloads/DATASETS/merge --model /home/ti/Downloads/insightface/models/model-r100-ii/model,0

But at the end of the merging process we get the same property, .idx and .rec files as faces_emore (the same size and content). What could be the problem?

mlourencoeb commented 5 years ago

Hello @Talgin.

I wrote a merging script myself since I wanted to manually review some cases. There is a huge overlap between glint asia and emore.

I also found lots of repeated identities in emore. I am cleaning those as we speak.

Talgin commented 5 years ago

Hello @mlourencoeb,
Thank you for the fast reply. I'm confused about the datasets... in their paper (@nttstar) they say: "DeepGlint-Face(including MS1M-DeepGlint and Asian-DeepGlint)". So, my questions:

Thank you!

mlourencoeb commented 5 years ago

Hello @Talgin

emore is based on MSCELEB just like non asian component of faces_glint. I would merge emore with asia part only, but I could be wrong.

Talgin commented 5 years ago

@mlourencoeb, Thank you! I'm not sure but maybe faces_glint is combination of emore and asian dataset? :) But I'll try to merge them :)

HuanJiML commented 5 years ago

@zhouwei5113 Have you solved your problem? I also got a really low score on trillionpairs.

shiyuanyin commented 5 years ago

@nttstar Hello, I want to try a new structure by modifying the SE block, and I am a bit confused: with an mxnet symbol I cannot directly get the b, c, h, w values. I am porting the PyTorch SGE module. If I modify the model at the position of the SE code you provide, I cannot directly get b, c, h, w after bn3 in each layer. I need mxnet to implement this statement: b, c, h, w = x.size(), x = x.reshape(b * self.groups, -1, h, w). I am not very familiar with mxnet; do you know a good way to implement this reshape? This is where I modified fresnet.py:

    bn3 = mx.sym.BatchNorm(data=conv2, fix_gamma=False, eps=2e-5, momentum=bn_mom, name=name + '_bn3')

    if use_se:

        if usr_sge:
             # get b, c, h, w of bn3
             # then reshape

Below is the corresponding PyTorch implementation:

    import torch
    import torch.nn as nn
    from torch.nn import Parameter

    class SpatialGroupEnhance(nn.Module):
        # 3 2 1: hw is halved; 3 1 1: same size
        def __init__(self, groups=64):
            super(SpatialGroupEnhance, self).__init__()
            self.groups = groups
            self.avg_pool = nn.AdaptiveAvgPool2d(1)
            self.weight = Parameter(torch.zeros(1, groups, 1, 1))
            self.bias = Parameter(torch.ones(1, groups, 1, 1))
            self.sig = nn.Sigmoid()

        def forward(self, x):  # (b, c, h, w)
            b, c, h, w = x.size()
            x = x.view(b * self.groups, -1, h, w)  # reshape to (b*groups, c/groups, h, w)
            xn = x * self.avg_pool(x)  # x * global average pooling (h, w become 1)
            xn = xn.sum(dim=1, keepdim=True)  # (b*groups, 1, h, w)
            t = xn.view(b * self.groups, -1)
            t = t - t.mean(dim=1, keepdim=True)
            std = t.std(dim=1, keepdim=True) + 1e-5
            t = t / std  # normalize: (t - mean) / std
            t = t.view(b, self.groups, h, w)
            t = t * self.weight + self.bias
            t = t.view(b * self.groups, 1, h, w)
            x = x * self.sig(t)  # sigmoid gives a per-group factor in (0, 1)
            x = x.view(b, c, h, w)  # restore the original dimensions
            return x
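
On the reshape asked about above: MXNet's reshape operator documents special shape values, including 0 (copy the input dimension at the same position) and -1 (infer from the remaining size), so the batch size need not be known when the symbol is built as long as the channel count is. The plain-Python sketch below only models how those two codes resolve, so it runs without MXNet. `resolve_reshape` is a hypothetical helper, and the `mx.sym.reshape` line in the comment is an untested suggestion rather than code from the repository.

```python
def resolve_reshape(in_shape, target):
    """Resolve a target shape using 0 (copy input dim at the same position)
    and at most one -1 (infer from the remaining number of elements)."""
    out = [in_shape[i] if t == 0 else t for i, t in enumerate(target)]
    if out.count(-1) > 1:
        raise ValueError("at most one -1 is allowed")
    total = 1
    for d in in_shape:
        total *= d
    if -1 in out:
        known = 1
        for d in out:
            if d != -1:
                known *= d
        out[out.index(-1)] = total // known
    return tuple(out)


# Example sizes chosen for illustration only.
b, c, h, w, groups = 8, 256, 14, 14, 64

# Suggested MXNet counterpart (assumption, c must be known at build time):
#   x = mx.sym.reshape(bn3, shape=(-1, c // groups, 0, 0))
grouped = resolve_reshape((b, c, h, w), (-1, c // groups, 0, 0))
```

Here `grouped` has the same layout as `x.view(b * self.groups, -1, h, w)` in the PyTorch code.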

shiyuanyin commented 5 years ago

@nttstar I added the SGE module to the original ResNet-50 IR structure, starting from the author's pretrained resnet50 model and training on the glint data. The training/test results are as follows; there is not much change.

    testing verification..
    (12000, 512)
    infer time 7.123213
    [lfw][8000]XNorm: 22.401950
    [lfw][8000]Accuracy-Flip: 0.99800+-0.00287
    testing verification..
    (14000, 512)
    infer time 8.335358
    [cfp_fp][8000]XNorm: 21.203882
    [cfp_fp][8000]Accuracy-Flip: 0.95300+-0.01448
    testing verification..
    (12000, 512)
    infer time 7.040614
    [agedb_30][8000]XNorm: 23.488769
    [agedb_30][8000]Accuracy-Flip: 0.98000+-0.00749

SueeH commented 5 years ago

Any conclusion about the dataset? Is face_glint = emore + asian_celeb? I have the same issue in #789.

Talgin commented 5 years ago

Hi @nttstar, we have been training on faces_glint + our_custom_dataset for almost 10 days now, and what I want to ask is why our accuracy is not changing: it stays at acc=~0.30-0.31. At the beginning the loss value started at ~46.6-9 and after 2 days decreased to ~7.2-7.5. The acc was 0.0000 and then began to rise, but after the 20th epoch it stopped, as you can see in the picture below. It is now the 45th epoch and nothing has changed. Our parameters are:

    Loss: arcface
    default.end_epoch = 1000
    default.lr = 0.001
    default.wd = 0.0005
    default.mom = 0.9
    default.per_batch_size: 64
    default.ckpt = 3
    network = r100

We are using 4 Tesla P100 GPUs. You can see the progress in the screenshot below: [Screenshot from 2019-08-02 16-37-06]

@nttstar could you tell us what the problem is? We merged the datasets according to your instructions with dataset_merge.py and no errors occurred :)

Talgin commented 5 years ago

Hi @SueeH, sorry for the late reply. I think this info is noted in their paper: [Screenshot from 2019-08-05 11-27-03]

They say that face_glint (DeepGlint-Face) includes MS1M-DeepGlint and Asian-DeepGlint. As far as I know and reading this (https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8698884) MS1M-DeepGlint is refined version of MS1M (provided by DeepGlint Corp.) and on http://trillionpairs.deepglint.com/overview they say:

  • MS-Celeb-1M-v1c with 86,876 ids/3,923,399 aligned images cleaned from MS-Celeb-1M dataset. This dataset has been excluded from both LFW and Asian-Celeb.
  • Asian-Celeb 93,979 ids/2,830,146 aligned images. This dataset has been excluded from both LFW and MS-Celeb-1M-v1c.

So, I think that emore (MS1MV2) is another refined version of what is included into faces_glint dataset from MS1M (because MS1M-DeepGlint has 2K more ids than MS1MV2, but less images (3.9M to 5.8M)).
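
As a quick check that these two subsets account for the "about 180K IDs" mentioned at the top of the thread, the quoted counts can be summed. The sum assumes the subsets share no identities, as the overview text states.

```python
# Figures quoted above from the Trillion Pairs overview page.
subsets = {
    "MS-Celeb-1M-v1c": {"ids": 86_876, "images": 3_923_399},
    "Asian-Celeb": {"ids": 93_979, "images": 2_830_146},
}

total_ids = sum(s["ids"] for s in subsets.values())
total_images = sum(s["images"] for s in subsets.values())
print(f"{total_ids:,} ids, {total_images:,} images")
# prints "180,855 ids, 6,753,545 images"
```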

EdwardVincentMa commented 4 years ago

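Hey, I'm in Shanghai too. Training MobileFaceNet + arcloss on the webface dataset or face-ms1m always ends in NaN. Have you tried it? Even with lr lowered to 0.0001, it goes NaN after twenty-odd epochs (at epoch 24).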

pake2070 commented 4 years ago

Can anyone share a training configuration for Asian faces? Thanks.

lennonxu0101 commented 4 years ago

we use casia

pake2070 commented 4 years ago

I followed the steps one by one but get a key error for the image. My configuration:

    CUDA_VISIBLE_DEVICES='0,1' python3 -u src/train_softmax.py --data-dir $DATA_DIR --network "$NETWORK" --loss-type 0 --prefix "$PREFIX" --per-batch-size 32 --lr-steps "$LRSTEPS" --margin-s 32.0 --margin-m 0.1 --ckpt 2 --emb-size 128 --fc7-wd-mult 10.0 --wd 0.00004 --max-steps 140002

but I get a key error for the Asian dataset:

[image]

maywander commented 4 years ago

@Edwardmark I ran into the same problem as you. Did you get good results on deepglint in the end?

Edwardmark commented 4 years ago

@maywander No, I didn't. In the end I used the emore data instead.

maywander commented 4 years ago

So the models trained on emore perform better on the trillionpairs test platform? @Edwardmark

Edwardmark commented 4 years ago

@maywander Yes, and I don't know why.

anguoyang commented 4 years ago

I can generate the glint.lst file fine, but calling face2rec.py always fails. Does anyone know how to set the parameters? Thanks.

anguoyang commented 4 years ago

It feels like there is a problem with the code.