deepinsight / insightface

State-of-the-art 2D and 3D Face Analysis Project
https://insightface.ai

(arcface torch) Partial FC speed is the same when setting sample_rate=0.1 and sample_rate=1 #1622

Open yongxinw opened 3 years ago

yongxinw commented 3 years ago

I have observed that under either sample rate, the training speed is almost identical. Does changing config.sample_rate=0.1 actually trigger the Partial FC operation, or am I missing something?

anxiangsir commented 3 years ago

Please set num_classes to 2000000 or 10000000 to compare training speeds.

yongxinw commented 3 years ago

> Please set num_classes to 2000000 or 10000000 to compare training speeds.

Thanks for your reply. I tried a model with 600000 classes but still didn't observe meaningful speed gains. I will try with 2000000 or 10000000 now.

I am also wondering: is setting config.sample_rate=0.1 the only change required to use Partial FC? Are there any other flags that need to be set in order to actually use Partial FC?

anxiangsir commented 3 years ago

Yes, changing sample_rate is the only change needed. In general, Partial FC saves a lot of GPU memory and computation in the following cases:

  1. When the number of identities is greater than 2 million.
  2. In model-parallel training tasks with more than 32 GPUs.

You can test training tasks with 2 million or more identities by changing only cfg.num_classes in the config.py file, and the following results can be reproduced.

[image: training speed comparison results]
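For reference, a minimal config sketch of the two fields discussed in this thread, assuming a layout similar to the files under recognition/arcface_torch/configs/ (the exact field names and defaults in your checkout may differ):

```python
# Hypothetical excerpt of an arcface_torch-style config file.
# Check the configs/ directory of your checkout for the exact schema.
from easydict import EasyDict as edict

config = edict()
config.loss = "arcface"
config.network = "r50"
config.embedding_size = 512
config.fp16 = True
config.batch_size = 128

# Partial FC: sample only 10% of the class centers per iteration.
# sample_rate = 1.0 corresponds to the ordinary full-softmax, model-parallel FC.
config.sample_rate = 0.1

# The speed/memory benefit only becomes visible at a very large identity count,
# e.g. 2M or 10M classes as suggested above.
config.num_classes = 2_000_000
```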

anxiangsir commented 3 years ago

We updated a doc for training speed:

https://github.com/deepinsight/insightface/blob/master/recognition/arcface_torch/docs/speed_benchmark.md
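For intuition on why the gains only show up at very large class counts, below is a conceptual, single-GPU sketch of what sample_rate controls. This is not the repository's implementation (which also shards the class-center matrix across GPUs); it only illustrates that each step computes logits against the batch's positive centers plus a random subset of the remaining centers, so the FC matmul shrinks in proportion to sample_rate. With only a few hundred thousand classes, that matmul is already cheap relative to the backbone, which is why the overall step time barely changes.

```python
import torch
import torch.nn.functional as F

def sampled_fc_logits(embeddings, labels, weight, sample_rate=0.1):
    """Conceptual Partial FC step: score embeddings against only a sampled
    subset of class centers (all positives in the batch plus random negatives)
    instead of all num_classes centers. Sketch only."""
    num_classes = weight.size(0)
    num_sample = max(int(num_classes * sample_rate), 1)

    # Class centers of the batch's true labels must always be kept.
    positives = labels.unique()
    keep = torch.zeros(num_classes, dtype=torch.bool, device=weight.device)
    keep[positives] = True

    # Fill the remaining sampling budget with randomly chosen negative centers.
    negatives = torch.nonzero(~keep, as_tuple=False).squeeze(1)
    num_neg = max(num_sample - positives.numel(), 0)
    perm = torch.randperm(negatives.numel(), device=weight.device)[:num_neg]
    sampled = torch.cat([positives, negatives[perm]])

    # Remap the original labels onto indices within the sampled subset
    # (all batch positives are kept here, so no label stays at -1).
    index_map = torch.full((num_classes,), -1, dtype=torch.long, device=weight.device)
    index_map[sampled] = torch.arange(sampled.numel(), device=weight.device)
    sampled_labels = index_map[labels]

    # Cosine logits against the sampled centers only: the matmul shrinks from
    # (batch x num_classes) to roughly (batch x num_classes * sample_rate).
    logits = F.linear(F.normalize(embeddings), F.normalize(weight[sampled]))
    return logits, sampled_labels
```

The returned logits and remapped labels can then be fed to the usual margin + scaled cross-entropy; the real distributed version additionally keeps each GPU's shard of `weight` and gathers embeddings across ranks, which this sketch omits.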

MyDecember12 commented 3 years ago

> We updated a doc for training speed:
>
> https://github.com/deepinsight/insightface/blob/master/recognition/arcface_torch/docs/speed_benchmark.md

Hello, have you tried writing NPCFace or CurricularFace in the form of PFC?
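This question isn't answered in the thread, but conceptually the class-center sampling is independent of the margin function, so one way to experiment is to apply a CurricularFace-style margin to the sampled cosine logits before scaling and cross-entropy. A rough, untested sketch under that assumption (the function name and defaults are illustrative, not from the repository; NPCFace would need its own margin/weighting rule):

```python
import math
import torch

def curricular_margin(cosine, sampled_labels, m=0.5, t=0.0):
    """Apply a CurricularFace-style margin to cosine logits computed over a
    sampled class subset (see the Partial FC sketch above). Sketch only; in the
    paper, t is an exponential moving average of the mean positive cosine."""
    cosine = cosine.clone()
    cos_m, sin_m = math.cos(m), math.sin(m)

    has_pos = sampled_labels >= 0            # rows whose true class was sampled
    idx = torch.nonzero(has_pos, as_tuple=False).squeeze(1)

    target = cosine[idx, sampled_labels[idx]].clamp(-1.0, 1.0)
    target_m = target * cos_m - torch.sqrt(1.0 - target.pow(2)) * sin_m  # cos(theta + m)

    # CurricularFace: hard negatives (cos > cos(theta + m)) are re-weighted
    # by (t + cos); easy negatives are left unchanged.
    hard = cosine[idx] > target_m.unsqueeze(1)
    cosine[idx] = torch.where(hard, cosine[idx] * (t + cosine[idx]), cosine[idx])
    cosine[idx, sampled_labels[idx]] = target_m
    return cosine
```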