PaddlePaddle / Paddle

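PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (PaddlePaddle core framework: high-performance single-machine and distributed training, and cross-platform deployment, for deep learning & machine learning)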
http://www.paddlepaddle.org/
Apache License 2.0
21.66k stars 5.44k forks

Integrated Trainer of Parameter Server (API add `fluid.contrib.layers.sparse_embedding` only) #22957

Closed seiriosPlus closed 3 years ago

seiriosPlus commented 4 years ago

PR types

New features, Function optimization

PR changes

APIs, OPs, Others

Describe

Fully optimized code for parameter server training.

Introduction


API changes:

  1. Add `fluid.contrib.layers.sparse_embedding` for large sparse embeddings.
  2. Other API changes are caused by the code formatter.

OP changes:

  1. recv_save: add attribute is_sparse
  2. send: remove unused attribute send
  3. checkpoint_notify: remove attributes trainer_id/dir/lookup_table/epmap
  4. checkpoint_notify: add attributes is_slice/varname/remote_varnames/endpoints/slice_varnames/dirname
  5. distributed_lookup_table: remove unused attribute height_sections

Note: recv_save, send, checkpoint_notify, and distributed_lookup_table are all private ops used for distributed training, so these changes should be transparent to users.

Transpiler

old: huge methods with if/for loops

new: passes

trainer: delete_optimizer_pass -> distributed_ops_pass -> append_send_ops_pass -> fake_init_ops_pass -> init_from_server_pass -> delet_extra_optimizes_pass

pserver: add_listen_and_serv_pass -> add_rpc_global_flags_pass -> add_optimizer_pass -> large_scale_sparse_pass -> build_pserver_startup_program_pass -> large_scale_sparse_pass
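The move from monolithic methods to passes can be illustrated with a minimal sketch. It assumes a "program" is just a list of op names and that each pass is a function from program to program; the `toy_*` functions and `apply_passes` are hypothetical stand-ins and do not reproduce the logic of the real passes listed above.

```python
from functools import reduce
from typing import Callable, List

# Assumption: a program is modeled as a plain list of op names.
Program = List[str]
Pass = Callable[[Program], Program]


def toy_delete_optimizer_pass(program: Program) -> Program:
    """Hypothetical pass: drop optimizer ops from the trainer program."""
    return [op for op in program if op != "sgd"]


def toy_append_send_ops_pass(program: Program) -> Program:
    """Hypothetical pass: append a send op at the end of the program."""
    return program + ["send"]


def apply_passes(program: Program, passes: List[Pass]) -> Program:
    """Run the passes left to right, feeding each result into the next."""
    return reduce(lambda prog, p: p(prog), passes, program)


trainer_program = apply_passes(
    ["mul", "mul_grad", "sgd"],
    [toy_delete_optimizer_pass, toy_append_send_ops_pass],
)
print(trainer_program)
```

Each pass stays small and independently testable, and the ordering is visible in one list rather than being buried in nested if/for branches.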

Communicator

Reimplemented:

Communicator -> AsyncCommunicator -> GeoCommunicator

Communicator -> AsyncCommunicator -> HalfAsyncCommunicator -> SyncCommunicator
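Reading the arrows above as inheritance (an assumption; the PR text does not say so explicitly), the two chains could be sketched as the following class hierarchy. The classes are empty Python placeholders for illustration only, and `chain` is a hypothetical helper.

```python
from typing import List


class Communicator:
    """Placeholder base class; no behavior is modeled here."""


class AsyncCommunicator(Communicator):
    pass


class GeoCommunicator(AsyncCommunicator):
    pass


class HalfAsyncCommunicator(AsyncCommunicator):
    pass


class SyncCommunicator(HalfAsyncCommunicator):
    pass


def chain(cls: type) -> List[str]:
    """Return the class names from the base class down to cls."""
    return [c.__name__ for c in reversed(cls.__mro__) if c is not object]


geo_chain = chain(GeoCommunicator)
sync_chain = chain(SyncCommunicator)
print(" -> ".join(geo_chain))
print(" -> ".join(sync_chain))
```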

Server

Add a LargeScaleKV implementation:

  1. IDs grow automatically.
  2. IDs may lie anywhere in [0, INT64].
  3. IDs are hashed across pservers, which fixes hotspot issues.
  4. Tables can be saved as SelectedRows or as text.
  5. Saving is done on the PServer.
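The properties listed above can be sketched with a toy key-value table. This is a minimal illustration under stated assumptions, not the LargeScaleKV code from this PR: the class names, the modulo sharding rule, the random initializer, and the tab-separated text layout are all hypothetical.

```python
import random
from typing import Dict, List


class ToySparseShard:
    """One pserver's slice of the table; rows are created on first lookup."""

    def __init__(self, dim: int, seed: int = 0) -> None:
        self.dim = dim
        self.rows: Dict[int, List[float]] = {}
        self._rng = random.Random(seed)

    def lookup(self, key: int) -> List[float]:
        # Auto growth: an unseen id gets a freshly initialized row.
        if key not in self.rows:
            self.rows[key] = [self._rng.uniform(-0.1, 0.1) for _ in range(self.dim)]
        return self.rows[key]

    def to_text(self) -> str:
        # Assumed text layout: "<id>\t<comma-separated values>", one row per line.
        lines = []
        for key in sorted(self.rows):
            values = ",".join("{:.6f}".format(v) for v in self.rows[key])
            lines.append("{}\t{}".format(key, values))
        return "\n".join(lines)


class ToyLargeScaleKV:
    """Routes each id to a shard so that no single pserver holds every row."""

    def __init__(self, num_pservers: int, dim: int) -> None:
        self.shards = [ToySparseShard(dim, seed=i) for i in range(num_pservers)]

    def shard_of(self, key: int) -> int:
        if key < 0:
            raise ValueError("ids must be non-negative")
        # Assumption: plain modulo stands in for the real hash function.
        return key % len(self.shards)

    def lookup(self, keys: List[int]) -> List[List[float]]:
        return [self.shards[self.shard_of(k)].lookup(k) for k in keys]


kv = ToyLargeScaleKV(num_pservers=4, dim=8)
vectors = kv.lookup([3, 7, 2**40 + 1, 3])
shard_sizes = [len(shard.rows) for shard in kv.shards]
print(shard_sizes)
```

Because rows are allocated lazily, the table never needs a dense allocation over the whole id range, and each shard can write out its own rows independently.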

Experiments

CTR: LARGE_SCALE vs. 1.7.2 (8), ASYNC + DATASET

| EPOCH | SPEED (lines/sec), 22957 | SPEED (lines/sec), 1.7.2 | TEST AUC, 22957 | TEST AUC, 1.7.2 |
|---|---|---|---|---|
| 0 | 40382.7114 | 41824.4988 | 0.743417 | 0.747783 |
| 1 | 40852.5408 | 42742.1863 | 0.763407 | 0.765855 |
| 2 | 41757.7742 | 42824.1947 | 0.773173 | 0.775051 |
| 3 | 42245.0450 | 41419.2507 | 0.779447 | 0.780185 |
| 4 | 42255.9334 | 43046.1868 | 0.782906 | 0.783628 |
| 5 | 41638.7534 | 43064.7863 | 0.785447 | 0.785996 |
| 6 | 41625.8496 | 43903.9036 | 0.787759 | 0.787912 |
| 7 | 41826.0234 | 43849.0151 | 0.789067 | 0.789307 |
| 8 | 41644.1003 | 43084.6209 | 0.790005 | 0.790547 |
| 9 | 42006.6102 | 45298.1854 | 0.791332 | 0.791442 |
| 10 | 41520.9987 | 44090.6176 | 0.791979 | 0.792363 |
| 11 | 41644.4256 | 44754.7476 | 0.79242 | 0.792934 |
| 12 | 41600.1201 | 44129.0481 | 0.792934 | 0.793618 |
| 13 | 42242.2510 | 43911.7627 | 0.793653 | 0.793995 |
| 14 | 42493.9051 | 43280.5678 | 0.793839 | 0.794289 |
| 15 | 42483.4592 | 43156.0943 | 0.794197 | 0.794641 |
| 16 | 41973.2283 | 43050.7187 | 0.794247 | 0.794467 |
| 17 | 41648.6511 | 43470.0413 | 0.794318 | 0.795061 |
| 18 | 42001.9489 | 43878.2971 | 0.794401 | 0.794801 |
| 19 | 43204.9721 | 44350.7521 | 0.794796 | 0.794991 |

W2V: LARGE_SCALE vs. 1.7.2 (8), ASYNC + DATASET

| EPOCH | SPEED (words/sec), 22957 | SPEED (words/sec), 1.7.2 | TEST ACC, 22957 | TEST ACC, 1.7.2 |
|---|---|---|---|---|
| 0 | 38710.8308 | 52546.4205 | 0.357 | 0.291 |
| 1 | 31047.3975 | 53228.6121 | 0.488 | 0.421 |
| 2 | 38398.9667 | 53219.9925 | 0.552 | 0.483 |
| 3 | 30892.7404 | 52935.8162 | 0.592 | 0.53 |
| 4 | 38253.3532 | 53083.7943 | 0.609 | 0.57 |
| 5 | 30509.0984 | 52890.2298 | 0.614 | 0.583 |
| 6 | 36201.4896 | 52968.6198 | 0.621 | 0.6 |
| 7 | 31285.0067 | 53509.7486 | 0.628 | 0.608 |
| 8 | 34499.2948 | 52988.9057 | 0.632 | 0.615 |
| 9 | 34003.5555 | 53091.7894 | 0.634 | 0.62 |
| 10 | 32231.5232 | 52983.9373 | 0.637 | 0.623 |
| 11 | 36732.4116 | 52930.2642 | 0.64 | 0.63 |
| 12 | 29601.1020 | 52820.1654 | 0.639 | 0.632 |
| 13 | 47984.6335 | 53022.2126 | 0.64 | 0.634 |
| 14 | 44598.9333 | 53156.1390 | 0.641 | 0.635 |

SIMNET: LARGE_SCALE vs. 1.7.2 (8), ASYNC + DATASET

| EPOCH | SPEED (lines/sec), 22957 | SPEED (lines/sec), 1.7.2 | TEST PN, 22957 | TEST PN, 1.7.2 |
|---|---|---|---|---|
| 0 | 80882.7005 | 69527.7479 | 1.80131 | 1.83487 |
| 2 | 80851.1122 | 69427.2556 | 1.92519 | 1.95693 |
| 4 | 81212.7949 | 69551.4940 | 1.99275 | 2.01557 |
| 6 | 81046.8461 | 69718.6492 | 2.02755 | 2.065 |
| 8 | 81191.8261 | 69810.9468 | 2.0512 | 2.07834 |
| 10 | 81614.7143 | 69974.7245 | 2.08456 | 2.11172 |
| 12 | 80365.6785 | 69782.9580 | 2.06726 | 2.12081 |
| 14 | 80845.8114 | 70085.8516 | 2.1108 | 2.1329 |
| 16 | 81453.6759 | 70270.9776 | 2.12224 | 2.11487 |
| 18 | 80598.0124 | 70201.1548 | 2.14547 | 2.16318 |
| 20 | 80936.1689 | 69586.5245 | 2.12098 | 2.17401 |
| 22 | 80281.5667 | 69935.3969 | 2.16638 | 2.17755 |
| 24 | 80648.5904 | 69712.8900 | 2.17197 | 2.18454 |
| 26 | 80175.3672 | 69874.0455 | 2.17442 | 2.18979 |
| 28 | 80482.3953 | 69670.9059 | 2.1805 | 2.19599 |
| 30 | 80357.8911 | 69693.6987 | 2.18897 | 2.20132 |
| 32 | 80576.7476 | 69591.5745 | 2.19511 | 2.20476 |
| 34 | 81010.0043 | 69194.1689 | 2.19904 | 2.21155 |
| 36 | 80852.6387 | 69869.2369 | 2.19907 | 2.20889 |
| 38 | 80696.7542 | 69573.0540 | 2.2062 | 2.20458 |

Next Work

  1. Full unification of Tensor and LargeScaleKV on the PServer.
  2. Speed up Geo/Async.
  3. More tests in business scenarios.

paddle-bot-old[bot] commented 3 years ago

Thanks for your contribution! Please add test = develop in your commit message to trigger CI to ensure your PR can be merged. See Paddle CI Manual for details.

CLAassistant commented 3 years ago

CLA assistant check
All committers have signed the CLA.