ROCm / apex

A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
BSD 3-Clause "New" or "Revised" License

Faster build using ninja #95

Closed: hubertlu-tw closed this 1 year ago

hubertlu-tw commented 1 year ago

Built Apex with the original setup.py, with and without ninja, for gfx900;gfx906;gfx908;gfx90a;gfx1030 on a system with the default MAX_JOBS=32, using the following two commands:

1. pip install

time pip install --disable-pip-version-check --global-option="--cpp_ext" --global-option="--cuda_ext" -v . 

2. python setup.py install

time python setup.py install --cpp_ext --cuda_ext
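
Before timing either command, it can help to record the two settings that determine how parallel the build is allowed to be: whether ninja is on PATH, and what MAX_JOBS is set to. The snippet below is a minimal sketch of such a pre-flight check using only the standard library. The helper name describe_build is hypothetical and is not part of Apex or PyTorch. It assumes, as I understand PyTorch's extension builder to work, that ninja is used only when it can be found and that MAX_JOBS caps the number of parallel compile jobs.

```python
import os
import shutil


def describe_build(environ, ninja_path):
    """Summarise the settings that decide how parallel the build can be.

    environ    -- mapping of environment variables (e.g. os.environ)
    ninja_path -- result of shutil.which("ninja"); None means ninja is not on PATH
    (describe_build is a hypothetical helper, not an Apex or PyTorch API.)
    """
    raw_jobs = environ.get("MAX_JOBS")
    return {
        "ninja_found": ninja_path is not None,
        # None means MAX_JOBS is unset, so the build tool picks its own default.
        "max_jobs": int(raw_jobs) if raw_jobs else None,
    }


if __name__ == "__main__":
    # Print the settings for the current shell before starting a timed build.
    print(describe_build(os.environ, shutil.which("ninja")))
```

Running it in the same shell as the timed build makes it easier to tell whether two timings are actually comparable.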

The extensions built in the experiment:

Extension Name              Support on ROCm
--cpp_ext                   Yes
--distributed_adam          Yes
--distributed_lamb          Yes
--cuda_ext                  Yes
--permutation_search        No
--bnp                       Yes
--xentropy                  Yes
--focal_loss                Yes
--index_mul_2d              Yes
--deprecated_fused_adam     Yes
--deprecated_fused_lamb     Yes
--fast_layer_norm           No
--fmha                      No
--fast_multihead_attn       Yes
--transducer                Yes
--fast_bottleneck           No
--cudnn_gbn                 No
--peer_memory               Yes
--nccl_p2p                  Yes
--fused_conv_bias_relu      No

Comparison of installation time:

Original

real    44m19.120s
user    42m57.882s
sys     1m44.311s

PR

real    20m31.879s
user    43m34.892s
sys     2m0.980s
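
To make the comparison concrete, the timings above can be converted to seconds and compared directly. This is a small sketch that uses only the numbers reported in this comment; the helper to_seconds is a hypothetical name introduced for illustration. The user/real ratio is used here as a rough indicator of how many cores were busy on average, which is an interpretation rather than a measured value.

```python
import re


def to_seconds(stamp):
    """Convert a `time` stamp such as '44m19.120s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised time stamp: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# Timings copied from the comparison above.
original = {"real": "44m19.120s", "user": "42m57.882s"}
pr = {"real": "20m31.879s", "user": "43m34.892s"}

# Wall-clock speedup of the PR build over the original build.
speedup = to_seconds(original["real"]) / to_seconds(pr["real"])

# user / real: a rough estimate of the average number of busy cores.
cores_original = to_seconds(original["user"]) / to_seconds(original["real"])
cores_pr = to_seconds(pr["user"]) / to_seconds(pr["real"])

print(f"wall-clock speedup: {speedup:.2f}x")           # 2.16x
print(f"avg busy cores, original: {cores_original:.2f}")  # 0.97
print(f"avg busy cores, PR: {cores_pr:.2f}")              # 2.12
```

The user time is almost unchanged between the two runs, so the total amount of compile work is about the same. The reduction in real time comes from running that work in parallel.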

hubertlu-tw commented 1 year ago

We can further reduce the build time by separating out the newly enabled extensions, which are currently built as part of --cuda_ext by default:

Extension Name
--focal_loss
--index_mul_2d
--transducer
--peer_memory
--nccl_p2p
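
One way to separate these extensions out is to build each one only when its own flag is passed on the command line. The sketch below illustrates that idea in isolation. It is not Apex's actual setup.py, and select_extensions and NEWLY_ENABLED are hypothetical names. It only shows how the opt-in flags could be split from the rest of the arguments before they are handed to setuptools.

```python
# Flags for the newly enabled extensions listed above.
NEWLY_ENABLED = [
    "--focal_loss",
    "--index_mul_2d",
    "--transducer",
    "--peer_memory",
    "--nccl_p2p",
]


def select_extensions(argv):
    """Split argv into (requested opt-in flags, remaining arguments).

    Hypothetical helper: each opt-in flag is removed from the argument
    list so that setuptools never sees an option it does not know.
    """
    selected = []
    remaining = []
    for arg in argv:
        if arg in NEWLY_ENABLED:
            if arg not in selected:  # ignore repeated flags
                selected.append(arg)
        else:
            remaining.append(arg)
    return selected, remaining


selected, remaining = select_extensions(
    ["install", "--cpp_ext", "--cuda_ext", "--transducer"]
)
print(selected)   # ['--transducer']
print(remaining)  # ['install', '--cpp_ext', '--cuda_ext']
```

With this arrangement, a plain --cuda_ext build would skip the five extensions unless they are requested explicitly, which is where the additional build-time saving would come from.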

jithunnair-amd commented 1 year ago

This is great! I see that the CI jobs were already installing ninja, but with this PR they actually use ninja for the build, e.g. http://ml-ci.amd.com:21096/job/pytorch/job/apex-rocm-pytorch-release/211 : 36 min

building 'apex_C' extension
creating /apex/build/temp.linux-x86_64-3.7
creating /apex/build/temp.linux-x86_64-3.7/csrc
Emitting ninja build file /apex/build/temp.linux-x86_64-3.7/build.ninja...

vs

http://ml-ci.amd.com:21096/job/pytorch/job/apex-rocm-pytorch-release/209 : 1 hr 1 min