NVIDIA / apex

A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
BSD 3-Clause "New" or "Revised" License

installing Apex on Jetson Xavier #718

Closed AndreV84 closed 4 years ago

AndreV84 commented 4 years ago

Will the Apex PyTorch extension work on Xavier? Is there a simple sample to confirm the installation? Thanks.
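One rough way to confirm the installation is to check that Python can locate the `apex` package and the compiled extension modules that appear in the build log in this thread (`apex_C`, `amp_C`, `syncbn`, `fused_layer_norm_cuda`). The sketch below is only an illustration: the helper name `find_modules` is hypothetical, and treating "all modules found" as a successful install is an assumption, not an official Apex check.

```python
import importlib.util

# Compiled extension modules named in the build log of this thread; treating
# their presence as the success criterion is an assumption of this sketch.
APEX_EXTENSIONS = ("apex_C", "amp_C", "syncbn", "fused_layer_norm_cuda")


def find_modules(names):
    """Map each top-level module name to True if Python can locate it.

    find_spec only searches the import path; it does not load the module.
    """
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in find_modules(("apex",) + APEX_EXTENSIONS).items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

This only shows that the files are on the import path. To go one step further and actually load an extension, it is probably safest to run `import torch` before `import amp_C`, since the extensions are built against the PyTorch libraries.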

AndreV84 commented 4 years ago

`pip3 install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
WARNING: pip is being invoked by an old script wrapper. This will fail in a future version of pip.
Please see https://github.com/pypa/pip/issues/5599 for advice on fixing the underlying issue.
To avoid this problem you can invoke Python with '-m pip' instead of running pip directly.
/home/nvidia/.local/lib/python3.6/site-packages/pip/_internal/commands/install.py:244: UserWarning: Disabling all use of wheels due to the use of --build-option / --global-option / --install-option.
cmdoptions.check_install_build_global(options)
Defaulting to user installation because normal site-packages is not writeable
Created temporary directory: /tmp/pip-ephem-wheel-cache-jm37ycno
Created temporary directory: /tmp/pip-req-tracker-qoxrrmv7
Initialized build tracking at /tmp/pip-req-tracker-qoxrrmv7
Created build tracker: /tmp/pip-req-tracker-qoxrrmv7
Entered build tracker: /tmp/pip-req-tracker-qoxrrmv7
Created temporary directory: /tmp/pip-install-88_lp8td
Processing /home/nvidia/apex
Created temporary directory: /tmp/pip-req-build-luydcqp0
Added file:///home/nvidia/apex to build tracker '/tmp/pip-req-tracker-qoxrrmv7'
Running setup.py (path:/tmp/pip-req-build-luydcqp0/setup.py) egg_info for package from file:///home/nvidia/apex
Running command python setup.py egg_info
torch.__version__ = 1.4.0
running egg_info
creating /tmp/pip-req-build-luydcqp0/pip-egg-info/apex.egg-info
writing /tmp/pip-req-build-luydcqp0/pip-egg-info/apex.egg-info/PKG-INFO
writing dependency_links to /tmp/pip-req-build-luydcqp0/pip-egg-info/apex.egg-info/dependency_links.txt
writing top-level names to /tmp/pip-req-build-luydcqp0/pip-egg-info/apex.egg-info/top_level.txt
writing manifest file '/tmp/pip-req-build-luydcqp0/pip-egg-info/apex.egg-info/SOURCES.txt'
reading manifest file '/tmp/pip-req-build-luydcqp0/pip-egg-info/apex.egg-info/SOURCES.txt'
writing manifest file '/tmp/pip-req-build-luydcqp0/pip-egg-info/apex.egg-info/SOURCES.txt'
/tmp/pip-req-build-luydcqp0/setup.py:43: UserWarning: Option --pyprof not specified. Not installing PyProf dependencies!
warnings.warn("Option --pyprof not specified. Not installing PyProf dependencies!")
Source in /tmp/pip-req-build-luydcqp0 has version 0.1, which satisfies requirement apex==0.1 from file:///home/nvidia/apex
Removed apex==0.1 from file:///home/nvidia/apex from build tracker '/tmp/pip-req-tracker-qoxrrmv7'
Skipping wheel build for apex, due to binaries being disabled for it.
Installing collected packages: apex
Attempting uninstall: apex
Found existing installation: apex 0.1
Uninstalling apex-0.1:
Created temporary directory: /tmp/pip-uninstall-nq1rrgft
Removing file or directory /home/nvidia/.local/lib/python3.6/site-packages/amp_C.cpython-36m-aarch64-linux-gnu.so
Created temporary directory: /home/nvidia/.local/lib/python3.6/site-packages/~pex-0.1-py3.6.egg-info
Removing file or directory /home/nvidia/.local/lib/python3.6/site-packages/apex-0.1-py3.6.egg-info
Created temporary directory: /home/nvidia/.local/lib/python3.6/site-packages/~pex
Removing file or directory /home/nvidia/.local/lib/python3.6/site-packages/apex/
Removing file or directory /home/nvidia/.local/lib/python3.6/site-packages/apex_C.cpython-36m-aarch64-linux-gnu.so
Removing file or directory /home/nvidia/.local/lib/python3.6/site-packages/fused_layer_norm_cuda.cpython-36m-aarch64-linux-gnu.so
Removing file or directory /home/nvidia/.local/lib/python3.6/site-packages/syncbn.cpython-36m-aarch64-linux-gnu.so
Successfully uninstalled apex-0.1
Created temporary directory: /tmp/pip-record-zjrirzww
Running command /usr/bin/python3 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-luydcqp0/setup.py'"'"'; __file__='"'"'/tmp/pip-req-build-luydcqp0/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' --cpp_ext --cuda_ext install --record /tmp/pip-record-zjrirzww/install-record.txt --single-version-externally-managed --user --prefix= --compile --install-headers /home/nvidia/.local/include/python3.6m/apex
torch.__version__ = 1.4.0
/tmp/pip-req-build-luydcqp0/setup.py:43: UserWarning: Option --pyprof not specified. Not installing PyProf dependencies!
warnings.warn("Option --pyprof not specified. Not installing PyProf dependencies!")

Compiling cuda extensions with
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Mon_Mar_11_22:13:24_CDT_2019
Cuda compilation tools, release 10.0, V10.0.326
from /usr/local/cuda-10.0/bin

running install
running build
running build_py
creating build
creating build/lib.linux-aarch64-3.6
creating build/lib.linux-aarch64-3.6/apex
copying apex/__init__.py -> build/lib.linux-aarch64-3.6/apex
creating build/lib.linux-aarch64-3.6/apex/contrib
copying apex/contrib/__init__.py -> build/lib.linux-aarch64-3.6/apex/contrib
creating build/lib.linux-aarch64-3.6/apex/optimizers
copying apex/optimizers/fused_adam.py -> build/lib.linux-aarch64-3.6/apex/optimizers
copying apex/optimizers/fused_lamb.py -> build/lib.linux-aarch64-3.6/apex/optimizers
copying apex/optimizers/fused_novograd.py -> build/lib.linux-aarch64-3.6/apex/optimizers
copying apex/optimizers/fused_sgd.py -> build/lib.linux-aarch64-3.6/apex/optimizers
copying apex/optimizers/__init__.py -> build/lib.linux-aarch64-3.6/apex/optimizers
creating build/lib.linux-aarch64-3.6/apex/fp16_utils
copying apex/fp16_utils/fp16_optimizer.py -> build/lib.linux-aarch64-3.6/apex/fp16_utils
copying apex/fp16_utils/fp16util.py -> build/lib.linux-aarch64-3.6/apex/fp16_utils
copying apex/fp16_utils/__init__.py -> build/lib.linux-aarch64-3.6/apex/fp16_utils
copying apex/fp16_utils/loss_scaler.py -> build/lib.linux-aarch64-3.6/apex/fp16_utils
creating build/lib.linux-aarch64-3.6/apex/RNN
copying apex/RNN/cells.py -> build/lib.linux-aarch64-3.6/apex/RNN
copying apex/RNN/models.py -> build/lib.linux-aarch64-3.6/apex/RNN
copying apex/RNN/RNNBackend.py -> build/lib.linux-aarch64-3.6/apex/RNN
copying apex/RNN/__init__.py -> build/lib.linux-aarch64-3.6/apex/RNN
creating build/lib.linux-aarch64-3.6/apex/parallel
copying apex/parallel/multiproc.py -> build/lib.linux-aarch64-3.6/apex/parallel
copying apex/parallel/distributed.py -> build/lib.linux-aarch64-3.6/apex/parallel
copying apex/parallel/sync_batchnorm_kernel.py -> build/lib.linux-aarch64-3.6/apex/parallel
copying apex/parallel/optimized_sync_batchnorm.py -> build/lib.linux-aarch64-3.6/apex/parallel
copying apex/parallel/sync_batchnorm.py -> build/lib.linux-aarch64-3.6/apex/parallel
copying apex/parallel/LARC.py -> build/lib.linux-aarch64-3.6/apex/parallel
copying apex/parallel/__init__.py -> build/lib.linux-aarch64-3.6/apex/parallel
copying apex/parallel/optimized_sync_batchnorm_kernel.py -> build/lib.linux-aarch64-3.6/apex/parallel
creating build/lib.linux-aarch64-3.6/apex/normalization
copying apex/normalization/__init__.py -> build/lib.linux-aarch64-3.6/apex/normalization
copying apex/normalization/fused_layer_norm.py -> build/lib.linux-aarch64-3.6/apex/normalization
creating build/lib.linux-aarch64-3.6/apex/amp
copying apex/amp/opt.py -> build/lib.linux-aarch64-3.6/apex/amp
copying apex/amp/_initialize.py -> build/lib.linux-aarch64-3.6/apex/amp
copying apex/amp/_process_optimizer.py -> build/lib.linux-aarch64-3.6/apex/amp
copying apex/amp/handle.py -> build/lib.linux-aarch64-3.6/apex/amp
copying apex/amp/amp.py -> build/lib.linux-aarch64-3.6/apex/amp
copying apex/amp/compat.py -> build/lib.linux-aarch64-3.6/apex/amp
copying apex/amp/frontend.py -> build/lib.linux-aarch64-3.6/apex/amp
copying apex/amp/_amp_state.py -> build/lib.linux-aarch64-3.6/apex/amp
copying apex/amp/wrap.py -> build/lib.linux-aarch64-3.6/apex/amp
copying apex/amp/utils.py -> build/lib.linux-aarch64-3.6/apex/amp
copying apex/amp/rnn_compat.py -> build/lib.linux-aarch64-3.6/apex/amp
copying apex/amp/__init__.py -> build/lib.linux-aarch64-3.6/apex/amp
copying apex/amp/__version__.py -> build/lib.linux-aarch64-3.6/apex/amp
copying apex/amp/scaler.py -> build/lib.linux-aarch64-3.6/apex/amp
creating build/lib.linux-aarch64-3.6/apex/reparameterization
copying apex/reparameterization/reparameterization.py -> build/lib.linux-aarch64-3.6/apex/reparameterization
copying apex/reparameterization/weight_norm.py -> build/lib.linux-aarch64-3.6/apex/reparameterization
copying apex/reparameterization/__init__.py -> build/lib.linux-aarch64-3.6/apex/reparameterization
creating build/lib.linux-aarch64-3.6/apex/pyprof
copying apex/pyprof/__init__.py -> build/lib.linux-aarch64-3.6/apex/pyprof
creating build/lib.linux-aarch64-3.6/apex/multi_tensor_apply
copying apex/multi_tensor_apply/multi_tensor_apply.py -> build/lib.linux-aarch64-3.6/apex/multi_tensor_apply
copying apex/multi_tensor_apply/__init__.py -> build/lib.linux-aarch64-3.6/apex/multi_tensor_apply
creating build/lib.linux-aarch64-3.6/apex/contrib/optimizers
copying apex/contrib/optimizers/fp16_optimizer.py -> build/lib.linux-aarch64-3.6/apex/contrib/optimizers
copying apex/contrib/optimizers/fused_adam.py -> build/lib.linux-aarch64-3.6/apex/contrib/optimizers
copying apex/contrib/optimizers/fused_sgd.py -> build/lib.linux-aarch64-3.6/apex/contrib/optimizers
copying apex/contrib/optimizers/__init__.py -> build/lib.linux-aarch64-3.6/apex/contrib/optimizers
creating build/lib.linux-aarch64-3.6/apex/contrib/multihead_attn
copying apex/contrib/multihead_attn/encdec_multihead_attn.py -> build/lib.linux-aarch64-3.6/apex/contrib/multihead_attn
copying apex/contrib/multihead_attn/self_multihead_attn_func.py -> build/lib.linux-aarch64-3.6/apex/contrib/multihead_attn
copying apex/contrib/multihead_attn/self_multihead_attn.py -> build/lib.linux-aarch64-3.6/apex/contrib/multihead_attn
copying apex/contrib/multihead_attn/fast_encdec_multihead_attn_norm_add_func.py -> build/lib.linux-aarch64-3.6/apex/contrib/multihead_attn
copying apex/contrib/multihead_attn/encdec_multihead_attn_func.py -> build/lib.linux-aarch64-3.6/apex/contrib/multihead_attn
copying apex/contrib/multihead_attn/fast_self_multihead_attn_norm_add_func.py -> build/lib.linux-aarch64-3.6/apex/contrib/multihead_attn
copying apex/contrib/multihead_attn/fast_self_multihead_attn_func.py -> build/lib.linux-aarch64-3.6/apex/contrib/multihead_attn
copying apex/contrib/multihead_attn/__init__.py -> build/lib.linux-aarch64-3.6/apex/contrib/multihead_attn
copying apex/contrib/multihead_attn/fast_encdec_multihead_attn_func.py -> build/lib.linux-aarch64-3.6/apex/contrib/multihead_attn
creating build/lib.linux-aarch64-3.6/apex/contrib/groupbn
copying apex/contrib/groupbn/batch_norm.py -> build/lib.linux-aarch64-3.6/apex/contrib/groupbn
copying apex/contrib/groupbn/__init__.py -> build/lib.linux-aarch64-3.6/apex/contrib/groupbn
creating build/lib.linux-aarch64-3.6/apex/contrib/xentropy
copying apex/contrib/xentropy/softmax_xentropy.py -> build/lib.linux-aarch64-3.6/apex/contrib/xentropy
copying apex/contrib/xentropy/__init__.py -> build/lib.linux-aarch64-3.6/apex/contrib/xentropy
creating build/lib.linux-aarch64-3.6/apex/amp/lists
copying apex/amp/lists/functional_overrides.py -> build/lib.linux-aarch64-3.6/apex/amp/lists
copying apex/amp/lists/torch_overrides.py -> build/lib.linux-aarch64-3.6/apex/amp/lists
copying apex/amp/lists/__init__.py -> build/lib.linux-aarch64-3.6/apex/amp/lists
copying apex/amp/lists/tensor_overrides.py -> build/lib.linux-aarch64-3.6/apex/amp/lists
creating build/lib.linux-aarch64-3.6/apex/pyprof/parse
copying apex/pyprof/parse/kernel.py -> build/lib.linux-aarch64-3.6/apex/pyprof/parse
copying apex/pyprof/parse/__main__.py -> build/lib.linux-aarch64-3.6/apex/pyprof/parse
copying apex/pyprof/parse/nvvp.py -> build/lib.linux-aarch64-3.6/apex/pyprof/parse
copying apex/pyprof/parse/db.py -> build/lib.linux-aarch64-3.6/apex/pyprof/parse
copying apex/pyprof/parse/parse.py -> build/lib.linux-aarch64-3.6/apex/pyprof/parse
copying apex/pyprof/parse/__init__.py -> build/lib.linux-aarch64-3.6/apex/pyprof/parse
creating build/lib.linux-aarch64-3.6/apex/pyprof/prof
copying apex/pyprof/prof/activation.py -> build/lib.linux-aarch64-3.6/apex/pyprof/prof
copying apex/pyprof/prof/usage.py -> build/lib.linux-aarch64-3.6/apex/pyprof/prof
copying apex/pyprof/prof/loss.py -> build/lib.linux-aarch64-3.6/apex/pyprof/prof
copying apex/pyprof/prof/__main__.py -> build/lib.linux-aarch64-3.6/apex/pyprof/prof
copying apex/pyprof/prof/index_slice_join_mutate.py -> build/lib.linux-aarch64-3.6/apex/pyprof/prof
copying apex/pyprof/prof/embedding.py -> build/lib.linux-aarch64-3.6/apex/pyprof/prof
copying apex/pyprof/prof/base.py -> build/lib.linux-aarch64-3.6/apex/pyprof/prof
copying apex/pyprof/prof/convert.py -> build/lib.linux-aarch64-3.6/apex/pyprof/prof
copying apex/pyprof/prof/randomSample.py -> build/lib.linux-aarch64-3.6/apex/pyprof/prof
copying apex/pyprof/prof/data.py -> build/lib.linux-aarch64-3.6/apex/pyprof/prof
copying apex/pyprof/prof/recurrentCell.py -> build/lib.linux-aarch64-3.6/apex/pyprof/prof
copying apex/pyprof/prof/reduction.py -> build/lib.linux-aarch64-3.6/apex/pyprof/prof
copying apex/pyprof/prof/misc.py -> build/lib.linux-aarch64-3.6/apex/pyprof/prof
copying apex/pyprof/prof/utility.py -> build/lib.linux-aarch64-3.6/apex/pyprof/prof
copying apex/pyprof/prof/conv.py -> build/lib.linux-aarch64-3.6/apex/pyprof/prof
copying apex/pyprof/prof/prof.py -> build/lib.linux-aarch64-3.6/apex/pyprof/prof
copying apex/pyprof/prof/blas.py -> build/lib.linux-aarch64-3.6/apex/pyprof/prof
copying apex/pyprof/prof/dropout.py -> build/lib.linux-aarch64-3.6/apex/pyprof/prof
copying apex/pyprof/prof/normalization.py -> build/lib.linux-aarch64-3.6/apex/pyprof/prof
copying apex/pyprof/prof/linear.py -> build/lib.linux-aarch64-3.6/apex/pyprof/prof
copying apex/pyprof/prof/softmax.py -> build/lib.linux-aarch64-3.6/apex/pyprof/prof
copying apex/pyprof/prof/output.py -> build/lib.linux-aarch64-3.6/apex/pyprof/prof
copying apex/pyprof/prof/__init__.py -> build/lib.linux-aarch64-3.6/apex/pyprof/prof
copying apex/pyprof/prof/pooling.py -> build/lib.linux-aarch64-3.6/apex/pyprof/prof
copying apex/pyprof/prof/optim.py -> build/lib.linux-aarch64-3.6/apex/pyprof/prof
copying apex/pyprof/prof/pointwise.py -> build/lib.linux-aarch64-3.6/apex/pyprof/prof
creating build/lib.linux-aarch64-3.6/apex/pyprof/nvtx
copying apex/pyprof/nvtx/nvmarker.py -> build/lib.linux-aarch64-3.6/apex/pyprof/nvtx
copying apex/pyprof/nvtx/__init__.py -> build/lib.linux-aarch64-3.6/apex/pyprof/nvtx
running build_ext
building 'apex_C' extension
creating build/temp.linux-aarch64-3.6
creating build/temp.linux-aarch64-3.6/csrc
aarch64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/nvidia/.local/lib/python3.6/site-packages/torch/include -I/home/nvidia/.local/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -I/home/nvidia/.local/lib/python3.6/site-packages/torch/include/TH -I/home/nvidia/.local/lib/python3.6/site-packages/torch/include/THC -I/usr/include/python3.6m -c csrc/flatten_unflatten.cpp -o build/temp.linux-aarch64-3.6/csrc/flatten_unflatten.o -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=apex_C -D_GLIBCXX_USE_CXX11_ABI=1 -std=c++11
aarch64-linux-gnu-g++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 build/temp.linux-aarch64-3.6/csrc/flatten_unflatten.o -o build/lib.linux-aarch64-3.6/apex_C.cpython-36m-aarch64-linux-gnu.so
building 'amp_C' extension
aarch64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/nvidia/.local/lib/python3.6/site-packages/torch/include -I/home/nvidia/.local/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -I/home/nvidia/.local/lib/python3.6/site-packages/torch/include/TH -I/home/nvidia/.local/lib/python3.6/site-packages/torch/include/THC -I/usr/local/cuda-10.0/include -I/usr/include/python3.6m -c csrc/amp_C_frontend.cpp -o build/temp.linux-aarch64-3.6/csrc/amp_C_frontend.o -O3 -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=amp_C -D_GLIBCXX_USE_CXX11_ABI=1 -std=c++11
/usr/local/cuda-10.0/bin/nvcc -I/home/nvidia/.local/lib/python3.6/site-packages/torch/include -I/home/nvidia/.local/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -I/home/nvidia/.local/lib/python3.6/site-packages/torch/include/TH -I/home/nvidia/.local/lib/python3.6/site-packages/torch/include/THC -I/usr/local/cuda-10.0/include -I/usr/include/python3.6m -c csrc/multi_tensor_sgd_kernel.cu -o build/temp.linux-aarch64-3.6/csrc/multi_tensor_sgd_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -lineinfo -O3 --use_fast_math -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=amp_C -D_GLIBCXX_USE_CXX11_ABI=1 -gencode=arch=compute_72,code=sm_72 -std=c++11
/home/nvidia/.local/lib/python3.6/site-packages/torch/include/c10/core/TensorTypeSet.h(44): warning: integer conversion resulted in a change of sign

/home/nvidia/.local/lib/python3.6/site-packages/torch/include/c10/core/TensorTypeSet.h(44): warning: integer conversion resulted in a change of sign

/usr/local/cuda-10.0/bin/nvcc -I/home/nvidia/.local/lib/python3.6/site-packages/torch/include -I/home/nvidia/.local/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -I/home/nvidia/.local/lib/python3.6/site-packages/torch/include/TH -I/home/nvidia/.local/lib/python3.6/site-packages/torch/include/THC -I/usr/local/cuda-10.0/include -I/usr/include/python3.6m -c csrc/multi_tensor_scale_kernel.cu -o build/temp.linux-aarch64-3.6/csrc/multi_tensor_scale_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -lineinfo -O3 --use_fast_math -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=amp_C -D_GLIBCXX_USE_CXX11_ABI=1 -gencode=arch=compute_72,code=sm_72 -std=c++11
/home/nvidia/.local/lib/python3.6/site-packages/torch/include/c10/core/TensorTypeSet.h(44): warning: integer conversion resulted in a change of sign

/home/nvidia/.local/lib/python3.6/site-packages/torch/include/c10/core/TensorTypeSet.h(44): warning: integer conversion resulted in a change of sign

/usr/local/cuda-10.0/bin/nvcc -I/home/nvidia/.local/lib/python3.6/site-packages/torch/include -I/home/nvidia/.local/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -I/home/nvidia/.local/lib/python3.6/site-packages/torch/include/TH -I/home/nvidia/.local/lib/python3.6/site-packages/torch/include/THC -I/usr/local/cuda-10.0/include -I/usr/include/python3.6m -c csrc/multi_tensor_axpby_kernel.cu -o build/temp.linux-aarch64-3.6/csrc/multi_tensor_axpby_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -lineinfo -O3 --use_fast_math -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=amp_C -D_GLIBCXX_USE_CXX11_ABI=1 -gencode=arch=compute_72,code=sm_72 -std=c++11
/home/nvidia/.local/lib/python3.6/site-packages/torch/include/c10/core/TensorTypeSet.h(44): warning: integer conversion resulted in a change of sign

/home/nvidia/.local/lib/python3.6/site-packages/torch/include/c10/core/TensorTypeSet.h(44): warning: integer conversion resulted in a change of sign

/usr/local/cuda-10.0/bin/nvcc -I/home/nvidia/.local/lib/python3.6/site-packages/torch/include -I/home/nvidia/.local/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -I/home/nvidia/.local/lib/python3.6/site-packages/torch/include/TH -I/home/nvidia/.local/lib/python3.6/site-packages/torch/include/THC -I/usr/local/cuda-10.0/include -I/usr/include/python3.6m -c csrc/multi_tensor_l2norm_kernel.cu -o build/temp.linux-aarch64-3.6/csrc/multi_tensor_l2norm_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -lineinfo -O3 --use_fast_math -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=amp_C -D_GLIBCXX_USE_CXX11_ABI=1 -gencode=arch=compute_72,code=sm_72 -std=c++11
/home/nvidia/.local/lib/python3.6/site-packages/torch/include/c10/core/TensorTypeSet.h(44): warning: integer conversion resulted in a change of sign

/home/nvidia/.local/lib/python3.6/site-packages/torch/include/c10/core/TensorTypeSet.h(44): warning: integer conversion resulted in a change of sign

/usr/local/cuda-10.0/bin/nvcc -I/home/nvidia/.local/lib/python3.6/site-packages/torch/include -I/home/nvidia/.local/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -I/home/nvidia/.local/lib/python3.6/site-packages/torch/include/TH -I/home/nvidia/.local/lib/python3.6/site-packages/torch/include/THC -I/usr/local/cuda-10.0/include -I/usr/include/python3.6m -c csrc/multi_tensor_lamb_stage_1.cu -o build/temp.linux-aarch64-3.6/csrc/multi_tensor_lamb_stage_1.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -lineinfo -O3 --use_fast_math -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=amp_C -D_GLIBCXX_USE_CXX11_ABI=1 -gencode=arch=compute_72,code=sm_72 -std=c++11
/home/nvidia/.local/lib/python3.6/site-packages/torch/include/c10/core/TensorTypeSet.h(44): warning: integer conversion resulted in a change of sign

/home/nvidia/.local/lib/python3.6/site-packages/torch/include/c10/core/TensorTypeSet.h(44): warning: integer conversion resulted in a change of sign

/usr/local/cuda-10.0/bin/nvcc -I/home/nvidia/.local/lib/python3.6/site-packages/torch/include -I/home/nvidia/.local/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -I/home/nvidia/.local/lib/python3.6/site-packages/torch/include/TH -I/home/nvidia/.local/lib/python3.6/site-packages/torch/include/THC -I/usr/local/cuda-10.0/include -I/usr/include/python3.6m -c csrc/multi_tensor_lamb_stage_2.cu -o build/temp.linux-aarch64-3.6/csrc/multi_tensor_lamb_stage_2.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -lineinfo -O3 --use_fast_math -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=amp_C -D_GLIBCXX_USE_CXX11_ABI=1 -gencode=arch=compute_72,code=sm_72 -std=c++11
/home/nvidia/.local/lib/python3.6/site-packages/torch/include/c10/core/TensorTypeSet.h(44): warning: integer conversion resulted in a change of sign

/home/nvidia/.local/lib/python3.6/site-packages/torch/include/c10/core/TensorTypeSet.h(44): warning: integer conversion resulted in a change of sign

/usr/local/cuda-10.0/bin/nvcc -I/home/nvidia/.local/lib/python3.6/site-packages/torch/include -I/home/nvidia/.local/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -I/home/nvidia/.local/lib/python3.6/site-packages/torch/include/TH -I/home/nvidia/.local/lib/python3.6/site-packages/torch/include/THC -I/usr/local/cuda-10.0/include -I/usr/include/python3.6m -c csrc/multi_tensor_adam.cu -o build/temp.linux-aarch64-3.6/csrc/multi_tensor_adam.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -lineinfo -O3 --use_fast_math -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=amp_C -D_GLIBCXX_USE_CXX11_ABI=1 -gencode=arch=compute_72,code=sm_72 -std=c++11
/home/nvidia/.local/lib/python3.6/site-packages/torch/include/c10/core/TensorTypeSet.h(44): warning: integer conversion resulted in a change of sign

/home/nvidia/.local/lib/python3.6/site-packages/torch/include/c10/core/TensorTypeSet.h(44): warning: integer conversion resulted in a change of sign

/usr/local/cuda-10.0/bin/nvcc -I/home/nvidia/.local/lib/python3.6/site-packages/torch/include -I/home/nvidia/.local/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -I/home/nvidia/.local/lib/python3.6/site-packages/torch/include/TH -I/home/nvidia/.local/lib/python3.6/site-packages/torch/include/THC -I/usr/local/cuda-10.0/include -I/usr/include/python3.6m -c csrc/multi_tensor_novograd.cu -o build/temp.linux-aarch64-3.6/csrc/multi_tensor_novograd.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -lineinfo -O3 --use_fast_math -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=amp_C -D_GLIBCXX_USE_CXX11_ABI=1 -gencode=arch=compute_72,code=sm_72 -std=c++11
/home/nvidia/.local/lib/python3.6/site-packages/torch/include/c10/core/TensorTypeSet.h(44): warning: integer conversion resulted in a change of sign

/home/nvidia/.local/lib/python3.6/site-packages/torch/include/c10/core/TensorTypeSet.h(44): warning: integer conversion resulted in a change of sign

/usr/local/cuda-10.0/bin/nvcc -I/home/nvidia/.local/lib/python3.6/site-packages/torch/include -I/home/nvidia/.local/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -I/home/nvidia/.local/lib/python3.6/site-packages/torch/include/TH -I/home/nvidia/.local/lib/python3.6/site-packages/torch/include/THC -I/usr/local/cuda-10.0/include -I/usr/include/python3.6m -c csrc/multi_tensor_lamb.cu -o build/temp.linux-aarch64-3.6/csrc/multi_tensor_lamb.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -lineinfo -O3 --use_fast_math -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=amp_C -D_GLIBCXX_USE_CXX11_ABI=1 -gencode=arch=compute_72,code=sm_72 -std=c++11
/home/nvidia/.local/lib/python3.6/site-packages/torch/include/c10/core/TensorTypeSet.h(44): warning: integer conversion resulted in a change of sign

/home/nvidia/.local/lib/python3.6/site-packages/torch/include/c10/core/TensorTypeSet.h(44): warning: integer conversion resulted in a change of sign

aarch64-linux-gnu-g++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 build/temp.linux-aarch64-3.6/csrc/amp_C_frontend.o build/temp.linux-aarch64-3.6/csrc/multi_tensor_sgd_kernel.o build/temp.linux-aarch64-3.6/csrc/multi_tensor_scale_kernel.o build/temp.linux-aarch64-3.6/csrc/multi_tensor_axpby_kernel.o build/temp.linux-aarch64-3.6/csrc/multi_tensor_l2norm_kernel.o build/temp.linux-aarch64-3.6/csrc/multi_tensor_lamb_stage_1.o build/temp.linux-aarch64-3.6/csrc/multi_tensor_lamb_stage_2.o build/temp.linux-aarch64-3.6/csrc/multi_tensor_adam.o build/temp.linux-aarch64-3.6/csrc/multi_tensor_novograd.o build/temp.linux-aarch64-3.6/csrc/multi_tensor_lamb.o -L/usr/local/cuda-10.0/lib64 -lcudart -o build/lib.linux-aarch64-3.6/amp_C.cpython-36m-aarch64-linux-gnu.so
building 'syncbn' extension
aarch64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/nvidia/.local/lib/python3.6/site-packages/torch/include -I/home/nvidia/.local/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -I/home/nvidia/.local/lib/python3.6/site-packages/torch/include/TH -I/home/nvidia/.local/lib/python3.6/site-packages/torch/include/THC -I/usr/local/cuda-10.0/include -I/usr/include/python3.6m -c csrc/syncbn.cpp -o build/temp.linux-aarch64-3.6/csrc/syncbn.o -O3 -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=syncbn -D_GLIBCXX_USE_CXX11_ABI=1 -std=c++11
/usr/local/cuda-10.0/bin/nvcc -I/home/nvidia/.local/lib/python3.6/site-packages/torch/include -I/home/nvidia/.local/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -I/home/nvidia/.local/lib/python3.6/site-packages/torch/include/TH -I/home/nvidia/.local/lib/python3.6/site-packages/torch/include/THC -I/usr/local/cuda-10.0/include -I/usr/include/python3.6m -c csrc/welford.cu -o build/temp.linux-aarch64-3.6/csrc/welford.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -O3 -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=syncbn -D_GLIBCXX_USE_CXX11_ABI=1 -gencode=arch=compute_72,code=sm_72 -std=c++11
/home/nvidia/.local/lib/python3.6/site-packages/torch/include/c10/core/TensorTypeSet.h(44): warning: integer conversion resulted in a change of sign

/home/nvidia/.local/lib/python3.6/site-packages/torch/include/c10/core/TensorTypeSet.h(44): warning: integer conversion resulted in a change of sign

aarch64-linux-gnu-g++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 build/temp.linux-aarch64-3.6/csrc/syncbn.o build/temp.linux-aarch64-3.6/csrc/welford.o -L/usr/local/cuda-10.0/lib64 -lcudart -o build/lib.linux-aarch64-3.6/syncbn.cpython-36m-aarch64-linux-gnu.so
building 'fused_layer_norm_cuda' extension
aarch64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/nvidia/.local/lib/python3.6/site-packages/torch/include -I/home/nvidia/.local/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -I/home/nvidia/.local/lib/python3.6/site-packages/torch/include/TH -I/home/nvidia/.local/lib/python3.6/site-packages/torch/include/THC -I/usr/local/cuda-10.0/include -I/usr/include/python3.6m -c csrc/layer_norm_cuda.cpp -o build/temp.linux-aarch64-3.6/csrc/layer_norm_cuda.o -O3 -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=fused_layer_norm_cuda -D_GLIBCXX_USE_CXX11_ABI=1 -std=c++11
/usr/local/cuda-10.0/bin/nvcc -I/home/nvidia/.local/lib/python3.6/site-packages/torch/include -I/home/nvidia/.local/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -I/home/nvidia/.local/lib/python3.6/site-packages/torch/include/TH -I/home/nvidia/.local/lib/python3.6/site-packages/torch/include/THC -I/usr/local/cuda-10.0/include -I/usr/include/python3.6m -c csrc/layer_norm_cuda_kernel.cu -o build/temp.linux-aarch64-3.6/csrc/layer_norm_cuda_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -maxrregcount=50 -O3 --use_fast_math -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=fused_layer_norm_cuda -D_GLIBCXX_USE_CXX11_ABI=1 -gencode=arch=compute_72,code=sm_72 -std=c++11
/home/nvidia/.local/lib/python3.6/site-packages/torch/include/c10/core/TensorTypeSet.h(44): warning: integer conversion resulted in a change of sign

/home/nvidia/.local/lib/python3.6/site-packages/torch/include/c10/core/TensorTypeSet.h(44): warning: integer conversion resulted in a change of sign

aarch64-linux-gnu-g++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 build/temp.linux-aarch64-3.6/csrc/layer_norm_cuda.o build/temp.linux-aarch64-3.6/csrc/layer_norm_cuda_kernel.o -L/usr/local/cuda-10.0/lib64 -lcudart -o build/lib.linux-aarch64-3.6/fused_layer_norm_cuda.cpython-36m-aarch64-linux-gnu.so
running install_lib
copying build/lib.linux-aarch64-3.6/fused_layer_norm_cuda.cpython-36m-aarch64-linux-gnu.so -> /home/nvidia/.local/lib/python3.6/site-packages
copying build/lib.linux-aarch64-3.6/syncbn.cpython-36m-aarch64-linux-gnu.so -> /home/nvidia/.local/lib/python3.6/site-packages
copying build/lib.linux-aarch64-3.6/apex_C.cpython-36m-aarch64-linux-gnu.so -> /home/nvidia/.local/lib/python3.6/site-packages
copying build/lib.linux-aarch64-3.6/amp_C.cpython-36m-aarch64-linux-gnu.so -> /home/nvidia/.local/lib/python3.6/site-packages
creating /home/nvidia/.local/lib/python3.6/site-packages/apex
creating /home/nvidia/.local/lib/python3.6/site-packages/apex/contrib
creating /home/nvidia/.local/lib/python3.6/site-packages/apex/contrib/optimizers
copying build/lib.linux-aarch64-3.6/apex/contrib/optimizers/fp16_optimizer.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/contrib/optimizers
copying build/lib.linux-aarch64-3.6/apex/contrib/optimizers/fused_adam.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/contrib/optimizers
copying build/lib.linux-aarch64-3.6/apex/contrib/optimizers/fused_sgd.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/contrib/optimizers
copying build/lib.linux-aarch64-3.6/apex/contrib/optimizers/__init__.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/contrib/optimizers
creating /home/nvidia/.local/lib/python3.6/site-packages/apex/contrib/multihead_attn
copying build/lib.linux-aarch64-3.6/apex/contrib/multihead_attn/encdec_multihead_attn.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/contrib/multihead_attn
copying build/lib.linux-aarch64-3.6/apex/contrib/multihead_attn/self_multihead_attn_func.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/contrib/multihead_attn
copying build/lib.linux-aarch64-3.6/apex/contrib/multihead_attn/self_multihead_attn.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/contrib/multihead_attn
copying build/lib.linux-aarch64-3.6/apex/contrib/multihead_attn/fast_encdec_multihead_attn_norm_add_func.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/contrib/multihead_attn
copying build/lib.linux-aarch64-3.6/apex/contrib/multihead_attn/encdec_multihead_attn_func.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/contrib/multihead_attn
copying build/lib.linux-aarch64-3.6/apex/contrib/multihead_attn/fast_self_multihead_attn_norm_add_func.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/contrib/multihead_attn
copying build/lib.linux-aarch64-3.6/apex/contrib/multihead_attn/fast_self_multihead_attn_func.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/contrib/multihead_attn
copying build/lib.linux-aarch64-3.6/apex/contrib/multihead_attn/__init__.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/contrib/multihead_attn
copying build/lib.linux-aarch64-3.6/apex/contrib/multihead_attn/fast_encdec_multihead_attn_func.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/contrib/multihead_attn
creating /home/nvidia/.local/lib/python3.6/site-packages/apex/contrib/groupbn
copying build/lib.linux-aarch64-3.6/apex/contrib/groupbn/batch_norm.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/contrib/groupbn
copying build/lib.linux-aarch64-3.6/apex/contrib/groupbn/__init__.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/contrib/groupbn
creating /home/nvidia/.local/lib/python3.6/site-packages/apex/contrib/xentropy
copying build/lib.linux-aarch64-3.6/apex/contrib/xentropy/softmax_xentropy.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/contrib/xentropy
copying build/lib.linux-aarch64-3.6/apex/contrib/xentropy/__init__.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/contrib/xentropy
copying build/lib.linux-aarch64-3.6/apex/contrib/__init__.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/contrib
creating /home/nvidia/.local/lib/python3.6/site-packages/apex/optimizers
copying build/lib.linux-aarch64-3.6/apex/optimizers/fused_adam.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/optimizers
copying build/lib.linux-aarch64-3.6/apex/optimizers/fused_lamb.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/optimizers
copying build/lib.linux-aarch64-3.6/apex/optimizers/fused_novograd.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/optimizers
copying build/lib.linux-aarch64-3.6/apex/optimizers/fused_sgd.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/optimizers
copying build/lib.linux-aarch64-3.6/apex/optimizers/__init__.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/optimizers
creating /home/nvidia/.local/lib/python3.6/site-packages/apex/fp16_utils
copying build/lib.linux-aarch64-3.6/apex/fp16_utils/fp16_optimizer.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/fp16_utils
copying build/lib.linux-aarch64-3.6/apex/fp16_utils/fp16util.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/fp16_utils
copying build/lib.linux-aarch64-3.6/apex/fp16_utils/__init__.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/fp16_utils
copying build/lib.linux-aarch64-3.6/apex/fp16_utils/loss_scaler.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/fp16_utils
creating /home/nvidia/.local/lib/python3.6/site-packages/apex/RNN
copying build/lib.linux-aarch64-3.6/apex/RNN/cells.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/RNN
copying build/lib.linux-aarch64-3.6/apex/RNN/models.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/RNN
copying build/lib.linux-aarch64-3.6/apex/RNN/RNNBackend.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/RNN
copying build/lib.linux-aarch64-3.6/apex/RNN/__init__.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/RNN
creating /home/nvidia/.local/lib/python3.6/site-packages/apex/parallel
copying build/lib.linux-aarch64-3.6/apex/parallel/multiproc.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/parallel
copying build/lib.linux-aarch64-3.6/apex/parallel/distributed.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/parallel
copying build/lib.linux-aarch64-3.6/apex/parallel/sync_batchnorm_kernel.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/parallel
copying build/lib.linux-aarch64-3.6/apex/parallel/optimized_sync_batchnorm.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/parallel
copying build/lib.linux-aarch64-3.6/apex/parallel/sync_batchnorm.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/parallel
copying build/lib.linux-aarch64-3.6/apex/parallel/LARC.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/parallel
copying build/lib.linux-aarch64-3.6/apex/parallel/__init__.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/parallel
copying build/lib.linux-aarch64-3.6/apex/parallel/optimized_sync_batchnorm_kernel.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/parallel
creating /home/nvidia/.local/lib/python3.6/site-packages/apex/normalization
copying build/lib.linux-aarch64-3.6/apex/normalization/__init__.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/normalization
copying build/lib.linux-aarch64-3.6/apex/normalization/fused_layer_norm.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/normalization
creating /home/nvidia/.local/lib/python3.6/site-packages/apex/amp
copying build/lib.linux-aarch64-3.6/apex/amp/opt.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/amp
copying build/lib.linux-aarch64-3.6/apex/amp/_initialize.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/amp
copying build/lib.linux-aarch64-3.6/apex/amp/_process_optimizer.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/amp
copying build/lib.linux-aarch64-3.6/apex/amp/handle.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/amp
copying build/lib.linux-aarch64-3.6/apex/amp/amp.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/amp
copying build/lib.linux-aarch64-3.6/apex/amp/compat.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/amp
copying build/lib.linux-aarch64-3.6/apex/amp/frontend.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/amp
copying build/lib.linux-aarch64-3.6/apex/amp/_amp_state.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/amp
copying build/lib.linux-aarch64-3.6/apex/amp/wrap.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/amp
creating /home/nvidia/.local/lib/python3.6/site-packages/apex/amp/lists
copying build/lib.linux-aarch64-3.6/apex/amp/lists/functional_overrides.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/amp/lists
copying build/lib.linux-aarch64-3.6/apex/amp/lists/torch_overrides.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/amp/lists
copying build/lib.linux-aarch64-3.6/apex/amp/lists/__init__.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/amp/lists
copying build/lib.linux-aarch64-3.6/apex/amp/lists/tensor_overrides.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/amp/lists
copying build/lib.linux-aarch64-3.6/apex/amp/utils.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/amp
copying build/lib.linux-aarch64-3.6/apex/amp/rnn_compat.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/amp
copying build/lib.linux-aarch64-3.6/apex/amp/__init__.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/amp
copying build/lib.linux-aarch64-3.6/apex/amp/__version__.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/amp
copying build/lib.linux-aarch64-3.6/apex/amp/scaler.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/amp
creating /home/nvidia/.local/lib/python3.6/site-packages/apex/reparameterization
copying build/lib.linux-aarch64-3.6/apex/reparameterization/reparameterization.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/reparameterization
copying build/lib.linux-aarch64-3.6/apex/reparameterization/weight_norm.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/reparameterization
copying build/lib.linux-aarch64-3.6/apex/reparameterization/__init__.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/reparameterization
copying build/lib.linux-aarch64-3.6/apex/__init__.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex
creating /home/nvidia/.local/lib/python3.6/site-packages/apex/pyprof
creating /home/nvidia/.local/lib/python3.6/site-packages/apex/pyprof/parse
copying build/lib.linux-aarch64-3.6/apex/pyprof/parse/kernel.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/pyprof/parse
copying build/lib.linux-aarch64-3.6/apex/pyprof/parse/__main__.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/pyprof/parse
copying build/lib.linux-aarch64-3.6/apex/pyprof/parse/nvvp.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/pyprof/parse
copying build/lib.linux-aarch64-3.6/apex/pyprof/parse/db.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/pyprof/parse
copying build/lib.linux-aarch64-3.6/apex/pyprof/parse/parse.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/pyprof/parse
copying build/lib.linux-aarch64-3.6/apex/pyprof/parse/__init__.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/pyprof/parse
creating /home/nvidia/.local/lib/python3.6/site-packages/apex/pyprof/prof
copying build/lib.linux-aarch64-3.6/apex/pyprof/prof/activation.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/pyprof/prof
copying build/lib.linux-aarch64-3.6/apex/pyprof/prof/usage.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/pyprof/prof
copying build/lib.linux-aarch64-3.6/apex/pyprof/prof/loss.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/pyprof/prof
copying build/lib.linux-aarch64-3.6/apex/pyprof/prof/__main__.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/pyprof/prof
copying build/lib.linux-aarch64-3.6/apex/pyprof/prof/index_slice_join_mutate.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/pyprof/prof
copying build/lib.linux-aarch64-3.6/apex/pyprof/prof/embedding.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/pyprof/prof
copying build/lib.linux-aarch64-3.6/apex/pyprof/prof/base.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/pyprof/prof
copying build/lib.linux-aarch64-3.6/apex/pyprof/prof/convert.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/pyprof/prof
copying build/lib.linux-aarch64-3.6/apex/pyprof/prof/randomSample.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/pyprof/prof
copying build/lib.linux-aarch64-3.6/apex/pyprof/prof/data.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/pyprof/prof
copying build/lib.linux-aarch64-3.6/apex/pyprof/prof/recurrentCell.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/pyprof/prof
copying build/lib.linux-aarch64-3.6/apex/pyprof/prof/reduction.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/pyprof/prof
copying build/lib.linux-aarch64-3.6/apex/pyprof/prof/misc.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/pyprof/prof
copying build/lib.linux-aarch64-3.6/apex/pyprof/prof/utility.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/pyprof/prof
copying build/lib.linux-aarch64-3.6/apex/pyprof/prof/conv.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/pyprof/prof
copying build/lib.linux-aarch64-3.6/apex/pyprof/prof/prof.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/pyprof/prof
copying build/lib.linux-aarch64-3.6/apex/pyprof/prof/blas.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/pyprof/prof
copying build/lib.linux-aarch64-3.6/apex/pyprof/prof/dropout.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/pyprof/prof
copying build/lib.linux-aarch64-3.6/apex/pyprof/prof/normalization.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/pyprof/prof
copying build/lib.linux-aarch64-3.6/apex/pyprof/prof/linear.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/pyprof/prof
copying build/lib.linux-aarch64-3.6/apex/pyprof/prof/softmax.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/pyprof/prof
copying build/lib.linux-aarch64-3.6/apex/pyprof/prof/output.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/pyprof/prof
copying build/lib.linux-aarch64-3.6/apex/pyprof/prof/__init__.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/pyprof/prof
copying build/lib.linux-aarch64-3.6/apex/pyprof/prof/pooling.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/pyprof/prof
copying build/lib.linux-aarch64-3.6/apex/pyprof/prof/optim.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/pyprof/prof
copying build/lib.linux-aarch64-3.6/apex/pyprof/prof/pointwise.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/pyprof/prof
creating /home/nvidia/.local/lib/python3.6/site-packages/apex/pyprof/nvtx
copying build/lib.linux-aarch64-3.6/apex/pyprof/nvtx/nvmarker.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/pyprof/nvtx
copying build/lib.linux-aarch64-3.6/apex/pyprof/nvtx/__init__.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/pyprof/nvtx
copying build/lib.linux-aarch64-3.6/apex/pyprof/__init__.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/pyprof
creating /home/nvidia/.local/lib/python3.6/site-packages/apex/multi_tensor_apply
copying build/lib.linux-aarch64-3.6/apex/multi_tensor_apply/multi_tensor_apply.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/multi_tensor_apply
copying build/lib.linux-aarch64-3.6/apex/multi_tensor_apply/__init__.py -> /home/nvidia/.local/lib/python3.6/site-packages/apex/multi_tensor_apply
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/contrib/optimizers/fp16_optimizer.py to fp16_optimizer.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/contrib/optimizers/fused_adam.py to fused_adam.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/contrib/optimizers/fused_sgd.py to fused_sgd.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/contrib/optimizers/__init__.py to __init__.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/contrib/multihead_attn/encdec_multihead_attn.py to encdec_multihead_attn.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/contrib/multihead_attn/self_multihead_attn_func.py to self_multihead_attn_func.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/contrib/multihead_attn/self_multihead_attn.py to self_multihead_attn.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/contrib/multihead_attn/fast_encdec_multihead_attn_norm_add_func.py to fast_encdec_multihead_attn_norm_add_func.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/contrib/multihead_attn/encdec_multihead_attn_func.py to encdec_multihead_attn_func.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/contrib/multihead_attn/fast_self_multihead_attn_norm_add_func.py to fast_self_multihead_attn_norm_add_func.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/contrib/multihead_attn/fast_self_multihead_attn_func.py to fast_self_multihead_attn_func.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/contrib/multihead_attn/__init__.py to __init__.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/contrib/multihead_attn/fast_encdec_multihead_attn_func.py to fast_encdec_multihead_attn_func.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/contrib/groupbn/batch_norm.py to batch_norm.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/contrib/groupbn/__init__.py to __init__.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/contrib/xentropy/softmax_xentropy.py to softmax_xentropy.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/contrib/xentropy/__init__.py to __init__.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/contrib/__init__.py to __init__.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/optimizers/fused_adam.py to fused_adam.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/optimizers/fused_lamb.py to fused_lamb.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/optimizers/fused_novograd.py to fused_novograd.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/optimizers/fused_sgd.py to fused_sgd.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/optimizers/__init__.py to __init__.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/fp16_utils/fp16_optimizer.py to fp16_optimizer.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/fp16_utils/fp16util.py to fp16util.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/fp16_utils/__init__.py to __init__.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/fp16_utils/loss_scaler.py to loss_scaler.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/RNN/cells.py to cells.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/RNN/models.py to models.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/RNN/RNNBackend.py to RNNBackend.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/RNN/__init__.py to __init__.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/parallel/multiproc.py to multiproc.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/parallel/distributed.py to distributed.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/parallel/sync_batchnorm_kernel.py to sync_batchnorm_kernel.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/parallel/optimized_sync_batchnorm.py to optimized_sync_batchnorm.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/parallel/sync_batchnorm.py to sync_batchnorm.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/parallel/LARC.py to LARC.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/parallel/__init__.py to __init__.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/parallel/optimized_sync_batchnorm_kernel.py to optimized_sync_batchnorm_kernel.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/normalization/__init__.py to __init__.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/normalization/fused_layer_norm.py to fused_layer_norm.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/amp/opt.py to opt.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/amp/_initialize.py to _initialize.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/amp/_process_optimizer.py to _process_optimizer.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/amp/handle.py to handle.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/amp/amp.py to amp.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/amp/compat.py to compat.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/amp/frontend.py to frontend.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/amp/_amp_state.py to _amp_state.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/amp/wrap.py to wrap.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/amp/lists/functional_overrides.py to functional_overrides.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/amp/lists/torch_overrides.py to torch_overrides.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/amp/lists/__init__.py to __init__.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/amp/lists/tensor_overrides.py to tensor_overrides.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/amp/utils.py to utils.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/amp/rnn_compat.py to rnn_compat.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/amp/__init__.py to __init__.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/amp/__version__.py to __version__.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/amp/scaler.py to scaler.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/reparameterization/reparameterization.py to reparameterization.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/reparameterization/weight_norm.py to weight_norm.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/reparameterization/__init__.py to __init__.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/__init__.py to __init__.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/pyprof/parse/kernel.py to kernel.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/pyprof/parse/__main__.py to __main__.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/pyprof/parse/nvvp.py to nvvp.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/pyprof/parse/db.py to db.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/pyprof/parse/parse.py to parse.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/pyprof/parse/__init__.py to __init__.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/pyprof/prof/activation.py to activation.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/pyprof/prof/usage.py to usage.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/pyprof/prof/loss.py to loss.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/pyprof/prof/__main__.py to __main__.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/pyprof/prof/index_slice_join_mutate.py to index_slice_join_mutate.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/pyprof/prof/embedding.py to embedding.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/pyprof/prof/base.py to base.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/pyprof/prof/convert.py to convert.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/pyprof/prof/randomSample.py to randomSample.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/pyprof/prof/data.py to data.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/pyprof/prof/recurrentCell.py to recurrentCell.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/pyprof/prof/reduction.py to reduction.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/pyprof/prof/misc.py to misc.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/pyprof/prof/utility.py to utility.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/pyprof/prof/conv.py to conv.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/pyprof/prof/prof.py to prof.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/pyprof/prof/blas.py to blas.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/pyprof/prof/dropout.py to dropout.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/pyprof/prof/normalization.py to normalization.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/pyprof/prof/linear.py to linear.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/pyprof/prof/softmax.py to softmax.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/pyprof/prof/output.py to output.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/pyprof/prof/__init__.py to __init__.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/pyprof/prof/pooling.py to pooling.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/pyprof/prof/optim.py to optim.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/pyprof/prof/pointwise.py to pointwise.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/pyprof/nvtx/nvmarker.py to nvmarker.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/pyprof/nvtx/__init__.py to __init__.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/pyprof/__init__.py to __init__.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/multi_tensor_apply/multi_tensor_apply.py to multi_tensor_apply.cpython-36.pyc
byte-compiling /home/nvidia/.local/lib/python3.6/site-packages/apex/multi_tensor_apply/__init__.py to __init__.cpython-36.pyc
running install_egg_info
running egg_info
creating apex.egg-info
writing apex.egg-info/PKG-INFO
writing dependency_links to apex.egg-info/dependency_links.txt
writing top-level names to apex.egg-info/top_level.txt
writing manifest file 'apex.egg-info/SOURCES.txt'
reading manifest file 'apex.egg-info/SOURCES.txt'
writing manifest file 'apex.egg-info/SOURCES.txt'
Copying apex.egg-info to /home/nvidia/.local/lib/python3.6/site-packages/apex-0.1-py3.6.egg-info
running install_scripts
writing list of installed files to '/tmp/pip-record-zjrirzww/install-record.txt'
Running setup.py install for apex ... done

Successfully installed apex-0.1
Cleaning up...
Removing source in /tmp/pip-req-build-luydcqp0
Removed build tracker: '/tmp/pip-req-tracker-qoxrrmv7'
1 location(s) to search for versions of pip:

NaleRaphael commented 4 years ago

It seems that apex was installed successfully on your Jetson, since the following messages appear in the log:

Running setup.py install for apex ... done
Successfully installed apex-0.1

You can also run this snippet to check whether apex was installed with the C++ and CUDA extensions:

# `ModuleNotFoundError: No module named 'amp_C'` indicates that the C++ and CUDA extensions are not installed
$ python -c "import amp_C"
AndreV84 commented 4 years ago

python3 -c "import amp_C"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ImportError: /home/nvidia/.local/lib/python3.6/site-packages/amp_C.cpython-36m-aarch64-linux-gnu.so: undefined symbol: THPVariableClass
nvidia@nvidia-desktop:~/NeMo/llvm-project/scipy$ python -c "import amp_C"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ImportError: No module named amp_C
nvidia@nvidia-desktop:~/NeMo/llvm-project/scipy$ python3
Python 3.6.9 (default, Nov 7 2019, 10:44:02)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import apex

The import doesn't throw any errors.

AndreV84 commented 4 years ago

Is there an example that I can run in Python after importing the module?

NaleRaphael commented 4 years ago

Oh, my bad. You have to import torch before importing any other C/C++ extension that depends on PyTorch (see also https://github.com/NVIDIA/apex/issues/370#issuecomment-505075166). The command to test whether the apex C++ extension was installed successfully should be:

$ python3 -c "import torch; import amp_C;"
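
A slightly broader check than the one-liner can be sketched as below. This is only a minimal sketch, not part of apex: `check_modules` is a hypothetical helper name, and the list of extension modules is an assumption taken from the `.so` files copied in the install log above (`amp_C`, `apex_C`, `syncbn`, `fused_layer_norm_cuda`). It imports torch first, then tries each extension and reports the import error, if any.

```python
import importlib

# Assumption: these names come from the .so files copied in the install log.
APEX_EXTENSIONS = ["amp_C", "apex_C", "syncbn", "fused_layer_norm_cuda"]


def check_modules(names, preload=("torch",)):
    """Try to import each module in `names`.

    Modules in `preload` are imported first, because the compiled
    extensions need torch's symbols to be loaded already.
    Returns a dict mapping each name to None (import succeeded)
    or to the import error message (import failed).
    """
    for dep in preload:
        try:
            importlib.import_module(dep)
        except ImportError:
            # If torch itself is missing, the extensions below will fail
            # to import and the failure is reported there.
            pass

    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = None
        except ImportError as exc:  # also covers ModuleNotFoundError
            results[name] = str(exc)
    return results


if __name__ == "__main__":
    for name, err in check_modules(APEX_EXTENSIONS).items():
        print(f"{name}: {'ok' if err is None else 'FAILED: ' + err}")
```

Run it with `python3`, the interpreter apex was installed for. An `undefined symbol` message here would suggest a build mismatch rather than an import-order problem, since torch is already loaded at that point.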
AndreV84 commented 4 years ago

~/NeMo$ python3
Python 3.6.9 (default, Nov 7 2019, 10:44:02)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import nemo
>>> import torch
>>> import amp_C

Any simple code to run as an example?

AndreV84 commented 4 years ago

python3
Python 3.6.9 (default, Nov 7 2019, 10:44:02)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import nemo
>>> import torch
>>> import amp_C
>>> import argparse
>>> import os
>>> from apex import amp
>>> # FOR DISTRIBUTED: (can also use torch.nn.parallel.DistributedDataParallel instead)
... from apex.parallel import DistributedDataParallel
>>> parser = argparse.ArgumentParser()
>>> # FOR DISTRIBUTED: Parse for the local_rank argument, which will be supplied
... # automatically by torch.distributed.launch.
... parser.add_argument("--local_rank", default=0, type=int)
_StoreAction(option_strings=['--local_rank'], dest='local_rank', nargs=None, const=None, default=0, type=<class 'int'>, choices=None, help=None, metavar=None)
>>> args = parser.parse_args()
>>> # FOR DISTRIBUTED: If we are running under torch.distributed.launch,
... # the 'WORLD_SIZE' environment variable will also be set automatically.
... args.distributed = False
>>> if 'WORLD_SIZE' in os.environ:
...     args.distributed = int(os.environ['WORLD_SIZE']) > 1
...
>>> if args.distributed:
...     # FOR DISTRIBUTED: Set the device according to local_rank.
...     torch.cuda.set_device(args.local_rank)
...
>>>     # FOR DISTRIBUTED: Initialize the backend. torch.distributed.launch will provide
...     # environment variables, and requires that you use init_method=env://.
...     torch.distributed.init_process_group(backend='nccl',
  File "<stdin>", line 3
    torch.distributed.init_process_group(backend='nccl',
    ^
IndentationError: unexpected indent
>>>                                          init_method='env://')
  File "<stdin>", line 1
    init_method='env://')
    ^
IndentationError: unexpected indent
>>> torch.backends.cudnn.benchmark = True
>>> N, D_in, D_out = 64, 1024, 16
>>> # Each process receives its own batch of "fake input data" and "fake target data."
... # The "training loop" in each process just uses this fake batch over and over.
... # https://github.com/NVIDIA/apex/tree/master/examples/imagenet provides a more realistic
... # example of distributed data sampling for both training and validation.
... x = torch.randn(N, D_in, device='cuda')
>>> y = torch.randn(N, D_out, device='cuda')
>>> model = torch.nn.Linear(D_in, D_out).cuda()
>>> optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
>>> model, optimizer = amp.initialize(model, optimizer, opt_level="O1")
Selected optimization level O1: Insert automatic casts around Pytorch functions and Tensor methods.

Defaults for this optimization level are:
enabled                : True
opt_level              : O1
cast_model_type        : None
patch_torch_functions  : True
keep_batchnorm_fp32    : None
master_weights         : None
loss_scale             : dynamic
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
enabled                : True
opt_level              : O1
cast_model_type        : None
patch_torch_functions  : True
keep_batchnorm_fp32    : None
master_weights         : None
loss_scale             : dynamic
>>> if args.distributed:
...     # FOR DISTRIBUTED: After amp.initialize, wrap the model with
...     # apex.parallel.DistributedDataParallel.
...     model = DistributedDataParallel(model)
...     # torch.nn.parallel.DistributedDataParallel is also fine, with some added args:
...     # model = torch.nn.parallel.DistributedDataParallel(model,
...     #                                                   device_ids=[args.local_rank],
...     #                                                   output_device=args.local_rank)
...
>>> loss_fn = torch.nn.MSELoss()
>>> for t in range(500):
...     optimizer.zero_grad()
...     y_pred = model(x)
...     loss = loss_fn(y_pred, y)
...     with amp.scale_loss(loss, optimizer) as scaled_loss:
...         scaled_loss.backward()
...     optimizer.step()
...
>>> if args.local_rank == 0:
...     print("final loss = ", loss)
...
final loss = tensor(0.2006, device='cuda:0', grad_fn=)
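
The two IndentationError messages come from pasting the script into the interactive prompt: the blank line inside the `if args.distributed:` block ends the block, so the indented lines that follow are read as a new statement. Saving the code to a file and running it with `python3` avoids that. As a minimal sketch of just the launch-detection part (standard library only; the function name `parse_launch_config` is hypothetical and the torch/apex calls are left out), assuming `torch.distributed.launch` passes `--local_rank` and sets `WORLD_SIZE` as the comments in the session describe:

```python
import argparse
import os


def parse_launch_config(argv=None, environ=None):
    """Return (local_rank, distributed) the way the session above derives them."""
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser()
    # torch.distributed.launch supplies --local_rank to each process.
    parser.add_argument("--local_rank", default=0, type=int)
    args = parser.parse_args(argv)
    # WORLD_SIZE is only set under the launcher; more than one process
    # means the run is distributed.
    distributed = int(environ.get("WORLD_SIZE", "1")) > 1
    return args.local_rank, distributed


if __name__ == "__main__":
    local_rank, distributed = parse_launch_config()
    print("local_rank =", local_rank, "distributed =", distributed)
```

In a file, the `if distributed:` branch can then hold both `torch.cuda.set_device(local_rank)` and the `init_process_group(backend='nccl', init_method='env://')` call without the prompt cutting the block short.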

NaleRaphael commented 4 years ago

any simple code to run as an example?

You can try the examples in this repository: https://github.com/NVIDIA/apex/tree/master/examples/imagenet

If you want to train on a smaller dataset, MNIST is also a good candidate: https://github.com/pytorch/examples/blob/master/mnist/main.py

But you need to add a few more lines to that MNIST script to use apex:

# --- at the beginning of the file
from __future__ import print_function
import argparse
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
from torch.optim.lr_scheduler import StepLR

from apex import amp    # <- add this line

# --- start from L114
model = Net().to(device)
optimizer = optim.Adadelta(model.parameters(), lr=args.lr)

# call `amp.initialize` to train the model in mixed precision (opt_level 'O1')
model, optimizer = amp.initialize(model, optimizer, opt_level='O1')

scheduler = StepLR(optimizer, step_size=1, gamma=args.gamma)
for epoch in range(1, args.epochs + 1):
    train(args, model, device, train_loader, optimizer, epoch)
    test(args, model, device, test_loader)
    scheduler.step()
# ...
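
One more change is usually needed inside the training step: the apex example earlier in this thread wraps the backward pass in `amp.scale_loss`, so that the loss scaling set up by `amp.initialize` is actually applied. A sketch of such a step follows. The name `train_one_epoch` and its signature are hypothetical (the MNIST script's own `train` takes different arguments and also logs progress), and the loss function is passed in rather than hard-coded.

```python
def train_one_epoch(model, device, train_loader, optimizer, loss_fn):
    """Run one pass over train_loader, scaling the loss through apex amp."""
    # Imported here so the module can still be loaded on a machine
    # where apex is not installed.
    from apex import amp

    model.train()
    for data, target in train_loader:
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        loss = loss_fn(model(data), target)
        # Replaces the plain `loss.backward()` call: amp scales the loss
        # before backward and unscales the gradients on exit.
        with amp.scale_loss(loss, optimizer) as scaled_loss:
            scaled_loss.backward()
        optimizer.step()
```

For the MNIST script this would be called with `F.nll_loss` as `loss_fn`, after `amp.initialize` has returned the patched `model` and `optimizer`.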
AndreV84 commented 4 years ago

Thanks for helping!