NVIDIA / apex

A PyTorch Extension: Tools for easy mixed precision and distributed training in Pytorch
BSD 3-Clause "New" or "Revised" License

Apex installation is stuck in an infinite loop printing warnings #1703

Open GalJakob opened 1 year ago

GalJakob commented 1 year ago

**Describe the Bug**

Hey, I have been trying to install apex in Colab, but I run into the bug described in the title. It looks like it has something to do with the `amp_C` extension.

Here is the bug: the bolded lines that say "warning #186-D: pointless comparison of unsigned integer with zero" just keep repeating in an infinite loop.

Besides that, I have also bolded some other warnings that might hint at a problem.

self.initialize_options() running bdist_egg running egg_info creating apex.egg-info writing apex.egg-info/PKG-INFO writing dependency_links to apex.egg-info/dependency_links.txt writing requirements to apex.egg-info/requires.txt writing top-level names to apex.egg-info/top_level.txt writing manifest file 'apex.egg-info/SOURCES.txt' /usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py:476: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend. warnings.warn(msg.format('we could not find ninja.')) reading manifest file 'apex.egg-info/SOURCES.txt' adding license file 'LICENSE' writing manifest file 'apex.egg-info/SOURCES.txt' installing library code to build/bdist.linux-x86_64/egg running install_lib running build_py creating build creating build/lib.linux-x86_64-cpython-310 creating build/lib.linux-x86_64-cpython-310/apex copying apex/init.py -> build/lib.linux-x86_64-cpython-310/apex copying apex/_autocast_utils.py -> build/lib.linux-x86_64-cpython-310/apex creating build/lib.linux-x86_64-cpython-310/apex/amp copying apex/amp/_amp_state.py -> build/lib.linux-x86_64-cpython-310/apex/amp copying apex/amp/version.py -> build/lib.linux-x86_64-cpython-310/apex/amp copying apex/amp/compat.py -> build/lib.linux-x86_64-cpython-310/apex/amp copying apex/amp/opt.py -> build/lib.linux-x86_64-cpython-310/apex/amp copying apex/amp/amp.py -> build/lib.linux-x86_64-cpython-310/apex/amp copying apex/amp/_process_optimizer.py -> build/lib.linux-x86_64-cpython-310/apex/amp copying apex/amp/init.py -> build/lib.linux-x86_64-cpython-310/apex/amp copying apex/amp/scaler.py -> build/lib.linux-x86_64-cpython-310/apex/amp copying apex/amp/_initialize.py -> build/lib.linux-x86_64-cpython-310/apex/amp copying apex/amp/handle.py -> build/lib.linux-x86_64-cpython-310/apex/amp copying apex/amp/frontend.py -> build/lib.linux-x86_64-cpython-310/apex/amp copying apex/amp/wrap.py -> 
build/lib.linux-x86_64-cpython-310/apex/amp copying apex/amp/utils.py -> build/lib.linux-x86_64-cpython-310/apex/amp copying apex/amp/rnn_compat.py -> build/lib.linux-x86_64-cpython-310/apex/amp creating build/lib.linux-x86_64-cpython-310/apex/normalization copying apex/normalization/init.py -> build/lib.linux-x86_64-cpython-310/apex/normalization copying apex/normalization/fused_layer_norm.py -> build/lib.linux-x86_64-cpython-310/apex/normalization creating build/lib.linux-x86_64-cpython-310/apex/transformer copying apex/transformer/enums.py -> build/lib.linux-x86_64-cpython-310/apex/transformer copying apex/transformer/init.py -> build/lib.linux-x86_64-cpython-310/apex/transformer copying apex/transformer/microbatches.py -> build/lib.linux-x86_64-cpython-310/apex/transformer copying apex/transformer/log_util.py -> build/lib.linux-x86_64-cpython-310/apex/transformer copying apex/transformer/utils.py -> build/lib.linux-x86_64-cpython-310/apex/transformer copying apex/transformer/_ucc_util.py -> build/lib.linux-x86_64-cpython-310/apex/transformer copying apex/transformer/parallel_state.py -> build/lib.linux-x86_64-cpython-310/apex/transformer creating build/lib.linux-x86_64-cpython-310/apex/mlp copying apex/mlp/init.py -> build/lib.linux-x86_64-cpython-310/apex/mlp copying apex/mlp/mlp.py -> build/lib.linux-x86_64-cpython-310/apex/mlp creating build/lib.linux-x86_64-cpython-310/apex/optimizers copying apex/optimizers/fused_mixed_precision_lamb.py -> build/lib.linux-x86_64-cpython-310/apex/optimizers copying apex/optimizers/fused_sgd.py -> build/lib.linux-x86_64-cpython-310/apex/optimizers copying apex/optimizers/init.py -> build/lib.linux-x86_64-cpython-310/apex/optimizers copying apex/optimizers/fused_adam.py -> build/lib.linux-x86_64-cpython-310/apex/optimizers copying apex/optimizers/fused_novograd.py -> build/lib.linux-x86_64-cpython-310/apex/optimizers copying apex/optimizers/fused_adagrad.py -> build/lib.linux-x86_64-cpython-310/apex/optimizers copying 
apex/optimizers/fused_lamb.py -> build/lib.linux-x86_64-cpython-310/apex/optimizers creating build/lib.linux-x86_64-cpython-310/apex/fp16_utils copying apex/fp16_utils/fp16util.py -> build/lib.linux-x86_64-cpython-310/apex/fp16_utils copying apex/fp16_utils/init.py -> build/lib.linux-x86_64-cpython-310/apex/fp16_utils copying apex/fp16_utils/fp16_optimizer.py -> build/lib.linux-x86_64-cpython-310/apex/fp16_utils copying apex/fp16_utils/loss_scaler.py -> build/lib.linux-x86_64-cpython-310/apex/fp16_utils creating build/lib.linux-x86_64-cpython-310/apex/multi_tensor_apply copying apex/multi_tensor_apply/init.py -> build/lib.linux-x86_64-cpython-310/apex/multi_tensor_apply copying apex/multi_tensor_apply/multi_tensor_apply.py -> build/lib.linux-x86_64-cpython-310/apex/multi_tensor_apply creating build/lib.linux-x86_64-cpython-310/apex/RNN copying apex/RNN/cells.py -> build/lib.linux-x86_64-cpython-310/apex/RNN copying apex/RNN/init.py -> build/lib.linux-x86_64-cpython-310/apex/RNN copying apex/RNN/models.py -> build/lib.linux-x86_64-cpython-310/apex/RNN copying apex/RNN/RNNBackend.py -> build/lib.linux-x86_64-cpython-310/apex/RNN creating build/lib.linux-x86_64-cpython-310/apex/contrib copying apex/contrib/init.py -> build/lib.linux-x86_64-cpython-310/apex/contrib creating build/lib.linux-x86_64-cpython-310/apex/parallel copying apex/parallel/distributed.py -> build/lib.linux-x86_64-cpython-310/apex/parallel copying apex/parallel/init.py -> build/lib.linux-x86_64-cpython-310/apex/parallel copying apex/parallel/LARC.py -> build/lib.linux-x86_64-cpython-310/apex/parallel copying apex/parallel/optimized_sync_batchnorm_kernel.py -> build/lib.linux-x86_64-cpython-310/apex/parallel copying apex/parallel/optimized_sync_batchnorm.py -> build/lib.linux-x86_64-cpython-310/apex/parallel copying apex/parallel/sync_batchnorm.py -> build/lib.linux-x86_64-cpython-310/apex/parallel copying apex/parallel/sync_batchnorm_kernel.py -> build/lib.linux-x86_64-cpython-310/apex/parallel 
copying apex/parallel/multiproc.py -> build/lib.linux-x86_64-cpython-310/apex/parallel creating build/lib.linux-x86_64-cpython-310/apex/fused_dense copying apex/fused_dense/init.py -> build/lib.linux-x86_64-cpython-310/apex/fused_dense copying apex/fused_dense/fused_dense.py -> build/lib.linux-x86_64-cpython-310/apex/fused_dense creating build/lib.linux-x86_64-cpython-310/apex/amp/lists copying apex/amp/lists/functional_overrides.py -> build/lib.linux-x86_64-cpython-310/apex/amp/lists copying apex/amp/lists/init.py -> build/lib.linux-x86_64-cpython-310/apex/amp/lists copying apex/amp/lists/tensor_overrides.py -> build/lib.linux-x86_64-cpython-310/apex/amp/lists copying apex/amp/lists/torch_overrides.py -> build/lib.linux-x86_64-cpython-310/apex/amp/lists creating build/lib.linux-x86_64-cpython-310/apex/transformer/amp copying apex/transformer/amp/init.py -> build/lib.linux-x86_64-cpython-310/apex/transformer/amp copying apex/transformer/amp/grad_scaler.py -> build/lib.linux-x86_64-cpython-310/apex/transformer/amp creating build/lib.linux-x86_64-cpython-310/apex/transformer/pipeline_parallel copying apex/transformer/pipeline_parallel/p2p_communication.py -> build/lib.linux-x86_64-cpython-310/apex/transformer/pipeline_parallel copying apex/transformer/pipeline_parallel/init.py -> build/lib.linux-x86_64-cpython-310/apex/transformer/pipeline_parallel copying apex/transformer/pipeline_parallel/utils.py -> build/lib.linux-x86_64-cpython-310/apex/transformer/pipeline_parallel copying apex/transformer/pipeline_parallel/_timers.py -> build/lib.linux-x86_64-cpython-310/apex/transformer/pipeline_parallel creating build/lib.linux-x86_64-cpython-310/apex/transformer/testing copying apex/transformer/testing/global_vars.py -> build/lib.linux-x86_64-cpython-310/apex/transformer/testing copying apex/transformer/testing/standalone_transformer_lm.py -> build/lib.linux-x86_64-cpython-310/apex/transformer/testing copying apex/transformer/testing/distributed_test_base.py -> 
build/lib.linux-x86_64-cpython-310/apex/transformer/testing copying apex/transformer/testing/arguments.py -> build/lib.linux-x86_64-cpython-310/apex/transformer/testing copying apex/transformer/testing/init.py -> build/lib.linux-x86_64-cpython-310/apex/transformer/testing copying apex/transformer/testing/commons.py -> build/lib.linux-x86_64-cpython-310/apex/transformer/testing copying apex/transformer/testing/standalone_bert.py -> build/lib.linux-x86_64-cpython-310/apex/transformer/testing copying apex/transformer/testing/standalone_gpt.py -> build/lib.linux-x86_64-cpython-310/apex/transformer/testing creating build/lib.linux-x86_64-cpython-310/apex/transformer/layers copying apex/transformer/layers/init.py -> build/lib.linux-x86_64-cpython-310/apex/transformer/layers copying apex/transformer/layers/layer_norm.py -> build/lib.linux-x86_64-cpython-310/apex/transformer/layers creating build/lib.linux-x86_64-cpython-310/apex/transformer/tensor_parallel copying apex/transformer/tensor_parallel/memory.py -> build/lib.linux-x86_64-cpython-310/apex/transformer/tensor_parallel copying apex/transformer/tensor_parallel/cross_entropy.py -> build/lib.linux-x86_64-cpython-310/apex/transformer/tensor_parallel copying apex/transformer/tensor_parallel/init.py -> build/lib.linux-x86_64-cpython-310/apex/transformer/tensor_parallel copying apex/transformer/tensor_parallel/random.py -> build/lib.linux-x86_64-cpython-310/apex/transformer/tensor_parallel copying apex/transformer/tensor_parallel/mappings.py -> build/lib.linux-x86_64-cpython-310/apex/transformer/tensor_parallel copying apex/transformer/tensor_parallel/layers.py -> build/lib.linux-x86_64-cpython-310/apex/transformer/tensor_parallel copying apex/transformer/tensor_parallel/data.py -> build/lib.linux-x86_64-cpython-310/apex/transformer/tensor_parallel copying apex/transformer/tensor_parallel/utils.py -> build/lib.linux-x86_64-cpython-310/apex/transformer/tensor_parallel creating 
build/lib.linux-x86_64-cpython-310/apex/transformer/_data copying apex/transformer/_data/_batchsampler.py -> build/lib.linux-x86_64-cpython-310/apex/transformer/_data copying apex/transformer/_data/init.py -> build/lib.linux-x86_64-cpython-310/apex/transformer/_data creating build/lib.linux-x86_64-cpython-310/apex/transformer/functional copying apex/transformer/functional/init.py -> build/lib.linux-x86_64-cpython-310/apex/transformer/functional copying apex/transformer/functional/fused_softmax.py -> build/lib.linux-x86_64-cpython-310/apex/transformer/functional creating build/lib.linux-x86_64-cpython-310/apex/transformer/pipeline_parallel/schedules copying apex/transformer/pipeline_parallel/schedules/fwd_bwd_pipelining_with_interleaving.py -> build/lib.linux-x86_64-cpython-310/apex/transformer/pipeline_parallel/schedules copying apex/transformer/pipeline_parallel/schedules/fwd_bwd_no_pipelining.py -> build/lib.linux-x86_64-cpython-310/apex/transformer/pipeline_parallel/schedules copying apex/transformer/pipeline_parallel/schedules/init.py -> build/lib.linux-x86_64-cpython-310/apex/transformer/pipeline_parallel/schedules copying apex/transformer/pipeline_parallel/schedules/common.py -> build/lib.linux-x86_64-cpython-310/apex/transformer/pipeline_parallel/schedules copying apex/transformer/pipeline_parallel/schedules/fwd_bwd_pipelining_without_interleaving.py -> build/lib.linux-x86_64-cpython-310/apex/transformer/pipeline_parallel/schedules creating build/lib.linux-x86_64-cpython-310/apex/contrib/conv_bias_relu copying apex/contrib/conv_bias_relu/conv_bias_relu.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/conv_bias_relu copying apex/contrib/conv_bias_relu/init.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/conv_bias_relu creating build/lib.linux-x86_64-cpython-310/apex/contrib/clip_grad copying apex/contrib/clip_grad/clip_grad.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/clip_grad copying apex/contrib/clip_grad/init.py -> 
build/lib.linux-x86_64-cpython-310/apex/contrib/clip_grad creating build/lib.linux-x86_64-cpython-310/apex/contrib/peer_memory copying apex/contrib/peer_memory/peer_halo_exchanger_1d.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/peer_memory copying apex/contrib/peer_memory/init.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/peer_memory copying apex/contrib/peer_memory/peer_memory.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/peer_memory creating build/lib.linux-x86_64-cpython-310/apex/contrib/index_mul_2d copying apex/contrib/index_mul_2d/init.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/index_mul_2d copying apex/contrib/index_mul_2d/index_mul_2d.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/index_mul_2d creating build/lib.linux-x86_64-cpython-310/apex/contrib/test copying apex/contrib/test/init.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/test creating build/lib.linux-x86_64-cpython-310/apex/contrib/sparsity copying apex/contrib/sparsity/sparse_masklib.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/sparsity copying apex/contrib/sparsity/init.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/sparsity copying apex/contrib/sparsity/permutation_lib.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/sparsity copying apex/contrib/sparsity/asp.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/sparsity creating build/lib.linux-x86_64-cpython-310/apex/contrib/focal_loss copying apex/contrib/focal_loss/focal_loss.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/focal_loss copying apex/contrib/focal_loss/init.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/focal_loss creating build/lib.linux-x86_64-cpython-310/apex/contrib/cudnn_gbn copying apex/contrib/cudnn_gbn/batch_norm.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/cudnn_gbn copying apex/contrib/cudnn_gbn/init.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/cudnn_gbn creating build/lib.linux-x86_64-cpython-310/apex/contrib/fmha 
copying apex/contrib/fmha/fmha.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/fmha copying apex/contrib/fmha/init.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/fmha creating build/lib.linux-x86_64-cpython-310/apex/contrib/multihead_attn copying apex/contrib/multihead_attn/self_multihead_attn_func.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/multihead_attn copying apex/contrib/multihead_attn/encdec_multihead_attn_func.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/multihead_attn copying apex/contrib/multihead_attn/init.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/multihead_attn copying apex/contrib/multihead_attn/self_multihead_attn.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/multihead_attn copying apex/contrib/multihead_attn/fast_encdec_multihead_attn_func.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/multihead_attn copying apex/contrib/multihead_attn/fast_self_multihead_attn_norm_add_func.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/multihead_attn copying apex/contrib/multihead_attn/mask_softmax_dropout_func.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/multihead_attn copying apex/contrib/multihead_attn/fast_self_multihead_attn_func.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/multihead_attn copying apex/contrib/multihead_attn/fast_encdec_multihead_attn_norm_add_func.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/multihead_attn copying apex/contrib/multihead_attn/encdec_multihead_attn.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/multihead_attn creating build/lib.linux-x86_64-cpython-310/apex/contrib/optimizers copying apex/contrib/optimizers/fused_sgd.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/optimizers copying apex/contrib/optimizers/init.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/optimizers copying apex/contrib/optimizers/fused_adam.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/optimizers copying 
apex/contrib/optimizers/distributed_fused_lamb.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/optimizers copying apex/contrib/optimizers/fp16_optimizer.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/optimizers copying apex/contrib/optimizers/fused_lamb.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/optimizers copying apex/contrib/optimizers/distributed_fused_adam.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/optimizers creating build/lib.linux-x86_64-cpython-310/apex/contrib/layer_norm copying apex/contrib/layer_norm/init.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/layer_norm copying apex/contrib/layer_norm/layer_norm.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/layer_norm creating build/lib.linux-x86_64-cpython-310/apex/contrib/groupbn copying apex/contrib/groupbn/batch_norm.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/groupbn copying apex/contrib/groupbn/init.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/groupbn creating build/lib.linux-x86_64-cpython-310/apex/contrib/group_norm copying apex/contrib/group_norm/init.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/group_norm copying apex/contrib/group_norm/group_norm.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/group_norm creating build/lib.linux-x86_64-cpython-310/apex/contrib/bottleneck copying apex/contrib/bottleneck/init.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/bottleneck copying apex/contrib/bottleneck/bottleneck.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/bottleneck copying apex/contrib/bottleneck/halo_exchangers.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/bottleneck copying apex/contrib/bottleneck/test.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/bottleneck creating build/lib.linux-x86_64-cpython-310/apex/contrib/xentropy copying apex/contrib/xentropy/init.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/xentropy copying apex/contrib/xentropy/softmax_xentropy.py -> 
build/lib.linux-x86_64-cpython-310/apex/contrib/xentropy creating build/lib.linux-x86_64-cpython-310/apex/contrib/transducer copying apex/contrib/transducer/transducer.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/transducer copying apex/contrib/transducer/init.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/transducer copying apex/contrib/transducer/_transducer_ref.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/transducer creating build/lib.linux-x86_64-cpython-310/apex/contrib/test/conv_bias_relu copying apex/contrib/test/conv_bias_relu/init.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/test/conv_bias_relu copying apex/contrib/test/conv_bias_relu/test_conv_bias_relu.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/test/conv_bias_relu creating build/lib.linux-x86_64-cpython-310/apex/contrib/test/clip_grad copying apex/contrib/test/clip_grad/test_clip_grad.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/test/clip_grad copying apex/contrib/test/clip_grad/init.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/test/clip_grad creating build/lib.linux-x86_64-cpython-310/apex/contrib/test/peer_memory copying apex/contrib/test/peer_memory/init.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/test/peer_memory copying apex/contrib/test/peer_memory/test_peer_halo_exchange_module.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/test/peer_memory creating build/lib.linux-x86_64-cpython-310/apex/contrib/test/index_mul_2d copying apex/contrib/test/index_mul_2d/test_index_mul_2d.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/test/index_mul_2d copying apex/contrib/test/index_mul_2d/init.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/test/index_mul_2d creating build/lib.linux-x86_64-cpython-310/apex/contrib/test/focal_loss copying apex/contrib/test/focal_loss/init.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/test/focal_loss copying apex/contrib/test/focal_loss/test_focal_loss.py -> 
build/lib.linux-x86_64-cpython-310/apex/contrib/test/focal_loss creating build/lib.linux-x86_64-cpython-310/apex/contrib/test/cudnn_gbn copying apex/contrib/test/cudnn_gbn/init.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/test/cudnn_gbn copying apex/contrib/test/cudnn_gbn/test_cudnn_gbn_with_two_gpus.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/test/cudnn_gbn creating build/lib.linux-x86_64-cpython-310/apex/contrib/test/fmha copying apex/contrib/test/fmha/init.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/test/fmha copying apex/contrib/test/fmha/test_fmha.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/test/fmha creating build/lib.linux-x86_64-cpython-310/apex/contrib/test/multihead_attn copying apex/contrib/test/multihead_attn/test_fast_self_multihead_attn_bias.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/test/multihead_attn copying apex/contrib/test/multihead_attn/test_self_multihead_attn.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/test/multihead_attn copying apex/contrib/test/multihead_attn/init.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/test/multihead_attn copying apex/contrib/test/multihead_attn/test_self_multihead_attn_norm_add.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/test/multihead_attn copying apex/contrib/test/multihead_attn/test_mha_fused_softmax.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/test/multihead_attn copying apex/contrib/test/multihead_attn/test_encdec_multihead_attn.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/test/multihead_attn copying apex/contrib/test/multihead_attn/test_encdec_multihead_attn_norm_add.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/test/multihead_attn creating build/lib.linux-x86_64-cpython-310/apex/contrib/test/optimizers copying apex/contrib/test/optimizers/test_distributed_fused_lamb.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/test/optimizers copying apex/contrib/test/optimizers/init.py -> 
build/lib.linux-x86_64-cpython-310/apex/contrib/test/optimizers copying apex/contrib/test/optimizers/test_dist_adam.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/test/optimizers creating build/lib.linux-x86_64-cpython-310/apex/contrib/test/layer_norm copying apex/contrib/test/layer_norm/init.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/test/layer_norm copying apex/contrib/test/layer_norm/test_fast_layer_norm.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/test/layer_norm creating build/lib.linux-x86_64-cpython-310/apex/contrib/test/group_norm copying apex/contrib/test/group_norm/test_group_norm.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/test/group_norm copying apex/contrib/test/group_norm/init.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/test/group_norm creating build/lib.linux-x86_64-cpython-310/apex/contrib/test/bottleneck copying apex/contrib/test/bottleneck/init.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/test/bottleneck copying apex/contrib/test/bottleneck/test_bottleneck_module.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/test/bottleneck creating build/lib.linux-x86_64-cpython-310/apex/contrib/test/xentropy copying apex/contrib/test/xentropy/init.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/test/xentropy copying apex/contrib/test/xentropy/test_label_smoothing.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/test/xentropy creating build/lib.linux-x86_64-cpython-310/apex/contrib/test/transducer copying apex/contrib/test/transducer/init.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/test/transducer copying apex/contrib/test/transducer/test_transducer_joint.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/test/transducer copying apex/contrib/test/transducer/test_transducer_loss.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/test/transducer creating build/lib.linux-x86_64-cpython-310/apex/contrib/sparsity/permutation_search_kernels copying 
apex/contrib/sparsity/permutation_search_kernels/call_permutation_search_kernels.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/sparsity/permutation_search_kernels copying apex/contrib/sparsity/permutation_search_kernels/init.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/sparsity/permutation_search_kernels copying apex/contrib/sparsity/permutation_search_kernels/exhaustive_search.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/sparsity/permutation_search_kernels copying apex/contrib/sparsity/permutation_search_kernels/permutation_utilities.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/sparsity/permutation_search_kernels copying apex/contrib/sparsity/permutation_search_kernels/channel_swap.py -> build/lib.linux-x86_64-cpython-310/apex/contrib/sparsity/permutation_search_kernels running build_ext

```
building 'apex_C' extension
creating build/temp.linux-x86_64-cpython-310
creating build/temp.linux-x86_64-cpython-310/csrc
x86_64-linux-gnu-gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -I/usr/local/lib/python3.10/dist-packages/torch/include -I/usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.10/dist-packages/torch/include/TH -I/usr/local/lib/python3.10/dist-packages/torch/include/THC -I/usr/include/python3.10 -c csrc/flatten_unflatten.cpp -o build/temp.linux-x86_64-cpython-310/csrc/flatten_unflatten.o -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -DTORCH_EXTENSION_NAME=apex_C -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++17
x86_64-linux-gnu-g++ -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -g -fwrapv -O2 build/temp.linux-x86_64-cpython-310/csrc/flatten_unflatten.o -L/usr/local/lib/python3.10/dist-packages/torch/lib -L/usr/lib/x86_64-linux-gnu -lc10 -ltorch -ltorch_cpu -ltorch_python -o build/lib.linux-x86_64-cpython-310/apex_C.cpython-310-x86_64-linux-gnu.so

/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py:398: UserWarning: There are no x86_64-linux-gnu-g++ version bounds defined for CUDA version 11.8
  warnings.warn(f'There are no {compiler_name} version bounds defined for CUDA version {cuda_str_version}')

building 'amp_C' extension

/usr/local/lib/python3.10/dist-packages/torch/include/c10/util/irange.h(54): warning #186-D: pointless comparison of unsigned integer with zero
          detected during:
            instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, >::operator==(const c10::detail::integer_iterator<I, one_sided, > &) const [with I=size_t, one_sided=false, =0]" (61): here
            instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, >::operator!=(const c10::detail::integer_iterator<I, one_sided, > &) const [with I=size_t, one_sided=false, =0]" /usr/local/lib/python3.10/dist-packages/torch/include/c10/core/TensorImpl.h(77): here

/usr/local/lib/python3.10/dist-packages/torch/include/c10/util/irange.h(54): warning #186-D: pointless comparison of unsigned integer with zero
          detected during:
            instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, >::operator==(const c10::detail::integer_iterator<I, one_sided, > &) const [with I=std::size_t, one_sided=true, =0]" (61): here
            instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, >::operator!=(const c10::detail::integer_iterator<I, one_sided, > &) const [with I=std::size_t, one_sided=true, =0]" /usr/local/lib/python3.10/dist-packages/torch/include/ATen/core/qualified_name.h(73): here
```
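One way to tell a true loop from a long build that repeats the same warning is to tally the #186-D warnings in a saved copy of the log by the header frame that triggered each one (`TensorImpl.h(77)`, `qualified_name.h(73)`, and so on). The sketch below is a hypothetical helper, not part of apex or PyTorch, and it assumes the build output was captured to a text file. The same PyTorch headers are included by many CUDA sources, so if the frames and the surrounding compiler output change between repetitions, the build is most likely still progressing rather than looping.

```python
import re
from collections import Counter

# nvcc's warning text as it appears in the build log.
WARNING_RE = re.compile(
    r"warning #186-D: pointless comparison of unsigned integer with zero"
)
# A "path/to/header.h(LINE): here" frame in the instantiation chain.
SITE_RE = re.compile(r"(\S+\.h)\((\d+)\): here")


def tally_186_warnings(log_text: str) -> Counter:
    """Count #186-D warnings, keyed by the header frame that triggered them.

    Splitting on the warning text yields one chunk per occurrence, holding
    that warning's "detected during" context up to the next warning.
    """
    counts: Counter = Counter()
    for chunk in WARNING_RE.split(log_text)[1:]:
        frames = SITE_RE.findall(chunk)
        if frames:
            path, line = frames[-1]
            # Keep only the file name, e.g. "TensorImpl.h(77)".
            counts[f"{path.rsplit('/', 1)[-1]}({line})"] += 1
        else:
            counts["<unknown>"] += 1
    return counts
```

For example, `tally_186_warnings(open("build.log").read()).most_common()` (the file name is hypothetical) lists which header frames dominate the output.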

**Minimal Steps/Code to Reproduce the Bug**

This is the code:

```
!git clone https://github.com/NVIDIA/apex
%cd apex
!python3 setup.py install --cuda_ext --cpp_ext
```
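The log above also contains a UserWarning that ninja could not be found, so `torch.utils.cpp_extension` fell back to what it calls the "slow distutils backend". A possible pre-flight check, sketched below under stated assumptions, is to confirm a `ninja` executable is on `PATH` (PyTorch also provides `torch.utils.cpp_extension.is_ninja_available()` for this) and to set `MAX_JOBS`, which PyTorch reads to cap parallel compile jobs for ninja builds. The helper names are hypothetical, and the default of 2 jobs is an assumption matching the 2-vCPU Colab machine in the environment report.

```python
import os
import shutil
import sys


def ninja_available() -> bool:
    """Return True if a `ninja` executable can be found on PATH."""
    return shutil.which("ninja") is not None


def apex_build_command(max_jobs: int = 2):
    """Return the reproduction command and a copy of the environment with MAX_JOBS set.

    MAX_JOBS only affects ninja builds; the default of 2 is an assumption
    chosen to match a 2-vCPU machine.
    """
    env = dict(os.environ, MAX_JOBS=str(max_jobs))  # copy; os.environ is untouched
    cmd = [sys.executable, "setup.py", "install", "--cuda_ext", "--cpp_ext"]
    return cmd, env
```

For example, after `pip install ninja` and from inside the cloned `apex` directory, one could run `subprocess.run(cmd, env=env, check=True)` and check that the "could not find ninja" warning no longer appears.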

_**Environment**_

```
Collecting environment information...
PyTorch version: 2.0.1+cu118
Is debug build: False
CUDA used to build PyTorch: 11.8
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.2 LTS (x86_64)
GCC version: (Ubuntu 11.3.0-1ubuntu1~22.04.1) 11.3.0
Clang version: 14.0.0-1ubuntu1.1
CMake version: version 3.25.2
Libc version: glibc-2.35

Python version: 3.10.6 (main, May 29 2023, 11:10:38) [GCC 11.3.0] (64-bit runtime)
Python platform: Linux-5.15.109+-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: 11.8.89
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: Tesla T4
Nvidia driver version: 525.105.17
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.9.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.9.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.9.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.9.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.9.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.9.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.9.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 2
On-line CPU(s) list: 0,1
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) CPU @ 2.30GHz
CPU family: 6
Model: 63
Thread(s) per core: 2
Core(s) per socket: 1
Socket(s): 1
Stepping: 0
BogoMIPS: 4599.99
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm invpcid_single ssbd ibrs ibpb stibp fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt arat md_clear arch_capabilities
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32 KiB (1 instance)
L1i cache: 32 KiB (1 instance)
L2 cache: 256 KiB (1 instance)
L3 cache: 45 MiB (1 instance)
NUMA node(s): 1
NUMA node0 CPU(s): 0,1
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Mitigation; PTE Inversion
Vulnerability Mds: Vulnerable; SMT Host state unknown
Vulnerability Meltdown: Vulnerable
Vulnerability Mmio stale data: Vulnerable
Vulnerability Retbleed: Vulnerable
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1: Vulnerable: __user pointer sanitization and usercopy barriers only; no swapgs barriers
Vulnerability Spectre v2: Vulnerable, IBPB: disabled, STIBP: disabled, PBRSB-eIBRS: Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected

Versions of relevant libraries:
[pip3] numpy==1.22.4
[pip3] torch==2.0.1+cu118
[pip3] torchaudio==2.0.2+cu118
[pip3] torchdata==0.6.1
[pip3] torchsummary==1.5.1
[pip3] torchtext==0.15.2
[pip3] torchvision==0.15.2+cu118
[conda] Could not collect
```

Thank you very much in advance!
huizhg commented 3 months ago

Hi,

I am having the same issue. Did you get this fixed in the end?