lvcc2018 opened this issue 1 year ago
Are nvcc or any other dependencies necessary?
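One quick way to see which GPU compilers a source build can find is to look them up on `PATH`. The sketch below is a minimal, hypothetical helper (`find_compilers` is not part of Apex or PyTorch). It assumes that a ROCm build of PyTorch compiles extensions with `hipcc` rather than `nvcc`, so on a ROCm-only machine `nvcc` would generally not be expected to be present.

```python
import shutil
from typing import Dict, Iterable, Optional


def find_compilers(names: Iterable[str] = ("hipcc", "nvcc")) -> Dict[str, Optional[str]]:
    """Map each compiler name to its full path on PATH, or None if it is not found.

    Hypothetical helper: it only checks PATH and does not validate versions.
    """
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_compilers().items():
        print(f"{name}: {path or 'not found'}")
```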
Hi @lvcc2018, it seems that some files were not "hipified" properly. I recommend one of the following options:
- (Preferred) Use our published Docker image:
  docker pull rocm/pytorch:latest-centos7
  which has the latest prebuilt stable PyTorch and Apex. If you would like to make changes to Apex, feel free to reinstall Apex from source.
- Uninstall ROCm 4.0.1 and reinstall a newer version of ROCm and its dependencies. Then, build ROCm from source.
Thanks for your time. I agree that it doesn't hipify properly. Unfortunately, I'm not allowed to use Docker or a newer version of ROCm. I found that all the .cu files are skipped; is that normal?
/public/home/ach2ha8oau/megatron-deepspeed/apex-master/apex/contrib/csrc/multihead_attn/dropout_hip.cuh -> None ignored
/public/home/ach2ha8oau/megatron-deepspeed/apex-master/apex/contrib/csrc/multihead_attn/layer_norm_hip.cuh -> None ignored
/public/home/ach2ha8oau/megatron-deepspeed/apex-master/apex/contrib/csrc/multihead_attn/strided_batched_gemm_hip.cuh -> None ignored
/public/home/ach2ha8oau/megatron-deepspeed/apex-master/apex/contrib/csrc/nccl_p2p/nccl_p2p_cuda.cuh -> /public/home/ach2ha8oau/megatron-deepspeed/apex-master/apex/contrib/csrc/nccl_p2p/nccl_p2p_cuda.cuh ok
/public/home/ach2ha8oau/megatron-deepspeed/apex-master/apex/contrib/csrc/nccl_p2p/nccl_p2p.cpp -> /public/home/ach2ha8oau/megatron-deepspeed/apex-master/apex/contrib/csrc/nccl_p2p/nccl_p2p.cpp ok
/public/home/ach2ha8oau/megatron-deepspeed/apex-master/apex/contrib/csrc/nccl_p2p/nccl_p2p_cuda.cu -> /public/home/ach2ha8oau/megatron-deepspeed/apex-master/apex/contrib/csrc/nccl_p2p/nccl_p2p_hip.hip skipped
/public/home/ach2ha8oau/megatron-deepspeed/apex-master/apex/contrib/csrc/optimizers/fused_adam_cuda.cpp -> /public/home/ach2ha8oau/megatron-deepspeed/apex-master/apex/contrib/csrc/optimizers/fused_adam_cuda.cpp ok
/public/home/ach2ha8oau/megatron-deepspeed/apex-master/apex/contrib/csrc/optimizers/fused_adam_cuda_kernel.cu -> /public/home/ach2ha8oau/megatron-deepspeed/apex-master/apex/contrib/csrc/optimizers/fused_adam_hip_kernel.hip skipped
/public/home/ach2ha8oau/megatron-deepspeed/apex-master/apex/contrib/csrc/optimizers/fused_lamb_cuda.cpp -> /public/home/ach2ha8oau/megatron-deepspeed/apex-master/apex/contrib/csrc/optimizers/fused_lamb_cuda.cpp ok
/public/home/ach2ha8oau/megatron-deepspeed/apex-master/apex/contrib/csrc/optimizers/fused_lamb_cuda_kernel.cu -> /public/home/ach2ha8oau/megatron-deepspeed/apex-master/apex/contrib/csrc/optimizers/fused_lamb_hip_kernel.hip skipped
/public/home/ach2ha8oau/megatron-deepspeed/apex-master/apex/contrib/csrc/optimizers/multi_tensor_distopt_adam.cpp -> /public/home/ach2ha8oau/megatron-deepspeed/apex-master/apex/contrib/csrc/optimizers/multi_tensor_distopt_adam.cpp ok
/public/home/ach2ha8oau/megatron-deepspeed/apex-master/apex/contrib/csrc/optimizers/multi_tensor_distopt_adam_kernel.cu -> /public/home/ach2ha8oau/megatron-deepspeed/apex-master/apex/contrib/csrc/optimizers/multi_tensor_distopt_adam_kernel.hip skipped
/public/home/ach2ha8oau/megatron-deepspeed/apex-master/apex/contrib/csrc/optimizers/multi_tensor_distopt_lamb.cpp -> /public/home/ach2ha8oau/megatron-deepspeed/apex-master/apex/contrib/csrc/optimizers/multi_tensor_distopt_lamb.cpp ok
/public/home/ach2ha8oau/megatron-deepspeed/apex-master/apex/contrib/csrc/optimizers/multi_tensor_distopt_lamb_kernel.cu -> /public/home/ach2ha8oau/megatron-deepspeed/apex-master/apex/contrib/csrc/optimizers/multi_tensor_distopt_lamb_kernel.hip skipped
/public/home/ach2ha8oau/megatron-deepspeed/apex-master/apex/contrib/csrc/transducer/transducer_joint_kernel.cu -> /public/home/ach2ha8oau/megatron-deepspeed/apex-master/apex/contrib/csrc/transducer/transducer_joint_kernel.hip skipped
/public/home/ach2ha8oau/megatron-deepspeed/apex-master/apex/contrib/csrc/transducer/transducer_joint.cpp -> /public/home/ach2ha8oau/megatron-deepspeed/apex-master/apex/contrib/csrc/transducer/transducer_joint.cpp ok
/public/home/ach2ha8oau/megatron-deepspeed/apex-master/apex/contrib/csrc/transducer/transducer_loss.cpp -> /public/home/ach2ha8oau/megatron-deepspeed/apex-master/apex/contrib/csrc/transducer/transducer_loss.cpp ok
/public/home/ach2ha8oau/megatron-deepspeed/apex-master/apex/contrib/csrc/transducer/transducer_loss_kernel.cu -> /public/home/ach2ha8oau/megatron-deepspeed/apex-master/apex/contrib/csrc/transducer/transducer_loss_kernel.hip skipped
/public/home/ach2ha8oau/megatron-deepspeed/apex-master/apex/contrib/csrc/xentropy/interface.cpp -> /public/home/ach2ha8oau/megatron-deepspeed/apex-master/apex/contrib/csrc/xentropy/interface.cpp ok
/public/home/ach2ha8oau/megatron-deepspeed/apex-master/apex/contrib/csrc/xentropy/xentropy_kernel.cu -> /public/home/ach2ha8oau/megatron-deepspeed/apex-master/apex/contrib/csrc/xentropy/xentropy_kernel.hip skipped
Successfully preprocessed all matching files.
Total number of unsupported CUDA function calls: 0
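Each line of the hipify log above has the shape `<source> -> <destination> <status>`, where the status is `ok`, `skipped`, or `ignored` (the destination is `None` for ignored files). A small tally grouped by file extension makes it easy to confirm whether every `.cu` file was skipped. This is a hypothetical helper, not part of Apex or hipify, and it assumes only the line format visible in the log.

```python
import os
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Assumed line format, taken from the log above:
#   <source path> -> <destination path or None> <status>
LINE_RE = re.compile(r"^(?P<src>\S+) -> (?P<dst>\S+) (?P<status>\w+)$")


def parse_hipify_log(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (source, destination, status) for every line that matches the format."""
    entries = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:  # summary lines such as "Successfully preprocessed ..." are ignored
            entries.append((match["src"], match["dst"], match["status"]))
    return entries


def status_by_extension(entries: Iterable[Tuple[str, str, str]]) -> Counter:
    """Count (source file extension, status) pairs, e.g. ('.cu', 'skipped')."""
    counts: Counter = Counter()
    for src, _dst, status in entries:
        ext = os.path.splitext(src)[1]
        counts[(ext, status)] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python tally_hipify.py < build.log
    for (ext, status), n in sorted(status_by_extension(parse_hipify_log(sys.stdin)).items()):
        print(f"{ext or '(none)'}\t{status}\t{n}")
```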
Which version of PyTorch do you have installed?
It's 1.10.1+rocm4.0.1.
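The version string uses a PEP 440 local-version suffix, so the ROCm release the PyTorch wheel was built against can be read from the part after `+rocm`. A minimal sketch, assuming the suffix always has the form `+rocmX.Y.Z` as in `1.10.1+rocm4.0.1`; `split_torch_version` is a hypothetical helper. In a live environment the input would come from `torch.__version__`.

```python
from typing import Optional, Tuple


def split_torch_version(version: str) -> Tuple[str, Optional[str]]:
    """Split e.g. '1.10.1+rocm4.0.1' into ('1.10.1', '4.0.1').

    Returns None for the ROCm part when there is no '+rocm' local suffix
    (for example, a CUDA build such as '1.10.1+cu113' or a plain '1.10.1').
    """
    public, _, local = version.partition("+")
    rocm = local[len("rocm"):] if local.startswith("rocm") else None
    return public, rocm


if __name__ == "__main__":
    print(split_torch_version("1.10.1+rocm4.0.1"))  # ('1.10.1', '4.0.1')
```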
The following error occurs when I install Apex from source on my ROCm server (CentOS 7.6).
It seems to occur while building the 'distributed_adam_cuda' extension.
My environment: