hcc in ROCm 1.8.2RC3 uses huge amount of memory to compile file

iotamudelta commented 6 years ago

For a full reproduction: check out pytorch, run python3 tools/amd_build/build_pytorch_amd.py, and build pytorch.

To compile only the problematic file (on Ubuntu 18.04, with my own paths):

/opt/rocm/hcc/bin/clang-7.0 -cc1 -D__KALMAR_HC__=1 -D__HCC_HC__=1 -D__KALMAR_CPU__=1 -D__HCC_CPU__=1 -triple x86_64-unknown-linux-gnu -S -disable-free -disable-llvm-verifier -main-file-name THCTensorMode.cu -mrelocation-model pic -pic-level 2 -mthread-model posix -mdisable-fp-elim -fmath-errno -masm-verbose -mconstructor-aliases -munwind-tables -fuse-init-array -target-cpu x86-64 -dwarf-column-info -debugger-tuning=gdb -coverage-notes-file /home/jmd/software/pytorch/build/caffe2/CMakeFiles/caffe2_hip.dir/__/aten/src/THC/./caffe2_hip_generated_THCTensorMode.cu.gcno -resource-dir /opt/rocm/hcc/lib/clang/7.0.0 -I/opt/rocm/hcc/bin/../include -I/opt/rocm/hcc/bin/../hcc/include -D __HIPCC__ -I /opt/rocm/hcc/include -I /opt/rocm/hip/include/hip/hcc_detail/cuda -I /opt/rocm/hsa/include -I /opt/rocm/hip/include -D HIP_VERSION_MAJOR=1 -D HIP_VERSION_MINOR=5 -D HIP_VERSION_PATCH=18231 -D __HIP_ARCH_GFX900__=1 -D __HIP_PLATFORM_HCC__=1 -D CUDA_HAS_FP16=1 -D __HIP_NO_HALF_OPERATORS__=1 -D __HIP_NO_HALF_CONVERSIONS__=1 -I /opt/rocm/hip/include -I /opt/rocm/hcc/include -I /opt/rocm/hsa/include -I /opt/rocm/rocrand/include -I /opt/rocm/hiprand/include -I /opt/rocm/rocblas/include -I /opt/rocm/miopen/include -I /opt/rocm/Thrust -I /opt/rocm/Thrust/thrust/system/cuda/detail/cub-hip -I -I/opt/rocm/hip/include -I /opt/rocm/hcc/include -I /opt/rocm/hsa/include -I /opt/rocm/rocrand/include -I /opt/rocm/hiprand/include -I /opt/rocm/rocblas/include -I /opt/rocm/miopen/include -I /opt/rocm/Thrust -I /opt/rocm/Thrust/thrust/system/cuda/detail/cub-hip -I -I/home/jmd/software/pytorch/build/caffe2/aten/src/TH -I /home/jmd/software/pytorch/aten/src/TH -I /home/jmd/software/pytorch/build/caffe2/aten/src/THC -I /home/jmd/software/pytorch/aten/src/THC -I /home/jmd/software/pytorch/aten/src/THCUNN -I /home/jmd/software/pytorch/aten/src/ATen/cuda -I /home/jmd/software/pytorch/build/caffe2/aten/src/TH -I /home/jmd/software/pytorch/aten/src/TH -I /home/jmd/software/pytorch/aten/src/TH -I 
/home/jmd/software/pytorch/aten/src/THC -I /home/jmd/software/pytorch/build/caffe2/aten/src/TH -I /home/jmd/software/pytorch/build/caffe2/aten/src/THC -I /home/jmd/software/pytorch/aten/src -I /home/jmd/software/pytorch/build/caffe2/aten/src -I /home/jmd/software/pytorch/build/aten/src -I /home/jmd/software/pytorch/aten/src/THNN -I /home/jmd/software/pytorch/aten/src/THCUNN -I /home/jmd/software/pytorch/aten/src -I /home/jmd/software/pytorch/aten/../third_party/catch/single_include -I /home/jmd/software/pytorch/build/caffe2/aten/src/ATen -I /home/jmd/software/pytorch/aten/src/ATen/.. -I /home/jmd/software/pytorch/build/caffe2/aten/src/ATen -I /home/jmd/software/pytorch/build -I /home/jmd/software/pytorch -I -I/home/jmd/software/pytorch/third_party/protobuf/src -I /usr/include -I /home/jmd/software/pytorch/cmake/../third_party/eigen -I /home/jmd/software/pytorch/cmake/../third_party/pybind11/include -I /opt/rocm/hip/include -I /opt/rocm/hipblas/include -I /opt/rocm/hcsparse/include -I /opt/rocm/hcrng/include -I /opt/rocm/Thrust -I /home/jmd/software/pytorch/third_party/onnx -I /home/jmd/software/pytorch/build/third_party/onnx -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/x86_64-linux-gnu/c++/7.3.0 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/x86_64-linux-gnu/c++/7.3.0 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/backward -internal-isystem /usr/local/include -internal-isystem /opt/rocm/hcc/lib/clang/7.0.0/include -internal-externc-isystem /usr/include/x86_64-linux-gnu -internal-externc-isystem /include -internal-externc-isystem /usr/include -Wno-deprecated-register -Wno-macro-redefined -Wno-inconsistent-missing-override -Wno-exceptions -Wno-shift-count-negative -Wno-shift-count-overflow -Wno-unused-command-line-argument -std=c++amp -fdeprecated-macro -fdebug-compilation-dir 
/home/jmd/software/pytorch/build/caffe2/CMakeFiles/caffe2_hip.dir/__/aten/src/THC -ferror-limit 19 -fmessage-length 0 -fobjc-runtime=gcc -fcxx-exceptions -fexceptions -fdiagnostics-show-option -famp -fhsa-ext -o /tmp/THCTensorMode-e0a67b.s -x hc-host /home/jmd/software/pytorch/aten/src/THC/THCTensorMode.cu -emit-llvm-bc

Observed behavior: compiling this file takes more than 10 GB of memory.
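
To put a number on the memory usage, one option is to run the compile command as a child process and read its peak resident set size afterwards. The sketch below assumes Linux, where `ru_maxrss` is reported in kilobytes. The `peak_rss_kb` helper name is hypothetical, and the placeholder command stands in for the clang-7.0 invocation above.

```python
import resource
import subprocess
import sys


def peak_rss_kb(cmd):
    """Run cmd and return (exit code, peak RSS of waited-for children)."""
    proc = subprocess.run(cmd)
    # RUSAGE_CHILDREN covers every child this process has waited for;
    # ru_maxrss is the largest resident set size among them (kB on Linux).
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    return proc.returncode, usage.ru_maxrss


if __name__ == "__main__":
    # Placeholder command; substitute the full clang-7.0 -cc1 line.
    code, peak = peak_rss_kb([sys.executable, "-c", "pass"])
    print(f"exit code {code}, peak RSS {peak} kB")
```

Because the value is a maximum over all children waited for so far, it is best to measure one command per run of the script.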

whchung commented 6 years ago

@iotamudelta

It seems that simply running the build command doesn't work, even after modifying the paths.

$ ./b.sh 
In file included from /home/whchung/pytorch/aten/src/THC/THCTensorMode.cu:1:
/home/whchung/pytorch/aten/src/THC/THC.h:4:10: fatal error: 'THCGeneral.h' file not found
#include "THCGeneral.h"
         ^~~~~~~~~~~~~~
1 error generated.

Should python3 tools/amd_build/build_pytorch_amd.py be executed first?

Here's what I see:

$ python3 tools/amd_build/build_pytorch_amd.py
error: patch failed: aten/src/THC/generic/THCTensorRandom.cu:504
error: aten/src/THC/generic/THCTensorRandom.cu: patch does not apply
error: patch failed: aten/src/THCUNN/FeatureLPPooling.cu:193
error: aten/src/THCUNN/FeatureLPPooling.cu: patch does not apply
error: patch failed: aten/src/THC/THCDeviceUtils.cuh:52
error: aten/src/THC/THCDeviceUtils.cuh: patch does not apply
error: patch failed: torch/cuda/__init__.py:123
error: torch/cuda/__init__.py: patch does not apply
Traceback (most recent call last):
  File "/home/whchung/pytorch/tools/amd_build/pyHIPIFY/hipify-python.py", line 36, in <module>
    from enum import Enum
ImportError: No module named enum

whchung commented 6 years ago

Since pytorch has even more dependencies, perhaps it would be a good idea to publish a Docker container so the issue can be reproduced more easily?

iotamudelta commented 6 years ago

As you know, Michael is working on a Docker image. Let's wait until that is available.

In general, it is hard to come up with a single command that works in isolation without first running the full build once; after that, the single command is enough to reproduce the problem. For example, THGeneral.h is generated in build/caffe2.
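
A quick way to tell whether the full build has already produced the generated headers is to search the build tree before running the single command. This is only a sketch: the `find_header` helper is hypothetical, and the `build/caffe2` path is taken from the comment above and may differ in your checkout.

```python
from pathlib import Path


def find_header(build_dir, header_name):
    """Return sorted paths under build_dir whose file name is header_name."""
    root = Path(build_dir)
    if not root.is_dir():
        return []
    return sorted(p for p in root.rglob(header_name) if p.is_file())


if __name__ == "__main__":
    # Assumed location; adjust to your own checkout.
    hits = find_header("build/caffe2", "THGeneral.h")
    if hits:
        for hit in hits:
            print("found:", hit)
    else:
        print("THGeneral.h not generated yet; run the full build once first")
```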

jeffdaily commented 6 years ago

I believe that to solve the ImportError: No module named enum, you need to pip install enum34.

whchung commented 6 years ago

I actually installed enum34 with both pip and pip3, but I'm still seeing the error. Since the issue is annoying but not blocking, I'll wait for a Docker container for now.
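
One thing worth checking: `enum` has been part of the standard library since Python 3.4, and the unquoted message `No module named enum` is the form Python 2 prints. So the hipify script may be running under a different interpreter than the one enum34 was installed into. That is an inference from the traceback, not something confirmed in this thread. A small diagnostic sketch (the `interpreter_report` helper is hypothetical), to be run with the same interpreter that launches the script:

```python
import importlib
import sys


def interpreter_report():
    """Describe the running interpreter and whether `enum` can be imported."""
    try:
        importlib.import_module("enum")
        enum_ok = True
    except ImportError:
        enum_ok = False
    return {
        "executable": sys.executable,
        "version": "%d.%d.%d" % sys.version_info[:3],
        "enum_importable": enum_ok,
    }


if __name__ == "__main__":
    for key, value in sorted(interpreter_report().items()):
        print("%s: %s" % (key, value))
```

The code sticks to syntax that works on both Python 2.7 and Python 3, so it can be run with whichever interpreter ends up executing hipify-python.py.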

Jorghi12 commented 6 years ago

Docker image: @whchung, the Docker image is here. Simply run python setup.py install.

The excessive memory usage is due to the kernels' bitcode being stored temporarily in the /tmp folder by hc-kernel-assemble.
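
If temporary bitcode in /tmp is the culprit, the growth should be visible by measuring the directory while the compile runs; on systems where /tmp is a tmpfs mount, those files are held in memory. A minimal sketch, with a hypothetical `dir_size_bytes` helper that tolerates files disappearing mid-scan:

```python
import os
import stat


def dir_size_bytes(path):
    """Sum the sizes of regular files under path, skipping ones that vanish."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                st = os.lstat(os.path.join(root, name))
            except OSError:
                # The file was deleted between listing and stat; ignore it.
                continue
            if stat.S_ISREG(st.st_mode):
                total += st.st_size
    return total


if __name__ == "__main__":
    print("/tmp currently holds %d bytes" % dir_size_bytes("/tmp"))
```

Calling this repeatedly during the build and keeping the maximum gives a rough peak figure for the temporary files.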

Jorghi12 commented 6 years ago

Here's an example of the file create / delete events that are going on in the /tmp directory.

notify.txt
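
For anyone who wants to collect a similar log without extra tooling, create and delete events can be approximated by comparing directory listings taken at two points in time. This is a polling sketch with hypothetical helper names, not the method used to produce notify.txt; files that are created and deleted between two snapshots will be missed.

```python
import os


def snapshot(path):
    """Return the set of file paths currently under path."""
    found = set()
    for root, _dirs, files in os.walk(path):
        for name in files:
            found.add(os.path.join(root, name))
    return found


def diff_snapshots(before, after):
    """Return (created, deleted) file paths between two snapshots, sorted."""
    return sorted(after - before), sorted(before - after)


if __name__ == "__main__":
    import time

    before = snapshot("/tmp")
    time.sleep(1.0)  # Sampling interval; shorten it to catch more events.
    created, deleted = diff_snapshots(before, snapshot("/tmp"))
    for path in created:
        print("CREATE", path)
    for path in deleted:
        print("DELETE", path)
```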
