@Sigura Please show output of /opt/rocm/opencl/bin/clinfo | grep gfx
@Sigura Another question is why you use COMGR for building kernels. Currently the default build path is offline compilation. Please consult the PyTorch guys who work on AMD support.
/cc @sunway513
@Sigura Please show output of
/opt/rocm/opencl/bin/clinfo | grep gfx
Name: gfx900:xnack-
@Sigura Another question is why you use COMGR for building kernels. Currently the default build path is offline compilation. Please consult the PyTorch guys who work on AMD support.
Could you please help me check my understanding – do you mean a question about how to change the path to MIOpen for PyTorch?
@atamazov I'm open to any questions. I will be glad to help with your great work!
Number of platforms: 1
  Platform Profile: FULL_PROFILE
  Platform Version: OpenCL 2.0 AMD-APP (3305.0)
  Platform Name: AMD Accelerated Parallel Processing
  Platform Vendor: Advanced Micro Devices, Inc.
  Platform Extensions: cl_khr_icd cl_amd_event_callback
Platform Name: AMD Accelerated Parallel Processing
Number of devices: 1
  Device Type: CL_DEVICE_TYPE_GPU
  Vendor ID: 1002h
  Board name: Vega 10 XL/XT [Radeon RX Vega 56/64]
  Device Topology: PCI[ B#3, D#0, F#0 ]
  Max compute units: 64
  Max work items dimensions: 3
    Max work items[0]: 1024
    Max work items[1]: 1024
    Max work items[2]: 1024
  Max work group size: 256
  Preferred vector width char: 4
  Preferred vector width short: 2
  Preferred vector width int: 1
  Preferred vector width long: 1
  Preferred vector width float: 1
  Preferred vector width double: 1
  Native vector width char: 4
  Native vector width short: 2
  Native vector width int: 1
  Native vector width long: 1
  Native vector width float: 1
  Native vector width double: 1
  Max clock frequency: 1630Mhz
  Address bits: 64
  Max memory allocation: 7287183768
  Image support: Yes
  Max number of images read arguments: 128
  Max number of images write arguments: 8
  Max image 2D width: 16384
  Max image 2D height: 16384
  Max image 3D width: 16384
  Max image 3D height: 16384
  Max image 3D depth: 8192
  Max samplers within kernel: 26751
  Max size of kernel argument: 1024
  Alignment (bits) of base address: 1024
  Minimum alignment (bytes) for any datatype: 128
  Single precision floating point capability
    Denorms: Yes
    Quiet NaNs: Yes
    Round to nearest even: Yes
    Round to zero: Yes
    Round to +ve and infinity: Yes
    IEEE754-2008 fused multiply-add: Yes
  Cache type: Read/Write
  Cache line size: 64
  Cache size: 16384
  Global memory size: 8573157376
  Constant buffer size: 7287183768
  Max number of constant args: 8
  Local memory type: Scratchpad
  Local memory size: 65536
  Max pipe arguments: 16
  Max pipe active reservations: 16
  Max pipe packet size: 2992216472
  Max global variable size: 7287183768
  Max global variable preferred total size: 8573157376
  Max read/write image args: 64
  Max on device events: 1024
  Queue on device max size: 8388608
  Max on device queues: 1
  Queue on device preferred size: 262144
  SVM capabilities:
    Coarse grain buffer: Yes
    Fine grain buffer: Yes
    Fine grain system: No
    Atomics: No
  Preferred platform atomic alignment: 0
  Preferred global atomic alignment: 0
  Preferred local atomic alignment: 0
  Kernel Preferred work group size multiple: 64
  Error correction support: 0
  Unified memory for Host and Device: 0
  Profiling timer resolution: 1
  Device endianess: Little
  Available: Yes
  Compiler available: Yes
  Execution capabilities:
    Execute OpenCL kernels: Yes
    Execute native function: No
  Queue on Host properties:
    Out-of-Order: No
    Profiling : Yes
  Queue on Device properties:
    Out-of-Order: Yes
    Profiling : Yes
  Platform ID: 0x7f59a80a4e10
  Name: gfx900:xnack-
  Vendor: Advanced Micro Devices, Inc.
  Device OpenCL C version: OpenCL C 2.0
  Driver version: 3305.0 (HSA1.1,LC)
  Profile: FULL_PROFILE
  Version: OpenCL 2.0
  Extensions: cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_amd_device_attribute_query cl_amd_media_ops cl_amd_media_ops2 cl_khr_image2d_from_buffer cl_khr_subgroups cl_khr_depth_images cl_amd_copy_buffer_p2p cl_amd_assembly_program
RuntimeError                              Traceback (most recent call last)
in
      1 trainer = DefaultTrainer(cfg)
      2 trainer.resume_or_load(resume=False)  # (resume=False)
----> 3 trainer.train()

/home/sigura/.pyenv/versions/3.6.6/lib/python3.6/site-packages/detectron2/engine/defaults.py in train(self)
    481             OrderedDict of results, if evaluation is enabled. Otherwise None.
    482         """
--> 483         super().train(self.start_iter, self.max_iter)
    484         if len(self.cfg.TEST.EXPECTED_RESULTS) and comm.is_main_process():
    485             assert hasattr(

/home/sigura/.pyenv/versions/3.6.6/lib/python3.6/site-packages/detectron2/engine/train_loop.py in train(self, start_iter, max_iter)
    147                 for self.iter in range(start_iter, max_iter):
    148                     self.before_step()
--> 149                     self.run_step()
    150                     self.after_step()
    151                 # self.iter == max_iter can be used by `after_train` to

/home/sigura/.pyenv/versions/3.6.6/lib/python3.6/site-packages/detectron2/engine/defaults.py in run_step(self)
    491     def run_step(self):
    492         self._trainer.iter = self.iter
--> 493         self._trainer.run_step()
    494
    495     @classmethod

/home/sigura/.pyenv/versions/3.6.6/lib/python3.6/site-packages/detectron2/engine/train_loop.py in run_step(self)
    271         If you want to do something with the losses, you can wrap the model.
    272         """
--> 273         loss_dict = self.model(data)
    274         if isinstance(loss_dict, torch.Tensor):
    275             losses = loss_dict

/home/sigura/.pyenv/versions/3.6.6/lib/python3.6/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1049         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1050                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051             return forward_call(*input, **kwargs)
   1052         # Do not call functions when jit is used
   1053         full_backward_hooks, non_full_backward_hooks = [], []

/home/sigura/.pyenv/versions/3.6.6/lib/python3.6/site-packages/detectron2/modeling/meta_arch/rcnn.py in forward(self, batched_inputs)
    152             gt_instances = None
    153
--> 154         features = self.backbone(images.tensor)
    155
    156         if self.proposal_generator is not None:

/home/sigura/.pyenv/versions/3.6.6/lib/python3.6/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1049         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1050                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051             return forward_call(*input, **kwargs)
   1052         # Do not call functions when jit is used
   1053         full_backward_hooks, non_full_backward_hooks = [], []

/home/sigura/.pyenv/versions/3.6.6/lib/python3.6/site-packages/detectron2/modeling/backbone/fpn.py in forward(self, x)
    124                 ["p2", "p3", ..., "p6"].
    125         """
--> 126         bottom_up_features = self.bottom_up(x)
    127         results = []
    128         prev_features = self.lateral_convs[0](bottom_up_features[self.in_features[-1]])

/home/sigura/.pyenv/versions/3.6.6/lib/python3.6/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1049         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1050                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051             return forward_call(*input, **kwargs)
   1052         # Do not call functions when jit is used
   1053         full_backward_hooks, non_full_backward_hooks = [], []

/home/sigura/.pyenv/versions/3.6.6/lib/python3.6/site-packages/detectron2/modeling/backbone/resnet.py in forward(self, x)
    443         assert x.dim() == 4, f"ResNet takes an input of shape (N, C, H, W). Got {x.shape} instead!"
    444         outputs = {}
--> 445         x = self.stem(x)
    446         if "stem" in self._out_features:
    447             outputs["stem"] = x

/home/sigura/.pyenv/versions/3.6.6/lib/python3.6/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1049         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1050                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051             return forward_call(*input, **kwargs)
   1052         # Do not call functions when jit is used
   1053         full_backward_hooks, non_full_backward_hooks = [], []

/home/sigura/.pyenv/versions/3.6.6/lib/python3.6/site-packages/detectron2/modeling/backbone/resnet.py in forward(self, x)
    354
    355     def forward(self, x):
--> 356         x = self.conv1(x)
    357         x = F.relu_(x)
    358         x = F.max_pool2d(x, kernel_size=3, stride=2, padding=1)

/home/sigura/.pyenv/versions/3.6.6/lib/python3.6/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1049         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1050                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051             return forward_call(*input, **kwargs)
   1052         # Do not call functions when jit is used
   1053         full_backward_hooks, non_full_backward_hooks = [], []

/home/sigura/.pyenv/versions/3.6.6/lib/python3.6/site-packages/detectron2/layers/wrappers.py in forward(self, x)
    105
    106         x = F.conv2d(
--> 107             x, self.weight, self.bias, self.stride, self.padding, self.dilation, self.groups
    108         )
    109         if self.norm is not None:

RuntimeError: miopenStatusUnknownError
@Sigura Please expect feedback tomorrow; sorry for the delay.
@Sigura Please also share output of
/opt/rocm/bin/rocminfo | grep gfx
cat /sys/module/amdgpu/version
I need to know the ROCm version that you use. I remember that some versions have problems with the kernel mode driver, and those problems are gfx900-specific.
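For example, the installed release can be checked like this (a rough sketch; the /opt/rocm path assumes a standard package install):

# user-space ROCm release
cat /opt/rocm/.info/version
# installed ROCm/HIP packages (Debian/Ubuntu)
dpkg -l | grep -i 'rocm\|hip' | head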
What happens (most likely): the HIP runtime reports to MIOpen that the target has the SRAMECC feature and that its current setting is OFF. Therefore, MIOpen adds "sramecc-" into the target (device) name for compilation. However, gfx900 does not have the SRAMECC feature, so neither "sramecc-" nor "sramecc+" should be included. The compilation tool that we use (the COMGR library) detects that the target name is incorrect and issues an error.
Most likely, the problem can be solved by adding some hack in the library or by using a suitable version of ROCm.
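To illustrate why the toolchain rejects such a name (a rough, hypothetical check, not part of the actual MIOpen code path; it assumes a full ROCm install with /opt/rocm/bin/clang++ available):

# gfx900 supports the xnack feature but not sramecc, so a target ID that mentions
# sramecc should be rejected, while the plain xnack- form should be accepted.
echo 'int main() { return 0; }' > /tmp/tid_check.hip
/opt/rocm/bin/clang++ -x hip --offload-arch=gfx900:xnack- -c /tmp/tid_check.hip -o /tmp/tid_check.o
/opt/rocm/bin/clang++ -x hip --offload-arch=gfx900:sramecc-:xnack- -c /tmp/tid_check.hip -o /tmp/tid_check.o   # expected to fail with an invalid target ID error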
Please export MIOPEN_LOG_LEVEL=5, collect the console output of your application, ZIP it and attach it here. This would provide me with additional information.
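For example (the file names below are just placeholders):

export MIOPEN_LOG_LEVEL=5
python3 your_training_script.py 2>&1 | tee miopen_log.txt
zip miopen_log.zip miopen_log.txt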
Also I am wondering why COMGR is used instead of the offline compiler, which is the default.
Could you please help me check my understanding – do you mean a question about how to change the path to MIOpen for PyTorch?
No. MIOpen can use different ways to build kernels. The default way is the offline compiler, i.e. /opt/rocm/bin/clang++, which is capable of building HIP, OpenCL and assembly kernels. But in your case MIOpen was built to use the COMGR library (which can build HIP, OpenCL and assembly kernels as well). This feature is still experimental and not guaranteed to work in all cases.
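For reference, the choice is made when MIOpen itself is configured; a minimal sketch of the two CMake configurations (HIP backend assumed, other required options omitted):

# default: kernels are compiled offline with /opt/rocm/bin/clang++
cmake -DMIOPEN_BACKEND=HIP -DMIOPEN_USE_COMGR=Off ..
# experimental: kernels are compiled in-process via the COMGR library
cmake -DMIOPEN_BACKEND=HIP -DMIOPEN_USE_COMGR=On ..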
I think you need to discuss this with the guys from https://github.com/ROCmSoftwarePlatform/pytorch
/cc @sunway513
I need to know the ROCm version that you use. I remember that some versions have problems with the kernel mode driver, and those problems are gfx900-specific.
ROCm 4.2 (not 4.3.1, sorry); I have got it with PyTorch.
The default way is the offline compiler, i.e. /opt/rocm/bin/clang++, which is capable of building HIP, OpenCL and assembly kernels.
I will try to rebuild it in a few days.
If you have the kernel mode driver from 4.3 or 4.3.1, then most likely you have problems with gfx900.
Hey, I just want to report that I came across this exact problem with a Vega 56. So it does seem to affect quite a lot of people.
MIOpen(HIP): Warning [SQLiteBase] Unable to read system database file:gfx900_56.kdb Performance may degrade
MIOpen(HIP): Error [SetIsaName] 'amd_comgr_action_info_set_isa_name(handle, isa.c_str())' amdgcn-amd-amdhsa--gfx900:sramecc-:xnack-: INVALID_ARGUMENT (2)
MIOpen(HIP): Error [BuildOcl] comgr status = INVALID_ARGUMENT (2)
MIOpen(HIP): Warning [BuildOcl] amdgcn-amd-amdhsa--gfx900:sramecc-:xnack-
MIOpen Error: /MIOpen/src/hipoc/hipoc_program.cpp:286: Code object build failed. Source: MIOpenIm2d2Col.cl
0%| | 0/1000000 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/home/ffleader1/PycharmProjects/AOT-GAN-for-Inpainting/src/train.py", line 51, in <module>
main_worker(0, 1, args)
File "/home/ffleader1/PycharmProjects/AOT-GAN-for-Inpainting/src/train.py", line 31, in main_worker
trainer.train()
File "/home/ffleader1/PycharmProjects/AOT-GAN-for-Inpainting/src/trainer/trainer.py", line 108, in train
pred_img = self.netG(images_masked, masks)
File "/home/ffleader1/PycharmProjects/AOT-GAN-for-Inpainting/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ffleader1/PycharmProjects/AOT-GAN-for-Inpainting/src/model/aotgan.py", line 37, in forward
x = self.encoder(x)
File "/home/ffleader1/PycharmProjects/AOT-GAN-for-Inpainting/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ffleader1/PycharmProjects/AOT-GAN-for-Inpainting/venv/lib/python3.8/site-packages/torch/nn/modules/container.py", line 139, in forward
input = module(input)
File "/home/ffleader1/PycharmProjects/AOT-GAN-for-Inpainting/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ffleader1/PycharmProjects/AOT-GAN-for-Inpainting/venv/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 443, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/home/ffleader1/PycharmProjects/AOT-GAN-for-Inpainting/venv/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 439, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
RuntimeError: miopenStatusUnknownError
Process finished with exit code 1
what can be a quick/painless option to solve this? Thank you.
I will create a workaround patch for this, this week. Stay tuned.
@ffleader1 Which version of ROCm are you using?
V4.2. I tried 4.3 but apparently it just outright does not work with PyTorch.
@ffleader1 Which version of ROCm are you using?
Well, just a heads up: it also does not work with Ubuntu 18. I was expecting that by installing Ubuntu 18 the problem would be resolved, but apparently not :<
@ffleader1
V4.2. I tried 4.3 but...
If you've installed 4.3 once, then most likely you still have the kernel module from 4.3, which is known to have issues with gfx900. I will prepare a patch soon.
Please try the wa-issue-1204 branch or apply the attached patch, re-build, re-install, and let me know if this resolves your problem.
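Roughly, the rebuild could look like this (a sketch only; see the MIOpen README for the full prerequisites and configure options):

git clone https://github.com/ROCmSoftwarePlatform/MIOpen.git
cd MIOpen && git checkout wa-issue-1204   # or apply the attached patch to your current source
mkdir build && cd build
CXX=/opt/rocm/llvm/bin/clang++ cmake -DMIOPEN_BACKEND=HIP -DCMAKE_PREFIX_PATH=/opt/rocm ..
make -j$(nproc)
sudo make install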
No. MIOpen can use different ways to build kernels. The default way is the offline compiler, i.e. /opt/rocm/bin/clang++, which is capable of building HIP, OpenCL and assembly kernels. But in your case MIOpen was built to use the COMGR library (which can build HIP, OpenCL and assembly kernels as well). This feature is still experimental and not guaranteed to work in all cases.
@ffleader1 @Sigura The question to both of you still remains: why was the MIOpen used by PyTorch built with the -DMIOPEN_USE_COMGR=On CMake option? Do you use PyTorch wheels?
No. MIOpen can use different ways to build kernels. The default way is the offline compiler, i.e. /opt/rocm/bin/clang++, which is capable of building HIP, OpenCL and assembly kernels. But in your case MIOpen was built to use the COMGR library (which can build HIP, OpenCL and assembly kernels as well). This feature is still experimental and not guaranteed to work in all cases.
@ffleader1 @Sigura The question to both of you still remains: why was the MIOpen used by PyTorch built with the -DMIOPEN_USE_COMGR=On CMake option? Do you use PyTorch wheels?
It was the version that was published on the PyTorch homepage. Anyway, I really want to try out your patch, but I am normally a Windows kinda guy, so all this building of HIP/MIOpen/ROCm from source seems overwhelming to me. I presume the problem was due to the PyTorch ROCm package being built with a bad config? I guess I will wait for a better release of PyTorch ROCm then.
PyTorch was built with the -DMIOPEN_USE_COMGR=On
I think it was not. At the moment I'm focused on other tasks; I will try to return to this issue on the weekend.
And I used 4.2. For me, it looks like it does not support the Vega 10 chip.
I will try to build 4.3 with your patch.
How is this urgency low? This feels like kind of a major problem to me, that should be fixed with high priority. - Albeit not in MIOpen, but in ROCm.
The urgency level reflects that a workaround or a clear path to a solution is available (https://github.com/ROCmSoftwarePlatform/MIOpen/issues/1204#issuecomment-940485505). Internally we are discussing how to fix this issue ASAP. Please stay tuned.
The PyTorch 4.3.1 nightly whl package is now available, @Sigura can you try this out as well?
pip3 install --pre torch torchvision -f https://download.pytorch.org/whl/nightly/rocm4.3.1/torch_nightly.html
Note the PyTorch whl package is fully "self-contained" for all the ROCm components required to execute the PyTorch library.
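A quick way to check that the wheel really picked up the ROCm backend and that a convolution goes through MIOpen (a minimal smoke test, not a benchmark):

python3 -c "import torch; print(torch.__version__, torch.version.hip, torch.cuda.is_available())"
python3 -c "import torch, torch.nn as nn; m = nn.Conv2d(3, 8, 3).cuda(); print(m(torch.randn(1, 3, 32, 32).cuda()).shape)"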
@atamazov is it possible to cherry-pick your changes to the 4.3.1 release branch, so the next PyTorch nightly whl can pick up the fix from here:
https://github.com/pytorch/builder/blob/main/common/install_miopen.sh#L100
@Bengt
How is this urgency low? This feels like kind of a major problem to me, that should be fixed with high priority.
We do value the opinions of our users (and expect the same from them). The level has been changed to high upon your request.
@sunway513
is it possible to cherry-pick your changes to the 4.3.1 release branch
If you do not insist, I would rather avoid cherry-picking into release/rocm-rel-4.3, because this violates the release process (for example, the released 4.3.1 MIOpen will not match the tip of release/rocm-rel-4.3 after that, which is not good). But I can create some other branch on top of that (e.g. release/rocm-rel.4.3.1-for-pytorch). Would this work for you?
But first we need to find out if this patch resolves the issue. Unfortunately, we do not have any feedback from the users who ran into it.
@atamazov, we've got a reproducible environment locally and will try to validate your patch and move from there. Cherry-picking hot fixes to release branches seems aligned with what we have been doing for both TensorFlow and PyTorch upstream.
@sunway513 @atamazov looking forward to a release soon.
Currently there are essentially no consumer cards that can run ROCm; good luck finding a Radeon VII, vega 56/64 is currently broken, RX5xxx/RX6xxx unsupported. I really don't want to give money to the Leather Jacket Man, please.
Also, I'm sure you are aware that it's not a good look when a supposedly supported card does not work for such long stretches of time. I'm not sure what's going on at AMD's side, but I bet that testing could improve dramatically.
I was able to reproduce the reported error on the torchvision maskrcnn resnet50 inference workload. We're now in the process of re-generating the WHL packages with the MIOpen testing branch to validate further.
In the meantime, I can confirm that torchvision resnet50 FP16 model training executes correctly using the PyTorch 4.3.1 nightly whl on a Vega 64 GPU.
Here's my command for reference:
# Launch the ROCm 4.3.1 base docker container
alias drun='sudo docker run -it --network=host --device=/dev/kfd --device=/dev/dri --group-add=video --ipc=host --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v $HOME/dockerx:/dockerx'
drun rocm/dev-ubuntu-18.04:4.3.1
# Pull and install the PyTorch and torchvision 4.3.1 nightly whl
apt update && apt install python3-dev python3-pip git -y
pip3 install --pre torch torchvision -f https://download.pytorch.org/whl/nightly/rocm4.3.1/torch_nightly.html
# Pull the vision benchmark scripts and train Resnet50 FP16
cd ~ && git clone https://github.com/ROCmSoftwarePlatform/pytorch-micro-benchmarking.git
cd ~/pytorch-micro-benchmarking/ && python3.6 micro_benchmarking_pytorch.py --network resnet50 --batch-size 128 --fp16 1
@s-marios Please do not publish off-topic comments anymore; this wastes our time.
The PyTorch 4.3.1 nightly whl package is now available, @Sigura can you try this out as well?
Sorry, I can try it only on the weekend.
The PyTorch 4.3.1 nightly whl package is now available, @Sigura can you try this out as well?
Sorry, I can try it only on the weekend.
Thanks; nvm, I have tried that and it didn't make a difference. We'll try to provide some experimental WHLs soon, and then propagate the fix to the upstream.
Can I find and use the experimental .whl packages somewhere already? I would like to help with testing them on my affected code base.
Running Lubuntu 18.04.05 and Vega 64 here.
I installed ROCm 4.3.1 from the official AMD repo via apt, and the nightly PyTorch package from the link in @sunway513's post.
I confirm resnet50 FP16 model training executes correctly.
However, when I tried to train a vocal-remover model, it would complain about missing headers.
Here's a log with MIOPEN_LOG_LEVEL=5
tide@tide-X299-DESIGNARE-EX:/media/tide/Dane/datasety/UVR/vocal-remover-develop$ ./train.sh
### DEBUG MODE
1 mix.flac instr.flac
100%|██████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 4.52it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00, 1.17s/it]
# epoch 0
MIOpen(HIP): Info [get_device_name] Raw device name: gfx900:xnack-
MIOpen(HIP): Info [Handle] stream: 0, device_id: 0
MIOpen(HIP): Info [get_device_name] Raw device name: gfx900:xnack-
MIOpen(HIP): Info [SetStream] stream: 0, device_id: 0
MIOpen(HIP): Info [GetFindModeValueImpl] MIOPEN_FIND_MODE = HYBRID(3)
MIOpen(HIP): Info [ForwardGetWorkSpaceSize]
MIOpen(HIP): Info [HipCompilerVersionImpl] 4.3.21331
MIOpen(HIP): Info [AmdRocmMetadataVersionDetect] ROCm MD version AMDHSA_COv3, MIOpen version 2.12.0.4d0489c72
MIOpen(HIP): Info [GetForwardSolutions]
MIOpen(HIP): Info [Measure] Db::Prefetch time: 68.898 ms
MIOpen(HIP): Info [FindConvFwdAlgorithm] requestAlgoCount = 1, workspace = 9437184
MIOpen(HIP): Info [GetForwardSolutions]
MIOpen(HIP): Info [TryLoad] Find-db regenerating.
MIOpen(HIP): Info [SQLiteBase] SQLite does not support WAL
MIOpen(HIP): Info [FindSolutionImpl] ConvBinWinogradRxSf2x3g1 (not searchable)
MIOpen(HIP): Info [GetPerformanceConfig] 64
MIOpen(HIP): Info [FindSolutionImpl] GemmFwdRest (not searchable)
MIOpen(HIP): Info [FindSolutionImpl] ConvOclDirectFwd
MIOpen(HIP): Info [FindSolutionImpl] Perf Db: record not found for: ConvOclDirectFwd
MIOpen(HIP): Info [FindSolutionImpl] ConvDirectNaiveConvFwd (not searchable)
MIOpen(HIP): Info [CreateInMemDb] Unknown database: /opt/rocm/miopen/share/miopen/db/gfx900_64.kdb in internal file cache
MIOpen(HIP): Warning [SQLiteBase] Unable to read system database file:/opt/rocm/miopen/share/miopen/db/gfx900_64.kdb Performance may degrade
MIOpen(HIP): Info [KernDb] database not present
MIOpen(HIP): Info [SQLiteBase] SQLite does not support WAL
MIOpen(HIP): Info [PrintVersionImpl] COMgr v.2.1.0, USE_HIP_PCH: 1
MIOpen(HIP): Error [Do] 'amd_comgr_do_action(kind, handle, in.GetHandle(), out.GetHandle())' AMD_COMGR_ACTION_COMPILE_SOURCE_TO_BC: ERROR (1)
MIOpen(HIP): Error [BuildHip] comgr status = ERROR (1)
MIOpen(HIP): Warning [BuildHip] /tmp/comgr-5e8aa5/input/naive_conv.cpp:26:10: fatal error: 'hip/hip_fp16.h' file not found
#include <hip/hip_fp16.h>
^~~~~~~~~~~~~~~~
1 error generated when compiling for gfx900.
terminate called after throwing an instance of 'miopen::Exception'
what(): /MIOpen/src/hipoc/hipoc_program.cpp:295: Code object build failed. Source: naive_conv.cpp
MIOpen(HIP): Info [PrintVersionImpl] COMgr v.2.1.0, USE_HIP_PCH: 1
MIOpen(HIP): Error [Do] 'amd_comgr_do_action(kind, handle, in.GetHandle(), out.GetHandle())' AMD_COMGR_ACTION_COMPILE_SOURCE_TO_BC: ERROR (1)
MIOpen(HIP): Error [BuildHip] comgr status = ERROR (1)
MIOpen(HIP): Warning [BuildHip] /tmp/comgr-5e8aa5/input/naive_conv.cpp:26:10: fatal error: 'hip/hip_fp16.h' file not found
#include <hip/hip_fp16.h>
^~~~~~~~~~~~~~~~
1 error generated when compiling for gfx900.
The USE_HIP_PCH (pre-compiled header) feature is turned on, so if the naive_conv.cpp file is including it, it would be a bug. The hip_fp16.h header cannot be assumed to be installed and available. This is true for PyTorch wheels, which should be able to use hiprtc without having the HIP compilers and headers available. Looks like MIOpen needs a similar capability.
@ThisKwasior Thanks for reporting. The problem described at https://github.com/ROCmSoftwarePlatform/MIOpen/issues/1204#issuecomment-946978785 is a different issue. This patch resolves it:
src/kernels/gpu_reference_kernel/naive_conv.cpp | 2 ++
1 file changed, 2 insertions(+)
diff --git a/src/kernels/gpu_reference_kernel/naive_conv.cpp b/src/kernels/gpu_reference_kernel/naive_conv.cpp
index af829e4ca..d1a302387 100644
--- a/src/kernels/gpu_reference_kernel/naive_conv.cpp
+++ b/src/kernels/gpu_reference_kernel/naive_conv.cpp
@@ -23,8 +23,10 @@
* SOFTWARE.
*
*******************************************************************************/
+#ifndef MIOPEN_DONT_USE_HIP_RUNTIME_HEADERS
#include <hip/hip_fp16.h>
#include <hip/hip_runtime.h>
+#endif
// hcc seems need __device__ __host__ together to compile, and no extern "C"
typedef union _cvt_bf16_fp32
The following env settings disable usage of naive_conv.cpp:
export MIOPEN_DEBUG_CONV_DIRECT_NAIVE_CONV_FWD=0
export MIOPEN_DEBUG_CONV_DIRECT_NAIVE_CONV_BWD=0
export MIOPEN_DEBUG_CONV_DIRECT_NAIVE_CONV_WRW=0
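For example, as a temporary workaround they can be set just for a single run:

MIOPEN_DEBUG_CONV_DIRECT_NAIVE_CONV_FWD=0 \
MIOPEN_DEBUG_CONV_DIRECT_NAIVE_CONV_BWD=0 \
MIOPEN_DEBUG_CONV_DIRECT_NAIVE_CONV_WRW=0 \
./train.sh   # the script from the log above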
It looks like it may be closed now.
https://github.com/RadeonOpenCompute/ROCm/issues/1572#issuecomment-992589243
Python 3.6.6, PyTorch 1.9.1 with ROCm support, Ubuntu 20.04
based on RadeonOpenCompute/ROCm#1572
PS: but those lines from #1572 do work for me.