ROCm / MIOpen

AMD's Machine Intelligence Library
https://rocm.docs.amd.com/projects/MIOpen/en/latest/

Vega 64: hipoc_program.cpp:286: Code object build failed. Source: MIOpenIm2d2Col.cl #1204

Closed – Sigura closed this issue 2 years ago

Sigura commented 2 years ago

Python 3.6.6, PyTorch 1.9.1 with ROCm support, Ubuntu 20.04

MIOpen(HIP): Warning [SQLiteBase] Unable to read system database file:/opt/rocm/miopen/share/miopen/db/gfx900_64.kdb Performance may degrade
MIOpen(HIP): Error [SetIsaName] 'amd_comgr_action_info_set_isa_name(handle, isa.c_str())' amdgcn-amd-amdhsa--gfx900:sramecc-:xnack-: INVALID_ARGUMENT (2)
MIOpen(HIP): Error [BuildOcl] comgr status = INVALID_ARGUMENT (2)
MIOpen(HIP): Warning [BuildOcl] amdgcn-amd-amdhsa--gfx900:sramecc-:xnack-
MIOpen Error: /MIOpen/src/hipoc/hipoc_program.cpp:286: Code object build failed. Source: MIOpenIm2d2Col.cl

Traceback (most recent call last):
  File "/home/sigura/.pyenv/versions/3.6.6/lib/python3.6/site-packages/detectron2/engine/train_loop.py", line 149, in train
    self.run_step()
  File "/home/sigura/.pyenv/versions/3.6.6/lib/python3.6/site-packages/detectron2/engine/defaults.py", line 493, in run_step
    self._trainer.run_step()
  File "/home/sigura/.pyenv/versions/3.6.6/lib/python3.6/site-packages/detectron2/engine/train_loop.py", line 273, in run_step
    loss_dict = self.model(data)
  File "/home/sigura/.pyenv/versions/3.6.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/sigura/.pyenv/versions/3.6.6/lib/python3.6/site-packages/detectron2/modeling/meta_arch/rcnn.py", line 154, in forward
    features = self.backbone(images.tensor)
  File "/home/sigura/.pyenv/versions/3.6.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/sigura/.pyenv/versions/3.6.6/lib/python3.6/site-packages/detectron2/modeling/backbone/fpn.py", line 126, in forward
    bottom_up_features = self.bottom_up(x)
  File "/home/sigura/.pyenv/versions/3.6.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/sigura/.pyenv/versions/3.6.6/lib/python3.6/site-packages/detectron2/modeling/backbone/resnet.py", line 445, in forward
    x = self.stem(x)
  File "/home/sigura/.pyenv/versions/3.6.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/sigura/.pyenv/versions/3.6.6/lib/python3.6/site-packages/detectron2/modeling/backbone/resnet.py", line 356, in forward
    x = self.conv1(x)
  File "/home/sigura/.pyenv/versions/3.6.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/sigura/.pyenv/versions/3.6.6/lib/python3.6/site-packages/detectron2/layers/wrappers.py", line 107, in forward
    x, self.weight, self.bias, self.stride, self.padding, self.dilation, self.groups
RuntimeError: miopenStatusUnknownError

based on RadeonOpenCompute/ROCm#1572

PS: however, these lines from #1572 do work for me:

import torch
import torchvision

device = torch.device('cuda')
print(f"using device: {device}")
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True).to(device)
model.eval()

rand_input = torch.rand(3, 1920, 1080, device=device)
model([rand_input])
atamazov commented 2 years ago

@Sigura Please show output of /opt/rocm/opencl/bin/clinfo | grep gfx.

atamazov commented 2 years ago

@Sigura Another question is why you are using COMGR for building kernels. Currently the default build path is offline compilation. Please consult the PyTorch folks who work on AMD support.

atamazov commented 2 years ago

/cc @sunway513

Sigura commented 2 years ago

@Sigura Please show output of /opt/rocm/opencl/bin/clinfo | grep gfx.

Name: gfx900:xnack-

Sigura commented 2 years ago

@Sigura Another question is why you are using COMGR for building kernels. Currently the default build path is offline compilation. Please consult the PyTorch folks who work on AMD support.

Could you please help me check my understanding – do you mean the question is about how to change the build path of MIOpen for PyTorch?

Sigura commented 2 years ago

@atamazov I'm open to any questions. I will be glad to help with your great work!

Sigura commented 2 years ago
/opt/rocm/opencl/bin/clinfo


Number of platforms:                 1
  Platform Profile:              FULL_PROFILE
  Platform Version:              OpenCL 2.0 AMD-APP (3305.0)
  Platform Name:                 AMD Accelerated Parallel Processing
  Platform Vendor:               Advanced Micro Devices, Inc.
  Platform Extensions:               cl_khr_icd cl_amd_event_callback 

  Platform Name:                 AMD Accelerated Parallel Processing
Number of devices:               1
  Device Type:                   CL_DEVICE_TYPE_GPU
  Vendor ID:                     1002h
  Board name:                    Vega 10 XL/XT [Radeon RX Vega 56/64]
  Device Topology:               PCI[ B#3, D#0, F#0 ]
  Max compute units:                 64
  Max work items dimensions:             3
    Max work items[0]:               1024
    Max work items[1]:               1024
    Max work items[2]:               1024
  Max work group size:               256
  Preferred vector width char:           4
  Preferred vector width short:          2
  Preferred vector width int:            1
  Preferred vector width long:           1
  Preferred vector width float:          1
  Preferred vector width double:         1
  Native vector width char:          4
  Native vector width short:             2
  Native vector width int:           1
  Native vector width long:          1
  Native vector width float:             1
  Native vector width double:            1
  Max clock frequency:               1630Mhz
  Address bits:                  64
  Max memory allocation:             7287183768
  Image support:                 Yes
  Max number of images read arguments:       128
  Max number of images write arguments:      8
  Max image 2D width:                16384
  Max image 2D height:               16384
  Max image 3D width:                16384
  Max image 3D height:               16384
  Max image 3D depth:                8192
  Max samplers within kernel:            26751
  Max size of kernel argument:           1024
  Alignment (bits) of base address:      1024
  Minimum alignment (bytes) for any datatype:    128
  Single precision floating point capability
    Denorms:                     Yes
    Quiet NaNs:                  Yes
    Round to nearest even:           Yes
    Round to zero:               Yes
    Round to +ve and infinity:           Yes
    IEEE754-2008 fused multiply-add:         Yes
  Cache type:                    Read/Write
  Cache line size:               64
  Cache size:                    16384
  Global memory size:                8573157376
  Constant buffer size:              7287183768
  Max number of constant args:           8
  Local memory type:                 Scratchpad
  Local memory size:                 65536
  Max pipe arguments:                16
  Max pipe active reservations:          16
  Max pipe packet size:              2992216472
  Max global variable size:          7287183768
  Max global variable preferred total size:  8573157376
  Max read/write image args:             64
  Max on device events:              1024
  Queue on device max size:          8388608
  Max on device queues:              1
  Queue on device preferred size:        262144
  SVM capabilities:              
    Coarse grain buffer:             Yes
    Fine grain buffer:               Yes
    Fine grain system:               No
    Atomics:                     No
  Preferred platform atomic alignment:       0
  Preferred global atomic alignment:         0
  Preferred local atomic alignment:      0
  Kernel Preferred work group size multiple:     64
  Error correction support:          0
  Unified memory for Host and Device:        0
  Profiling timer resolution:            1
  Device endianess:              Little
  Available:                     Yes
  Compiler available:                Yes
  Execution capabilities:                
    Execute OpenCL kernels:          Yes
    Execute native function:             No
  Queue on Host properties:              
    Out-of-Order:                No
    Profiling :                  Yes
  Queue on Device properties:                
    Out-of-Order:                Yes
    Profiling :                  Yes
  Platform ID:                   0x7f59a80a4e10
  Name:                      gfx900:xnack-
  Vendor:                    Advanced Micro Devices, Inc.
  Device OpenCL C version:           OpenCL C 2.0 
  Driver version:                3305.0 (HSA1.1,LC)
  Profile:                   FULL_PROFILE
  Version:                   OpenCL 2.0 
  Extensions:                    cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_amd_device_attribute_query cl_amd_media_ops cl_amd_media_ops2 cl_khr_image2d_from_buffer cl_khr_subgroups cl_khr_depth_images cl_amd_copy_buffer_p2p cl_amd_assembly_program 

traceback

RuntimeError                              Traceback (most recent call last)
<ipython-input> in <module>
      1 trainer = DefaultTrainer(cfg)
      2 trainer.resume_or_load(resume=False) # (resume=False)
----> 3 trainer.train()

/home/sigura/.pyenv/versions/3.6.6/lib/python3.6/site-packages/detectron2/engine/defaults.py in train(self)
    481             OrderedDict of results, if evaluation is enabled. Otherwise None.
    482         """
--> 483         super().train(self.start_iter, self.max_iter)
    484         if len(self.cfg.TEST.EXPECTED_RESULTS) and comm.is_main_process():
    485             assert hasattr(

/home/sigura/.pyenv/versions/3.6.6/lib/python3.6/site-packages/detectron2/engine/train_loop.py in train(self, start_iter, max_iter)
    147                 for self.iter in range(start_iter, max_iter):
    148                     self.before_step()
--> 149                     self.run_step()
    150                     self.after_step()
    151                 # self.iter == max_iter can be used by `after_train` to

/home/sigura/.pyenv/versions/3.6.6/lib/python3.6/site-packages/detectron2/engine/defaults.py in run_step(self)
    491     def run_step(self):
    492         self._trainer.iter = self.iter
--> 493         self._trainer.run_step()
    494 
    495     @classmethod

/home/sigura/.pyenv/versions/3.6.6/lib/python3.6/site-packages/detectron2/engine/train_loop.py in run_step(self)
    271         If you want to do something with the losses, you can wrap the model.
    272         """
--> 273         loss_dict = self.model(data)
    274         if isinstance(loss_dict, torch.Tensor):
    275             losses = loss_dict

/home/sigura/.pyenv/versions/3.6.6/lib/python3.6/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1049         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1050                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051             return forward_call(*input, **kwargs)
   1052         # Do not call functions when jit is used
   1053         full_backward_hooks, non_full_backward_hooks = [], []

/home/sigura/.pyenv/versions/3.6.6/lib/python3.6/site-packages/detectron2/modeling/meta_arch/rcnn.py in forward(self, batched_inputs)
    152             gt_instances = None
    153 
--> 154         features = self.backbone(images.tensor)
    155 
    156         if self.proposal_generator is not None:

/home/sigura/.pyenv/versions/3.6.6/lib/python3.6/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1049         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1050                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051             return forward_call(*input, **kwargs)
   1052         # Do not call functions when jit is used
   1053         full_backward_hooks, non_full_backward_hooks = [], []

/home/sigura/.pyenv/versions/3.6.6/lib/python3.6/site-packages/detectron2/modeling/backbone/fpn.py in forward(self, x)
    124                 ["p2", "p3", ..., "p6"].
    125         """
--> 126         bottom_up_features = self.bottom_up(x)
    127         results = []
    128         prev_features = self.lateral_convs[0](bottom_up_features[self.in_features[-1]])

/home/sigura/.pyenv/versions/3.6.6/lib/python3.6/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1049         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1050                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051             return forward_call(*input, **kwargs)
   1052         # Do not call functions when jit is used
   1053         full_backward_hooks, non_full_backward_hooks = [], []

/home/sigura/.pyenv/versions/3.6.6/lib/python3.6/site-packages/detectron2/modeling/backbone/resnet.py in forward(self, x)
    443         assert x.dim() == 4, f"ResNet takes an input of shape (N, C, H, W). Got {x.shape} instead!"
    444         outputs = {}
--> 445         x = self.stem(x)
    446         if "stem" in self._out_features:
    447             outputs["stem"] = x

/home/sigura/.pyenv/versions/3.6.6/lib/python3.6/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1049         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1050                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051             return forward_call(*input, **kwargs)
   1052         # Do not call functions when jit is used
   1053         full_backward_hooks, non_full_backward_hooks = [], []

/home/sigura/.pyenv/versions/3.6.6/lib/python3.6/site-packages/detectron2/modeling/backbone/resnet.py in forward(self, x)
    354 
    355     def forward(self, x):
--> 356         x = self.conv1(x)
    357         x = F.relu_(x)
    358         x = F.max_pool2d(x, kernel_size=3, stride=2, padding=1)

/home/sigura/.pyenv/versions/3.6.6/lib/python3.6/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1049         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1050                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051             return forward_call(*input, **kwargs)
   1052         # Do not call functions when jit is used
   1053         full_backward_hooks, non_full_backward_hooks = [], []

/home/sigura/.pyenv/versions/3.6.6/lib/python3.6/site-packages/detectron2/layers/wrappers.py in forward(self, x)
    105 
    106         x = F.conv2d(
--> 107             x, self.weight, self.bias, self.stride, self.padding, self.dilation, self.groups
    108         )
    109         if self.norm is not None:

RuntimeError: miopenStatusUnknownError

atamazov commented 2 years ago

@Sigura Please expect feedback tomorrow, sorry for the delay.

atamazov commented 2 years ago

@Sigura Please also share the output of

/opt/rocm/bin/rocminfo | grep gfx
cat /sys/module/amdgpu/version
atamazov commented 2 years ago

I need to know the ROCm version that you use. I remember that some versions have problems with the kernel mode driver, and that those problems are gfx900-specific.

atamazov commented 2 years ago

What happens (most likely): the HIP runtime reports to MIOpen that the target has the SRAMECC feature and that its current setting is OFF. Therefore, MIOpen adds "sramecc-" to the target (device) name for compilation. However, gfx900 does not have the SRAMECC feature, and neither "sramecc-" nor "sramecc+" must be included. The compilation tool that we use (the COMGR library) detects that the target name is incorrect and issues an error.

Most likely, the problem can be solved by adding some hack to the library or by using a suitable version of ROCm.

Please export MIOPEN_LOG_LEVEL=5, collect the console output of your application, ZIP it and attach it here. This would provide me with additional information.
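
For example, a minimal way to capture and package the log (the script name here is only a placeholder; use whatever command launches your application):

# Sketch only: "train.py" stands in for your actual entry point.
MIOPEN_LOG_LEVEL=5 python3 train.py 2>&1 | tee miopen_log.txt
zip miopen_log.zip miopen_log.txt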

Also, I am wondering why COMGR is used instead of the offline compiler, which is the default.

Could you please help me check my understanding – do you mean the question is about how to change the build path of MIOpen for PyTorch?

No. MIOpen can use different ways to build kernels. The default way is the offline compiler, i.e. /opt/rocm/bin/clang++, which is capable of building HIP, OpenCL and assembly kernels. But in your case MIOpen was built to use the COMGR library (which can build HIP, OpenCL and assembly kernels as well). This feature is still experimental and not guaranteed to work in all cases.

I think you need to discuss this with the folks from https://github.com/ROCmSoftwarePlatform/pytorch
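
For reference, a rough sketch of how the two build paths are typically selected when configuring MIOpen itself (the -DMIOPEN_USE_COMGR=On option is the one discussed later in this thread; the exact CMake invocation may differ between MIOpen versions and setups):

# Sketch only, not an exact recipe; pick one of the two configurations.
cd MIOpen && mkdir -p build && cd build
# Default path: kernels are built by the offline compiler (/opt/rocm/bin/clang++).
CXX=/opt/rocm/llvm/bin/clang++ cmake -DMIOPEN_BACKEND=HIP ..
# Experimental path: kernels are built in-process via the COMGR library.
CXX=/opt/rocm/llvm/bin/clang++ cmake -DMIOPEN_BACKEND=HIP -DMIOPEN_USE_COMGR=On ..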

/cc @sunway513

Sigura commented 2 years ago

I need to know the ROCm version that you use. I remember that some versions have problems with the kernel mode driver, and that those problems are gfx900-specific.

ROCm 4.3.1 4.2 – sorry, I got it with PyTorch.

[screenshot attached: 2021-10-07 at 16:03:58]
Sigura commented 2 years ago

The default way is the offline compiler, i.e. /opt/rocm/bin/clang++, which is capable of building HIP, OpenCL and assembly kernels.

I will try to rebuild it in a few days.

atamazov commented 2 years ago

If you have the kernel mode driver from 4.3 or 4.3.1, then most likely you have problems with gfx900.

ffleader1 commented 2 years ago

Hey, I just want to report that I came across this exact problem with a Vega 56, so it does seem to affect quite a lot of people.

MIOpen(HIP): Warning [SQLiteBase] Unable to read system database file:gfx900_56.kdb Performance may degrade
MIOpen(HIP): Error [SetIsaName] 'amd_comgr_action_info_set_isa_name(handle, isa.c_str())' amdgcn-amd-amdhsa--gfx900:sramecc-:xnack-: INVALID_ARGUMENT (2)
MIOpen(HIP): Error [BuildOcl] comgr status = INVALID_ARGUMENT (2)
MIOpen(HIP): Warning [BuildOcl] amdgcn-amd-amdhsa--gfx900:sramecc-:xnack-
MIOpen Error: /MIOpen/src/hipoc/hipoc_program.cpp:286: Code object build failed. Source: MIOpenIm2d2Col.cl
  0%|          | 0/1000000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/ffleader1/PycharmProjects/AOT-GAN-for-Inpainting/src/train.py", line 51, in <module>
    main_worker(0, 1, args)
  File "/home/ffleader1/PycharmProjects/AOT-GAN-for-Inpainting/src/train.py", line 31, in main_worker
    trainer.train()
  File "/home/ffleader1/PycharmProjects/AOT-GAN-for-Inpainting/src/trainer/trainer.py", line 108, in train
    pred_img = self.netG(images_masked, masks)
  File "/home/ffleader1/PycharmProjects/AOT-GAN-for-Inpainting/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ffleader1/PycharmProjects/AOT-GAN-for-Inpainting/src/model/aotgan.py", line 37, in forward
    x = self.encoder(x)
  File "/home/ffleader1/PycharmProjects/AOT-GAN-for-Inpainting/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ffleader1/PycharmProjects/AOT-GAN-for-Inpainting/venv/lib/python3.8/site-packages/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
  File "/home/ffleader1/PycharmProjects/AOT-GAN-for-Inpainting/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ffleader1/PycharmProjects/AOT-GAN-for-Inpainting/venv/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 443, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/home/ffleader1/PycharmProjects/AOT-GAN-for-Inpainting/venv/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 439, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: miopenStatusUnknownError

Process finished with exit code 1

What would be a quick/painless option to solve this? Thank you.

atamazov commented 2 years ago

I will create a workaround patch for this, this week. Stay tuned.

atamazov commented 2 years ago

@ffleader1 Which version of ROCm are you using?

ffleader1 commented 2 years ago

V4.2. I tried 4.3, but apparently it just outright does not work with PyTorch.

ffleader1 commented 2 years ago

@ffleader1 Which version of ROCm are you using?

Well, just a heads up: it also does not work with Ubuntu 18. I was expecting that installing Ubuntu 18 would resolve the problem, but apparently not :<

atamazov commented 2 years ago

@ffleader1

V4.2. I tried 4.3 but...

If you've installed 4.3 once, then most likely you still have the kernel module from 4.3, which is known to have issues with gfx900. I will prepare a patch soon.

atamazov commented 2 years ago

Please try the wa-issue-1204 branch or apply the attached patch, re-build, re-install and let me know if this resolves your problem.

wa-issue-1204-28a2362e2.diff.txt
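
For anyone unsure how to do that, here is a rough sketch of one way to build and install the branch (assuming ROCm and MIOpen's usual build prerequisites are already installed; paths and options may need adjusting for your environment, and note that self-contained PyTorch wheels may bundle their own MIOpen):

# Sketch only: build the workaround branch and install it over the system MIOpen.
git clone -b wa-issue-1204 https://github.com/ROCmSoftwarePlatform/MIOpen.git
cd MIOpen && mkdir -p build && cd build
CXX=/opt/rocm/llvm/bin/clang++ cmake -DMIOPEN_BACKEND=HIP -DCMAKE_INSTALL_PREFIX=/opt/rocm ..
make -j"$(nproc)" && sudo make install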

atamazov commented 2 years ago

No. MIOpen can use different ways to build kernels. The default way is the offline compiler, i.e. /opt/rocm/bin/clang++, which is capable of building HIP, OpenCL and assembly kernels. But in your case MIOpen was built to use the COMGR library (which can build HIP, OpenCL and assembly kernels as well). This feature is still experimental and not guaranteed to work in all cases.

@ffleader1 @Sigura The question to both of you still remains: why was the MIOpen used by PyTorch built with the -DMIOPEN_USE_COMGR=On CMake option? Do you use PyTorch wheels?

ffleader1 commented 2 years ago

No. MIOpen can use different ways to build kernels. The default way is the offline compiler, i.e. /opt/rocm/bin/clang++, which is capable of building HIP, OpenCL and assembly kernels. But in your case MIOpen was built to use the COMGR library (which can build HIP, OpenCL and assembly kernels as well). This feature is still experimental and not guaranteed to work in all cases.

@ffleader1 @Sigura The question to both of you still remains: why was the MIOpen used by PyTorch built with the -DMIOPEN_USE_COMGR=On CMake option? Do you use PyTorch wheels?

It was the version that was published on the PyTorch homepage. Anyway, I really want to try out your patch, but I am normally a Windows kind of guy, so all this building of HIP/MIOpen/ROCm from source seems overwhelming to me. I presume the problem was due to PyTorch ROCm being built with a bad config? I guess I will wait for a better release of PyTorch ROCm then.

Sigura commented 2 years ago

PyTorch was built with the -DMIOPEN_USE_COMGR=On

I don't think it was. At the moment I'm focused on other tasks; I will try to return to this issue on the weekend.

And I used 4.2. To me, it looks like it does not support the Vega 10 chip.

I will try to build 4.3 with your patch.

Bengt commented 2 years ago

So, how is this urgency low? This feels like kind of a major problem to me that should be fixed with high priority – albeit not in MIOpen, but in ROCm.

junliume commented 2 years ago

The urgency level reflects that a workaround or a clear path to a solution is available (https://github.com/ROCmSoftwarePlatform/MIOpen/issues/1204#issuecomment-940485505). Internally, we are discussing how to fix this issue ASAP. Please stay tuned.

sunway513 commented 2 years ago

The PyTorch 4.3.1 nightly whl package is now available, @Sigura can you try this out as well?

pip3 install --pre torch torchvision -f https://download.pytorch.org/whl/nightly/rocm4.3.1/torch_nightly.html

Note the PyTorch whl package is fully "self-contained" for all the ROCm components required to execute the PyTorch library.

@atamazov is it possible to cherry-pick your changes to the 4.3.1 release branch, so the next PyTorch nightly whl can pick up the fix from here: https://github.com/pytorch/builder/blob/main/common/install_miopen.sh#L100

atamazov commented 2 years ago

@Bengt

How is this urgency low? This feels like kind of a major problem to me, that should be fixed with high priority.

We do value the opinions of our users (and expect the same from them). The level has been changed to high upon your request.

atamazov commented 2 years ago

@sunway513

is it possible to cherry-pick your changes to the 4.3.1 release branch

If you do not insist, I would rather avoid cherry-picking into release/rocm-rel-4.3, because this violates the release process (for example, the released 4.3.1 MIOpen would not match the tip of release/rocm-rel-4.3 after that, which is not good). But I can create some other branch on top of it (e.g. release/rocm-rel.4.3.1-for-pytorch). Would this work for you?

But first we need to find out if this patch resolves the issue. Unfortunately, we do not have any feedback yet from the users who ran into it.

sunway513 commented 2 years ago

@atamazov, we've got a reproducible environment locally and will try to validate your patch and move on from there. Cherry-picking hot fixes to release branches seems aligned with what we have been doing for both TensorFlow and PyTorch upstream.

s-marios commented 2 years ago

@sunway513 @atamazov looking forward to a release soon.

Unsolicited user feedback below (rant), feel free to ignore

Currently there are essentially no consumer cards that can run ROCm: good luck finding a Radeon VII, Vega 56/64 is currently broken, and RX5xxx/RX6xxx are unsupported. I really don't want to give money to the Leather Jacket Man, please.

Also, I'm sure you are aware that it's not a good look when a supposedly supported card does not work for such long stretches of time. I'm not sure what's going on at AMD's side, but I bet that testing could improve dramatically.

sunway513 commented 2 years ago

I was able to reproduce the reported error on the torchvision maskrcnn resnet50 inference workload. We are now re-generating the WHL packages with the MIOpen testing branch to validate further.

In the meantime, I can confirm that torchvision resnet50 FP16 model training executes correctly using the PyTorch 4.3.1 nightly whl on a Vega 64 GPU.

Here are my commands for reference:

# Launch the ROCm 4.3.1 base docker container
alias drun='sudo docker run -it --network=host --device=/dev/kfd --device=/dev/dri --group-add=video --ipc=host --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v $HOME/dockerx:/dockerx'
drun rocm/dev-ubuntu-18.04:4.3.1

# Pull and install the PyTorch and torchvision 4.3.1 nightly whl
apt update && apt install python3-dev python3-pip git -y
pip3 install --pre torch torchvision -f https://download.pytorch.org/whl/nightly/rocm4.3.1/torch_nightly.html

# Pull the vision benchmark scripts and train Resnet50 FP16
cd ~ && git clone https://github.com/ROCmSoftwarePlatform/pytorch-micro-benchmarking.git
cd ~/pytorch-micro-benchmarking/ && python3.6 micro_benchmarking_pytorch.py --network resnet50 --batch-size 128 --fp16 1
atamazov commented 2 years ago

@s-marios Please do not publish off-topic comments anymore; this wastes our time.

Sigura commented 2 years ago

The PyTorch 4.3.1 nightly whl package is now available, @Sigura can you try this out as well?

Sorry, I can try it only on the weekend.

sunway513 commented 2 years ago

The PyTorch 4.3.1 nightly whl package is now available, @Sigura can you try this out as well?

Sorry, I can try it only on the weekend.

Thanks; nvm, I have tried that and it didn't make a difference. We'll try to provide some experimental WHLs soon, and then propagate the fix upstream.

Bengt commented 2 years ago

Can I already find and use the experimental .whl packages somewhere? I would like to help by testing them on my affected code base.

ThisKwasior commented 2 years ago

Running Lubuntu 18.04.05 and a Vega 64 here. I installed ROCm 4.3.1 from the official AMD repo via apt and the nightly PyTorch package from the link in @sunway513's post. I confirm resnet50 FP16 model training executes correctly. However, when I tried to train a vocal-remover model, it complained about missing headers.

Here's a log with MIOPEN_LOG_LEVEL=5

tide@tide-X299-DESIGNARE-EX:/media/tide/Dane/datasety/UVR/vocal-remover-develop$ ./train.sh 
### DEBUG MODE
1 mix.flac instr.flac
100%|██████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  4.52it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00,  1.17s/it]
# epoch 0
MIOpen(HIP): Info [get_device_name] Raw device name: gfx900:xnack-
MIOpen(HIP): Info [Handle] stream: 0, device_id: 0
MIOpen(HIP): Info [get_device_name] Raw device name: gfx900:xnack-
MIOpen(HIP): Info [SetStream] stream: 0, device_id: 0
MIOpen(HIP): Info [GetFindModeValueImpl] MIOPEN_FIND_MODE = HYBRID(3)
MIOpen(HIP): Info [ForwardGetWorkSpaceSize] 
MIOpen(HIP): Info [HipCompilerVersionImpl] 4.3.21331
MIOpen(HIP): Info [AmdRocmMetadataVersionDetect] ROCm MD version AMDHSA_COv3, MIOpen version 2.12.0.4d0489c72
MIOpen(HIP): Info [GetForwardSolutions] 
MIOpen(HIP): Info [Measure] Db::Prefetch time: 68.898 ms
MIOpen(HIP): Info [FindConvFwdAlgorithm] requestAlgoCount = 1, workspace = 9437184
MIOpen(HIP): Info [GetForwardSolutions] 
MIOpen(HIP): Info [TryLoad] Find-db regenerating.
MIOpen(HIP): Info [SQLiteBase] SQLite does not support WAL
MIOpen(HIP): Info [FindSolutionImpl] ConvBinWinogradRxSf2x3g1 (not searchable)
MIOpen(HIP): Info [GetPerformanceConfig] 64
MIOpen(HIP): Info [FindSolutionImpl] GemmFwdRest (not searchable)
MIOpen(HIP): Info [FindSolutionImpl] ConvOclDirectFwd
MIOpen(HIP): Info [FindSolutionImpl] Perf Db: record not found for: ConvOclDirectFwd
MIOpen(HIP): Info [FindSolutionImpl] ConvDirectNaiveConvFwd (not searchable)
MIOpen(HIP): Info [CreateInMemDb] Unknown database: /opt/rocm/miopen/share/miopen/db/gfx900_64.kdb in internal file cache
MIOpen(HIP): Warning [SQLiteBase] Unable to read system database file:/opt/rocm/miopen/share/miopen/db/gfx900_64.kdb Performance may degrade
MIOpen(HIP): Info [KernDb] database not present
MIOpen(HIP): Info [SQLiteBase] SQLite does not support WAL
MIOpen(HIP): Info [PrintVersionImpl] COMgr v.2.1.0, USE_HIP_PCH: 1
MIOpen(HIP): Error [Do] 'amd_comgr_do_action(kind, handle, in.GetHandle(), out.GetHandle())' AMD_COMGR_ACTION_COMPILE_SOURCE_TO_BC: ERROR (1)
MIOpen(HIP): Error [BuildHip] comgr status = ERROR (1)
MIOpen(HIP): Warning [BuildHip] /tmp/comgr-5e8aa5/input/naive_conv.cpp:26:10: fatal error: 'hip/hip_fp16.h' file not found
#include <hip/hip_fp16.h>
         ^~~~~~~~~~~~~~~~
1 error generated when compiling for gfx900.

terminate called after throwing an instance of 'miopen::Exception'
  what():  /MIOpen/src/hipoc/hipoc_program.cpp:295: Code object build failed. Source: naive_conv.cpp
jeffdaily commented 2 years ago

MIOpen(HIP): Info [PrintVersionImpl] COMgr v.2.1.0, USE_HIP_PCH: 1
MIOpen(HIP): Error [Do] 'amd_comgr_do_action(kind, handle, in.GetHandle(), out.GetHandle())' AMD_COMGR_ACTION_COMPILE_SOURCE_TO_BC: ERROR (1)
MIOpen(HIP): Error [BuildHip] comgr status = ERROR (1)
MIOpen(HIP): Warning [BuildHip] /tmp/comgr-5e8aa5/input/naive_conv.cpp:26:10: fatal error: 'hip/hip_fp16.h' file not found
#include <hip/hip_fp16.h>
         ^~~~~~~~~~~~~~~~
1 error generated when compiling for gfx900.

The USE_HIP_PCH (pre-compiled header) feature is turned on, so if the naive_conv.cpp file is including that header, it is a bug. The hip_fp16.h header cannot be assumed to be installed and available. This is true for PyTorch wheels, which should be able to use hiprtc without having the HIP compilers and headers available. It looks like MIOpen needs a similar capability.

atamazov commented 2 years ago

@ThisKwasior Thanks for reporting. The problem described at https://github.com/ROCmSoftwarePlatform/MIOpen/issues/1204#issuecomment-946978785 is a different issue. This patch resolves it:

 src/kernels/gpu_reference_kernel/naive_conv.cpp | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/kernels/gpu_reference_kernel/naive_conv.cpp b/src/kernels/gpu_reference_kernel/naive_conv.cpp
index af829e4ca..d1a302387 100644
--- a/src/kernels/gpu_reference_kernel/naive_conv.cpp
+++ b/src/kernels/gpu_reference_kernel/naive_conv.cpp
@@ -23,8 +23,10 @@
  * SOFTWARE.
  *
  *******************************************************************************/
+#ifndef MIOPEN_DONT_USE_HIP_RUNTIME_HEADERS
 #include <hip/hip_fp16.h>
 #include <hip/hip_runtime.h>
+#endif

 // hcc seems need __device__ __host__ together to compile, and no extern "C"
 typedef union _cvt_bf16_fp32

Workaround

The following env settings disable usage of naive_conv.cpp:

export MIOPEN_DEBUG_CONV_DIRECT_NAIVE_CONV_FWD=0
export MIOPEN_DEBUG_CONV_DIRECT_NAIVE_CONV_BWD=0
export MIOPEN_DEBUG_CONV_DIRECT_NAIVE_CONV_WRW=0
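
For example, one way to apply them for a single run (the script name is just a placeholder for your actual entry point):

# Sketch only: set the variables just for this invocation of your training script.
MIOPEN_DEBUG_CONV_DIRECT_NAIVE_CONV_FWD=0 \
MIOPEN_DEBUG_CONV_DIRECT_NAIVE_CONV_BWD=0 \
MIOPEN_DEBUG_CONV_DIRECT_NAIVE_CONV_WRW=0 \
python3 train.py
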
Sigura commented 2 years ago

It looks like this may now be closed:

https://github.com/RadeonOpenCompute/ROCm/issues/1572#issuecomment-992589243