Skillnoob opened this issue 4 weeks ago
After modifying the code to this:
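```python
from ultralytics import YOLO
import pytorch_ocl  # loads the OpenCL backend that provides the "ocl" device

def fuse(self, *args, **kwargs):
    # No-op replacement for the model's fuse(): layer fusion is skipped
    return self

model = YOLO('yolov8n.pt')
# Bind the no-op as a method of this particular model instance
model.model.fuse = fuse.__get__(model.model, type(model.model))
model.val(data='coco8.yaml', batch=1)
```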
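The `fuse.__get__(model.model, type(model.model))` line relies on Python's descriptor protocol: calling `__get__` on a plain function returns a method bound to that one object, so only this model's `fuse` is replaced. A minimal stdlib sketch with a hypothetical `Net` class (not an Ultralytics type) shows the effect and the equivalent `types.MethodType` spelling:

```python
import types

class Net:
    """Hypothetical stand-in for a model class that has a fuse() method."""
    def fuse(self):
        return "fused"

def no_op_fuse(self, *args, **kwargs):
    # Replacement that skips fusion and returns the module unchanged
    return self

a, b, c = Net(), Net(), Net()

# Functions are descriptors: __get__ binds no_op_fuse to the instance a
a.fuse = no_op_fuse.__get__(a, type(a))
# types.MethodType builds an equivalent bound method for b
b.fuse = types.MethodType(no_op_fuse, b)

print(a.fuse() is a)  # True: the no-op returns the instance itself
print(b.fuse() is b)  # True
print(c.fuse())       # fused: other instances keep the class method
```

Assigning the bound method as an instance attribute shadows the method defined on the class, which is why other instances are unaffected.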
I get the following error log:
```
Ultralytics YOLOv8.2.74 🚀 Python-3.11.9 torch-2.4.0+cpu CPU (AMD Ryzen 7 7800X3D 8-Core Processor)
Accessing device #0:gfx1100 on AMD Accelerated Parallel Processing
Dataset 'coco8.yaml' images not found ⚠️, missing path 'C:\Users\makei\Desktop\fdyxv\datasets\coco8\images\val'
Downloading https://ultralytics.com/assets/coco8.zip to 'C:\Users\makei\Desktop\fdyxv\datasets\coco8.zip'...
100%|██████████| 433k/433k [00:00<00:00, 4.57MB/s]
Unzipping C:\Users\makei\Desktop\fdyxv\datasets\coco8.zip to C:\Users\makei\Desktop\fdyxv\datasets\coco8...: 100%|██████████| 25/25 [00:00<00:00, 3124.48file/s]
val: Scanning C:\Users\makei\Desktop\fdyxv\datasets\coco8\labels\val... 4 images, 0 backgrounds, 0 corrupt: 100%|██████████| 4/4 [00:00<00:00, 444.34it/s]
Dataset download success ✅ (1.3s), saved to C:\Users\makei\Desktop\fdyxv\datasets
val: New cache created: C:\Users\makei\Desktop\fdyxv\datasets\coco8\labels\val.cache
C:\Users\makei\miniconda3\envs\ultralytics\Lib\site-packages\torch\nn\functional.py:796: UserWarning: The operator 'aten::max_pool2d_with_indices.out' is not currently supported on the ocl backend. Please open an issue at for requesting support https://github.com/artyom-beilis/pytorch_dlprim/issues (Triggered internally at C:\Users\artik\Projects\build_env\pytorch_dlprim\src\tensor_ops.cpp:336.)
  return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
C:\Users\makei\miniconda3\envs\ultralytics\Lib\site-packages\torch\nn\functional.py:4050: UserWarning: The operator 'aten::upsample_nearest2d.out' is not currently supported on the ocl backend. Please open an issue at for requesting support https://github.com/artyom-beilis/pytorch_dlprim/issues (Triggered internally at C:\Users\artik\Projects\build_env\pytorch_dlprim\src\tensor_ops.cpp:336.)
  return torch._C._nn.upsample_nearest2d(input, output_size, scale_factors)
C:\Users\makei\AppData\Roaming\Python\Python311\site-packages\ultralytics\utils\tal.py:303: UserWarning: The operator 'aten::arange.start_out' is not currently supported on the ocl backend. Please open an issue at for requesting support https://github.com/artyom-beilis/pytorch_dlprim/issues (Triggered internally at C:\Users\artik\Projects\build_env\pytorch_dlprim\src\tensor_ops.cpp:336.)
  sx = torch.arange(end=w, device=device, dtype=dtype) + grid_cell_offset # shift x
Traceback (most recent call last):
  File "C:\Users\makei\Desktop\opencl testing\main.py", line 17, in <module>
    main()
  File "C:\Users\makei\Desktop\opencl testing\main.py", line 13, in main
    model.val(data='coco8.yaml', batch=1)
  File "C:\Users\makei\AppData\Roaming\Python\Python311\site-packages\ultralytics\engine\model.py", line 644, in val
    validator(model=self.model)
  File "C:\Users\makei\miniconda3\envs\ultralytics\Lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\makei\AppData\Roaming\Python\Python311\site-packages\ultralytics\engine\validator.py", line 157, in __call__
    model.warmup(imgsz=(1 if pt else self.args.batch, 3, imgsz, imgsz)) # warmup
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\makei\AppData\Roaming\Python\Python311\site-packages\ultralytics\nn\autobackend.py", line 639, in warmup
    self.forward(im) # warmup
    ^^^^^^^^^^^^^^^^
  File "C:\Users\makei\AppData\Roaming\Python\Python311\site-packages\ultralytics\nn\autobackend.py", line 456, in forward
    y = self.model(im, augment=augment, visualize=visualize, embed=embed)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\makei\miniconda3\envs\ultralytics\Lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\makei\miniconda3\envs\ultralytics\Lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\makei\AppData\Roaming\Python\Python311\site-packages\ultralytics\nn\tasks.py", line 102, in forward
    return self.predict(x, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\makei\AppData\Roaming\Python\Python311\site-packages\ultralytics\nn\tasks.py", line 120, in predict
    return self._predict_once(x, profile, visualize, embed)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\makei\AppData\Roaming\Python\Python311\site-packages\ultralytics\nn\tasks.py", line 141, in _predict_once
    x = m(x) # run
        ^^^^
  File "C:\Users\makei\miniconda3\envs\ultralytics\Lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\makei\miniconda3\envs\ultralytics\Lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\makei\AppData\Roaming\Python\Python311\site-packages\ultralytics\nn\modules\head.py", line 60, in forward
    y = self._inference(x)
        ^^^^^^^^^^^^^^^^^^
  File "C:\Users\makei\AppData\Roaming\Python\Python311\site-packages\ultralytics\nn\modules\head.py", line 93, in _inference
    self.anchors, self.strides = (x.transpose(0, 1) for x in make_anchors(x, self.stride, 0.5))
                                                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\makei\AppData\Roaming\Python\Python311\site-packages\ultralytics\utils\tal.py", line 303, in make_anchors
    sx = torch.arange(end=w, device=device, dtype=dtype) + grid_cell_offset # shift x
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Buffer is not valid for unallocated defvice
```
OK, YOLO was one of the next things on my to-do list: validating that it works.
The operator 'aten::mm.out' is not currently supported on the ocl backend.
OK, this one is going to be easy to fix. I'm surprised that there is yet another GEMM operator.
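For context, a GEMM routine computes C = alpha * (A @ B) + beta * C, and mm.out is mathematically the special case alpha = 1, beta = 0 written into a caller-supplied output. A pure-Python sketch of that relationship on lists of lists (`gemm` and `mm_out` are hypothetical helpers, not pytorch_dlprim functions):

```python
def gemm(alpha, a, b, beta, c):
    """In-place C = alpha * (A @ B) + beta * C on lists of lists."""
    n, k, m = len(a), len(b), len(b[0])
    assert all(len(row) == k for row in a), "inner dimensions must match"
    for i in range(n):
        for j in range(m):
            acc = sum(a[i][p] * b[p][j] for p in range(k))
            # BLAS convention: with beta == 0, C is not read (it may hold garbage)
            c[i][j] = alpha * acc + (beta * c[i][j] if beta else 0)
    return c

def mm_out(a, b, out):
    """mm.out semantics: out = A @ B, i.e. gemm with alpha=1 and beta=0."""
    return gemm(1, a, b, 0, out)

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
out = [[float("nan")] * 2 for _ in range(2)]  # uninitialized output
print(mm_out(A, B, out))  # [[19, 22], [43, 50]]
```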
aten::max_pool2d_with_indices.out
This one is a little bit more complicated: my internal implementation does not compute indices, but I assume adding them shouldn't be complex.
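Besides the pooled values, max_pool2d_with_indices returns the position of each maximum as a flat index into the corresponding input plane (row * W + col). A pure-Python reference for a single H x W plane, assuming no padding and no dilation and ignoring NaN propagation (the function name is illustrative only), shows what a kernel would need to track:

```python
def max_pool2d_with_indices(plane, kernel, stride):
    """Max pooling over one 2-D plane (list of rows), no padding/dilation.

    Returns (values, indices); each index is the flat offset row * W + col
    of the selected maximum in the input plane.
    """
    h, w = len(plane), len(plane[0])
    out_h = (h - kernel) // stride + 1
    out_w = (w - kernel) // stride + 1
    values, indices = [], []
    for oh in range(out_h):
        v_row, i_row = [], []
        for ow in range(out_w):
            best_val, best_idx = None, None
            for kh in range(kernel):
                for kw in range(kernel):
                    r, c = oh * stride + kh, ow * stride + kw
                    # Strict '>' keeps the first maximum in scan order
                    if best_val is None or plane[r][c] > best_val:
                        best_val, best_idx = plane[r][c], r * w + c
            v_row.append(best_val)
            i_row.append(best_idx)
        values.append(v_row)
        indices.append(i_row)
    return values, indices

x = [[1, 5, 2, 0],
     [3, 4, 9, 1],
     [7, 0, 6, 8],
     [2, 1, 3, 4]]
vals, idx = max_pool2d_with_indices(x, kernel=2, stride=2)
print(vals)  # [[5, 9], [7, 8]]
print(idx)   # [[1, 6], [8, 11]]
```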
```
  File "C:\Users\makei\AppData\Roaming\Python\Python311\site-packages\ultralytics\utils\tal.py", line 303, in make_anchors
    sx = torch.arange(end=w, device=device, dtype=dtype) + grid_cell_offset # shift x
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Buffer is not valid for unallocated defvice
```
OK... this is something I need to check.
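The failing call goes through an out= variant (aten::arange.start_out in the warnings of the log). One plausible explanation, which is an assumption not confirmed in this thread, is that the output tensor arrives empty and unallocated, so the kernel must size it before writing. The required length is ceil((end - start) / step); a pure-Python sketch with a hypothetical `arange_out` helper, using a list in place of a tensor:

```python
import math

def arange_out(start, end, step, out):
    """Fill `out` (a list standing in for a tensor) like arange.start_out.

    The output is resized first, mirroring what an out= kernel has to do
    when it is handed an empty, not-yet-allocated tensor.
    """
    if step == 0:
        raise ValueError("step must be nonzero")
    length = max(0, math.ceil((end - start) / step))
    # Resize the output buffer to the computed length before writing
    out[:] = [0] * length
    for i in range(length):
        out[i] = start + i * step
    return out

buf = []  # "unallocated" output
print(arange_out(0, 5, 1, buf))     # [0, 1, 2, 3, 4]
print(arange_out(0, 1, 0.25, buf))  # [0.0, 0.25, 0.5, 0.75]
```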
Just an update: validating YOLO is important but hard.
Now I discovered that you can torch.cat tensors of different types, and the result type is not really documented... I didn't expect to concatenate long and float tensors.
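In current PyTorch releases, torch.cat appears to follow the same type-promotion rules as other operators (compare torch.result_type): when the inputs belong to different categories (bool < integer < floating point), the higher category wins, so concatenating an int64 and a float32 tensor yields float32. A simplified pure-Python model of that rule, limited to a few dtypes and deliberately ignoring zero-dim tensors, uint8/int8, bfloat16 and complex types (the table and function are illustrative assumptions):

```python
# (category rank, bit width) for a handful of dtype names
DTYPES = {
    "bool": (0, 8),
    "int32": (1, 32),
    "int64": (1, 64),
    "float16": (2, 16),
    "float32": (2, 32),
    "float64": (2, 64),
}

def promoted_dtype(*dtypes):
    """Result dtype when concatenating tensors of the given dtypes."""
    top = max(DTYPES[d][0] for d in dtypes)
    # Only dtypes in the highest category compete; the widest one wins
    candidates = [d for d in dtypes if DTYPES[d][0] == top]
    return max(candidates, key=lambda d: DTYPES[d][1])

print(promoted_dtype("int64", "float32"))  # float32
print(promoted_dtype("int64", "float16"))  # float16
print(promoted_dtype("int32", "int64"))    # int64
```

For a backend this presumably means that cat.out can receive inputs whose dtype differs from the output, so each input has to be cast while it is copied.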
Many other things that caused failures have been fixed, but there is still a lot to do...
OK... it is an interesting challenge.
The fuse workaround is not needed any more. In order to make it run, I needed to fix small things in their code (they ignore half=False, among other things), but in general I managed to complete the run (see the fixes in the diff below).
BUT: lots of operators are falling back to the CPU (see the list below). Some are easy and can be implemented with simple broadcast/reduce/pointwise operators, some are a little bit trickier, and for some I don't even know what they do.
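Several of the fallback operators, such as the comparisons gt, lt and le and the elementwise maximum and minimum, are pointwise operations over broadcast shapes. A pure-Python sketch of that shared pattern on flat row-major buffers (`broadcast_shape` and `broadcast_apply` are hypothetical helpers, and NaN propagation is ignored) shows the index arithmetic a generic kernel needs:

```python
from itertools import zip_longest
import operator

def broadcast_shape(a, b):
    """Result shape of broadcasting shapes a and b (PyTorch/NumPy rules)."""
    result = []
    # Compare trailing dimensions first; missing leading dims act as size 1
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x != y and 1 not in (x, y):
            raise ValueError(f"shapes {a} and {b} are not broadcastable")
        result.append(y if x == 1 else x)
    return tuple(reversed(result))

def broadcast_apply(op, a, a_shape, b, b_shape):
    """Apply a binary op elementwise to two flat row-major buffers."""
    out_shape = broadcast_shape(a_shape, b_shape)
    nd = len(out_shape)

    def strides(shape):
        # Left-pad with 1s; size-1 (broadcast) dims get stride 0
        shape = (1,) * (nd - len(shape)) + tuple(shape)
        st, acc = [], 1
        for dim in reversed(shape):
            st.append(0 if dim == 1 else acc)
            acc *= dim
        return st[::-1]

    sa, sb = strides(a_shape), strides(b_shape)
    total = 1
    for d in out_shape:
        total *= d
    out = []
    for flat in range(total):
        ia = ib = 0
        rem = flat
        # Decompose the flat output index, last dimension fastest
        for dim in range(nd - 1, -1, -1):
            rem, coord = divmod(rem, out_shape[dim])
            ia += coord * sa[dim]
            ib += coord * sb[dim]
        out.append(op(a[ia], b[ib]))
    return out_shape, out

a, a_shape = [1, 5, 3, 4, 2, 6], (2, 3)
print(broadcast_apply(operator.gt, a, a_shape, [3], (1,)))
# ((2, 3), [False, True, False, True, False, True])
print(broadcast_apply(max, a, a_shape, [2, 4, 5], (3,)))
# ((2, 3), [2, 5, 5, 4, 4, 6])
```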
Here is the list. Who wants to give a hand implementing them?
'aten::addmv.out'
'aten::all.all_out'
'aten::amax.out'
'aten::amin.out'
'aten::atan.out'
'aten::bitwise_not.out'
'aten::bitwise_or.Tensor_out'
'aten::gt.Tensor_out'
'aten::_index_put_impl_'
'aten::index.Tensor_out'
'aten::le.Tensor_out'
'aten::linalg_vector_norm.out'
'aten::log_sigmoid_forward'
'aten::lt.Tensor_out'
'aten::masked_fill_.Scalar'
'aten::max.dim_max'
'aten::maximum.out'
'aten::max_pool2d_with_indices.out'
'aten::minimum.out'
'aten::mm.out'
'aten::nonzero'
'aten::pow.Tensor_Scalar_out'
'aten::prod.int_out'
'aten::scatter_add.out'
'aten::scatter.value_out'
'aten::sort.values_stable'
'aten::topk.values'
'aten::unfold'
'aten::_unique2'
'aten::upsample_nearest2d_backward.grad_input'
'aten::upsample_nearest2d.out'
'aten::where.self'
'torchvision::nms'
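The thread does not say how this list was compiled. One way to gather such a list (an assumption, not necessarily the method used here) is to record the UserWarnings raised during a run and extract the operator names from messages of the form shown in the log at the top of this issue. A stdlib-only sketch with hypothetical helper names:

```python
import re
import warnings

# Matches the fallback warnings seen in the log, e.g.
# "The operator 'aten::mm.out' is not currently supported on the ocl backend."
UNSUPPORTED = re.compile(r"The operator '([^']+)' is not currently supported")

def unsupported_ops(messages):
    """Sorted, de-duplicated operator names found in warning messages."""
    names = set()
    for msg in messages:
        match = UNSUPPORTED.search(str(msg))
        if match:
            names.add(match.group(1))
    return sorted(names)

def collect_fallbacks(fn, *args, **kwargs):
    """Call fn and return the operators whose fallback warnings it triggered."""
    with warnings.catch_warnings(record=True) as caught:
        # "always" keeps repeats that would otherwise be suppressed
        warnings.simplefilter("always")
        fn(*args, **kwargs)
    return unsupported_ops(w.message for w in caught)

def fake_run():
    # Stand-in for a real validation run on the ocl device
    for op in ("aten::mm.out", "aten::amax.out", "aten::mm.out"):
        warnings.warn(f"The operator '{op}' is not currently supported on the ocl backend.")

print(collect_fallbacks(fake_run))  # ['aten::amax.out', 'aten::mm.out']
```

With a real model this might look like `collect_fallbacks(model.val, data='coco8.yaml', batch=1)`, assuming the backend keeps reporting fallbacks as Python warnings, as the log suggests.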
These are the changes in the Ultralytics code:
```diff
--- venv/pt_rocm/lib/python3.10/site-packages/ultralytics/utils/torch_utils.py 2024-08-18 23:57:25.125804942 +0300
+++ venv/pt_cpu_2.4/lib/python3.10/site-packages/ultralytics/utils/torch_utils.py 2024-08-19 22:16:54.377978652 +0300
@@ -156,7 +156,8 @@
         device = device.replace(remove, "") # to string, 'cuda:0' -> '0' and '(0, 1)' -> '0,1'
     cpu = device == "cpu"
     mps = device in {"mps", "mps:0"} # Apple Metal Performance Shaders (MPS)
-    if cpu or mps:
+    ocl = device.find('ocl')==0
+    if cpu or mps or ocl:
         os.environ["CUDA_VISIBLE_DEVICES"] = "-1" # force torch.cuda.is_available() = False
     elif device: # non-cpu device requested
         if device == "cuda":
--- venv/pt_rocm/lib/python3.10/site-packages/ultralytics/engine/validator.py 2024-08-18 23:57:25.118804644 +0300
+++ venv/pt_cpu_2.4/lib/python3.10/site-packages/ultralytics/engine/validator.py 2024-08-19 22:21:07.850314250 +0300
@@ -112,7 +112,7 @@
         if self.training:
             self.device = trainer.device
             self.data = trainer.data
-            self.args.half = self.device.type != "cpu" # force FP16 val during training
+            self.args.half = self.args.half and self.device.type != "cpu" # force FP16 val during training
             model = trainer.ema.ema or trainer.model
             model = model.half() if self.args.half else model.float()
             # self.model = model
```
Not so fast my friend, more to go:
lerp.Scalar_out native_dropout gather.out index_select upsample_bilinear2d.out
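Several of these have simple reference semantics. Mathematically, lerp computes start + weight * (end - start); gather along dim 1 reads out[i][j] = src[i][index[i][j]]; index_select along dim 0 copies whole rows in the given order. Pure-Python sketches for the 2-D case (function names are illustrative, NaN and dtype details are ignored), meant to describe the expected results rather than how a backend kernel should be written:

```python
def lerp(start, end, weight):
    """lerp with a scalar weight: start + weight * (end - start), elementwise."""
    return [s + weight * (e - s) for s, e in zip(start, end)]

def gather_dim1(src, index):
    """gather(src, 1, index) on 2-D lists: out[i][j] = src[i][index[i][j]]."""
    return [[src[i][k] for k in row] for i, row in enumerate(index)]

def index_select_dim0(src, index):
    """index_select(src, 0, index) on 2-D lists: copy the selected rows."""
    return [list(src[k]) for k in index]

print(lerp([0.0, 10.0], [10.0, 20.0], 0.5))                 # [5.0, 15.0]
print(gather_dim1([[1, 2], [3, 4]], [[0, 0], [1, 0]]))      # [[1, 1], [4, 3]]
print(index_select_dim0([[1, 2], [3, 4], [5, 6]], [2, 0]))  # [[5, 6], [1, 2]]
```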
Indeed, lots of operators... By the way, mm.out is on its way.
Updates: the following are done: mm, bmm, amax, amin, native_dropout, arange, resize_. There are also fixes in some other operators: softmax/logsoftmax can now work on multiple dimensions (performance issue with gelu...
More to go...
I've created a fork of Ultralytics here, which adds proper pytorch_ocl support to Ultralytics. The PR can be found here; it is still a draft, since support on the pytorch_ocl side is not finished yet and the current release is incompatible. Here is how validation would be run now:
```python
from ultralytics import YOLO

model = YOLO('yolov8n.pt')
model.val(data='coco8.yaml', batch=1, device="ocl")
```
The device can be either ocl, which defaults to ocl:0, or the usual ocl:<device number>.
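The naming convention just described (bare ocl meaning ocl:0) can be expressed as a small parser. `parse_ocl_device` is a hypothetical helper for illustration, not a function from the fork or from pytorch_ocl:

```python
def parse_ocl_device(device):
    """Normalize 'ocl' / 'ocl:N' to a (type, index) pair; bare 'ocl' -> index 0."""
    device = str(device).strip().lower()
    kind, sep, index = device.partition(":")
    if kind != "ocl":
        raise ValueError(f"not an ocl device: {device!r}")
    if not sep:
        return "ocl", 0  # bare "ocl" defaults to the first device
    if not index.isdigit():
        raise ValueError(f"invalid ocl device index: {device!r}")
    return "ocl", int(index)

print(parse_ocl_device("ocl"))    # ('ocl', 0)
print(parse_ocl_device("ocl:1"))  # ('ocl', 1)
```

With pytorch_ocl imported, the resulting pair could then presumably be turned into a device object via `torch.device(f"{kind}:{index}")`.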
I tried to run Ultralytics using the most recent release.
Minimal example to reproduce:
This line needs to be modified to return torch.device('ocl:0'); otherwise Ultralytics will complain about being passed a wrong device, or will only run on the CPU.
My GPU: Radeon RX 7900 GRE
Full log: