artyom-beilis / pytorch_dlprim

DLPrimitives/OpenCL out of tree backend for pytorch
http://blog.dlprimitives.org/
MIT License

I fixed all of the bugs with contiguous() #67

Closed sukamenev closed 2 months ago
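
For background on what these fixes concern: PyTorch views created by transpose() or slicing share storage with the original tensor and are usually non-contiguous, so kernels that assume a dense row-major buffer must either honor arbitrary strides or call .contiguous() first. A minimal illustration in plain CPU PyTorch (not code from this PR):

import torch

x = torch.arange(12).reshape(3, 4)
y = x.t()                       # transposed view, shares storage with x
print(y.is_contiguous())        # False
z = y.contiguous()              # materializes a dense, row-major copy
print(z.is_contiguous())        # True
print(x.stride(), y.stride(), z.stride())   # (4, 1) (1, 4) (3, 1)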

artyom-beilis commented 6 months ago

Wow, amazing. I didn't have the guts to do so many fixes!

I'll go over it ASAP and merge!

Thanks!

sukamenev commented 6 months ago

Hello! I'm waiting for the code review.

artyom-beilis commented 6 months ago

I think this is the only fix. Once it is done I can merge it.

Also please run the tests.

sukamenev commented 6 months ago

Two of the tests don't work, with similar errors that are not related to my fixes.

python tests/test_op.py --device privateuseone:1
python tests/validate_network.py --device privateuseone:1

Mean 1d
Traceback (most recent call last):
  File "/home/inetstar/Kamenev/programming/ZenDnn/pytorch_dlprim/tests/test_op.py", line 282, in <module>
    test_all(r.device)
  File "/home/inetstar/Kamenev/programming/ZenDnn/pytorch_dlprim/tests/test_op.py", line 158, in test_all
    test_fwd_bwd([([2,3,4],-1)],lambda x:torch.mean(x,dim=0,keepdim=True),device)
  File "/home/inetstar/Kamenev/programming/ZenDnn/pytorch_dlprim/tests/test_op.py", line 114, in test_fwd_bwd
    x_dev = x_cpu.to(device)
NotImplementedError: Could not run 'aten::empty_strided' with arguments from the 'PrivateUse1' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::empty_strided' is only available for these backends: [CPU, Meta, QuantizedCPU, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradHIP, AutogradXLA, AutogradMPS, AutogradIPU, AutogradXPU, AutogradHPU, AutogradVE, AutogradLazy, AutogradMeta, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, AutogradNestedTensor, Tracer, AutocastCPU, AutocastCUDA, FuncTorchBatched, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PythonDispatcher].
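
For context, the failing step is a plain x_cpu.to(device): aten::_to_copy allocates the destination with aten::empty_strided, so when no kernel for that op is registered under the PrivateUse1 dispatch key (for example because the extension was not hooked up for the given device string), PyTorch raises exactly this NotImplementedError. A minimal sketch of the failing call, assuming the dlprimitives extension has been built and loaded:

import torch

# Assumption: the dlprimitives backend library has already been loaded;
# the loading mechanism and path are installation-specific and omitted here.
x_cpu = torch.randn(2, 3, 4)
x_dev = x_cpu.to("privateuseone:1")  # dispatches aten::_to_copy, which calls
                                     # aten::empty_strided under PrivateUse1 and
                                     # fails as above if that kernel is missing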

sukamenev commented 6 months ago

My own application trains a network with my fixes without problems. I'm using PyTorch 1.13.1.

artyom-beilis commented 6 months ago
python tests/test_op.py --device privateuseone:1
python tests/validate_network.py --device privateuseone:1

Ohh, my bad, fixed it. You can pass --device ocl:1 instead.
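
It sounds like the backend should be addressed via the ocl device alias rather than the raw privateuseone name here. A hedged illustration of what such an alias can look like on the test-script side; the helper below is hypothetical, not the repo's actual code:

import torch

def parse_device(arg):
    # Hypothetical helper: map a user-facing "ocl:N" string onto the
    # PrivateUse1 device type that the dlprimitives backend registers.
    if arg.startswith("ocl"):
        index = arg.split(":")[1] if ":" in arg else "0"
        return torch.device("privateuseone:" + index)
    return torch.device(arg)

print(parse_device("ocl:1"))   # device(type='privateuseone', index=1)

Newer PyTorch releases also provide torch.utils.rename_privateuse1_backend("ocl") to register such an alias natively; whether this repo relies on it is not shown in this thread.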

sukamenev commented 6 months ago

When tested, my code produces the same errors as yours (see Issues). I suspect the cause is that my OpenCL implementation (Rusticl on Linux: radeonsi, fiji, LLVM 17.0.6, DRM 3.54) may have some bugs.

Although it passes all your performance tests without errors.

bash benchark_all.sh

Output:

alexnet 52.907
resnet18 136.820
vgg16 Radeon
densenet161 653.560
inception_v3 260.392
mobilenet_v2 129.067
mobilenet_v3_small 49.416
mobilenet_v3_large 106.047
resnext50_32x4d 417.894
wide_resnet50_2 809.460
mnasnet1_0 114.950
efficientnet_b0 174.637
efficientnet_b4 407.656
regnet_y_400mf 94.731

sukamenev commented 6 months ago

Please try running the tests on my code on your computer. I think everything will be fine.

artyom-beilis commented 2 months ago

I merged your changes and fixed the issue in concat + added tests for concat.

Thanks for the contribution!
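
Not the actual test that was added, just a sketch of the kind of check a concat test with non-contiguous inputs involves (the device string and tolerance are assumptions):

import torch

def check_cat(device="ocl:1"):
    # Non-contiguous inputs: a transposed view and a strided slice.
    a = torch.randn(4, 6).t()        # shape (6, 4), non-contiguous
    b = torch.randn(6, 8)[:, ::2]    # shape (6, 4), non-contiguous
    ref = torch.cat([a, b], dim=1)   # CPU reference, shape (6, 8)

    out = torch.cat([a.to(device), b.to(device)], dim=1).cpu()
    assert torch.allclose(ref, out, atol=1e-6), "concat mismatch vs CPU"

if __name__ == "__main__":
    check_cat()
    print("concat test passed")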