Closed tangjinchuan closed 1 month ago
The test.py is fine. Therefore, the hook problem above is quite similar to https://github.com/artyom-beilis/pytorch_dlprim/issues/58
/Users/tjc/PycharmProjects/pythonProject8/.venv/bin/python /Users/tjc/Documents/pytorch_dlprim/test.py
Accessing device #0: Apple M1 on Apple
REF [[[ 0 137] [ 0 0] [255 255] [ 0 175]]
 [[ 0 0] [ 0 255] [ 0 247] [ 0 128]]
 [[ 19 0] [ 0 255] [ 88 0] [ 0 0]]]
DEV [[[ 0 137] [ 0 0] [255 255] [ 0 175]]
 [[ 0 0] [ 0 255] [ 0 247] [ 0 128]]
 [[ 19 0] [ 0 255] [ 88 0] [ 0 0]]]
0.0
Process finished with exit code 0
Can you please try with PyTorch 1.13? I know that there are some issues with newer versions; I think I checked 2.0.
I need to do some serious testing on multiple versions of PyTorch; it is just too hard to keep up with them :-)
Yes, 1.13 passes. The speed of an MLP is only about half of Apple's 'mps' backend. Should I report the device info to you, as I did last time with the Intel Arc A770 16G, so that you could tune it? It seems the Apple Silicon M2 Max tuning in gemm.cpp did not work properly, or is it the same situation? https://github.com/artyom-beilis/pytorch_dlprim/issues/10#issuecomment-1892229711
/Users/tjc/PycharmProjects/pythonProject9/.venv/bin/python /Users/tjc/PycharmProjects/pythonProject9/aaa.py
The True MI is 0.658629

A module that was compiled using NumPy 1.x cannot be run in NumPy 2.0.0 as it may crash. To support both 1.x and 2.x versions of NumPy, modules must be compiled with NumPy 2.0. Some modules may need to be rebuilt instead, e.g. with 'pybind11>=2.12'.
If you are a user of the module, the easiest solution will be to downgrade to 'numpy<2' or try to upgrade the affected module. We expect that some modules will need time to support NumPy 2.

Traceback (most recent call last):
  File "/Users/tjc/PycharmProjects/pythonProject9/aaa.py", line 94, in <module>
    model = Net().to(device)
  File "/Users/tjc/PycharmProjects/pythonProject9/aaa.py", line 83, in __init__
    self.fc1 = nn.Linear(1, H)  # fc: fully connected
  File "/Users/tjc/PycharmProjects/pythonProject9/.venv/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 96, in __init__
    self.weight = Parameter(torch.empty((out_features, in_features), **factory_kwargs))
/Users/tjc/PycharmProjects/pythonProject9/.venv/lib/python3.10/site-packages/torch/nn/modules/linear.py:96: UserWarning: Failed to initialize NumPy: _ARRAY_API not found (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/utils/tensor_numpy.cpp:77.)
  self.weight = Parameter(torch.empty((out_features, in_features), **factory_kwargs))
Accessing device #0: Apple M1 on Apple
/Users/tjc/PycharmProjects/pythonProject9/aaa.py:102: UserWarning: The operator 'aten::index.Tensor_out' is not currently supported on the ocl backend. Please open an issue for requesting support at https://github.com/artyom-beilis/pytorch_dlprim/issues (Triggered internally at /Users/tjc/Documents/pytorch_dlprim310/src/tensor_ops.cpp:313.)
  Y_SHUFFLE = Y[torch.randperm(Y.size(0))]
100%|██████████| 5000/5000 [00:39<00:00, 127.34it/s]
execution_time= 39.29358887672424
MINE= [-0.00492738 0.01242018 0.03254801 ... 0.68931293 0.71951771 0.72044754]
True MI= [0.65862906 0.65862906 0.65862906 ... 0.65862906 0.65862906 0.65862906]
execution_time= 39.29358887672424
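Regarding the aten::index.Tensor_out warning above: a hypothetical workaround (my sketch, not something from this thread, shown here on CPU only) is to perform the unsupported indexing on CPU and move the result back to the original device:

```python
import torch

# Hypothetical workaround sketch: when an operator such as
# aten::index.Tensor_out is not yet supported on the ocl backend, run the
# indexing on CPU and transfer the result back to the tensor's device.
def shuffled_rows(y: torch.Tensor) -> torch.Tensor:
    perm = torch.randperm(y.size(0))   # permutation generated on CPU
    return y.cpu()[perm].to(y.device)  # index on CPU, then move back

Y = torch.arange(12, dtype=torch.float32).reshape(6, 2)
Y_SHUFFLE = shuffled_rows(Y)  # same rows as Y, in random order
```

This trades the device round-trip for compatibility; once the operator is implemented in the backend, the plain Y[torch.randperm(Y.size(0))] form is preferable.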
For anyone who needs this working lib: libpt_ocl for Python 3.10, used with pip install torch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 (attached as .zip).
Started testing the new PyTorch versions once again, and there are some basic build gaps...
I'm checking with dev-discuss to see the issues; for example:
find_package(Torch REQUIRED)
fails under 2.3.1.
I saw your post: https://discuss.pytorch.org/t/find-package-torch-required-fails-2-3-1-and-nightly/205248 I guess it is due to a self-compiled version of PyTorch. Installing stable PyTorch via pip3 on Ubuntu 24.04, Windows 11, and the latest macOS (M1) worked for me and my student without problems:
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
More about the problem, as well as the PR, might be here: https://github.com/pytorch/pytorch/issues/118862
Actually, that is exactly how I install it; I use the official version. Since 1.13 the out-of-tree backend support has improved to the point that I don't need to touch any PyTorch code.
Let me double-check; I'm confused about why it is called pt_3.12_nightly in your file location.
For nightly, do you need to change it from:
-DCMAKE_PREFIX_PATH=$VIRTUAL_ENV/lib/python3.12/site-packages/torch/share/cmake/Torch
to:
-DCMAKE_PREFIX_PATH=$VIRTUAL_ENV/pt_3.12_nightly/lib/python3.12/site-packages/torch/share/cmake/Torch
OK, I have seen that you updated the CMake setup and solved it.
OK, it looks like these are the most critical issues. I went over the docs and a lot had changed.
I need to do lots of fixes to make it all work (backend registration, etc.)
Currently I'm stuck on PyTorch 2.3 & nightly with these issues: https://dev-discuss.pytorch.org/t/pytorch-out-of-tree-backend-updates-changes-question/2189/1
I hope I'll understand how to fix it with the developers' support.
Hi, sorry for not seeing the message recently. I have been busy, including preparing a group's official visa applications to visit Germany (Lower Saxony) for the whole of next month; I can buy you beers if you are nearby.
I did mention "#torch._register_device_module('ocl','opencl') # as required by PyTorch 2.0?", but I was not able to figure out the correct usage. It is always better to have the PyTorch community give the correct answer.
Following this: https://dev-discuss.pytorch.org/t/find-package-torch-required-fails-2-3-1-and-nightly/2176/7
The module registration I got working, but from 2.3 there is a new interface that out-of-tree backends need to implement; I am studying it right now. To be on the safe side, it is best to stay on 1.13, since 2.2 has other issues that were only fixed in 2.4 (like foreach operators not falling back to running one by one).
Once I implement the new interface and test it, I hope the PyTorch OpenCL backend will work with 2.4 onwards.
I asked in the discussion to make sure such changes are published so backend developers can prepare in advance.
Now 2.4 works. PyTorch below 2.4 will fail due to the lack of _foreach_ functions; their support was fixed in 2.4.
In 2.4 you need to call
torch.utils.rename_privateuse1_backend('ocl')
torch._register_device_module("ocl", object())
There are still more improvements needed, but 2.4 now works. All networks validated.
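As a minimal sketch of the two registration calls above (assuming a plain CPU-only torch install, without the actual OpenCL backend loaded): they bind the name 'ocl' to the reserved PrivateUse1 dispatch key and register a device module, after which the name becomes usable in device objects.

```python
import torch

# Rename the reserved PrivateUse1 dispatch key to "ocl" (this can only be
# done once per process) and register a placeholder device module. Without
# the pytorch_dlprim backend actually loaded, tensors cannot be moved to
# this device; this only demonstrates the naming/registration step.
torch.utils.rename_privateuse1_backend("ocl")
torch._register_device_module("ocl", object())

# The name is now recognized when constructing device objects.
dev = torch.device("ocl", 0)
print(dev)  # ocl:0 -- but .to(dev) would fail without the real backend
```

With the real backend loaded, the same two calls are what make tensor.to('ocl:0') dispatch into the OpenCL kernels.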
Closing
Thanks! I arrived in Frankfurt the day before yesterday; I will be visiting TU Clausthal for a month.
Best wishes, Jinchuan
OK, the documents are updated. Closing the issue.
Hi Artyom, I tried with Apple Silicon M1, Python 3.12, PyTorch 2.3.1 with the following setting code:
Could you please update the documentation for all platforms? I am happy to help. Meanwhile, it reports the following RuntimeError: "Please register PrivateUse1HooksInterface by RegisterPrivateUse1HooksInterface first." I guess this is a new requirement since PyTorch 2.0? Finally, my first test case on my own simple code hits the following bug; is there any plan to implement the following:
Best wishes, Jinchuan