IntelPython / dpnp

Data Parallel Extension for NumPy
BSD 2-Clause "Simplified" License

resolve gh-1871 #1872

Closed vtavana closed 1 month ago

vtavana commented 1 month ago

This PR resolves issue #1871.

It also improves dpnp.matmul performance for some special cases, such as the product of two Fortran-ordered (order="F") square matrices:

>>> import dpnp
>>> size = 4096
>>> device = "gpu"
>>> a = dpnp.ones((size, size), order="F", device=device)
>>> b = dpnp.ones((size, size), order="F", device=device)
>>> %timeit dpnp.matmul(a, b)

New implementation
Iris Xe: 142 ms ± 6.03 ms 
Intel Core: 1.81 s ± 383 ms 

Old dpnp
Iris Xe: 156 ms ± 3.38 ms 
Intel Core: 2.07 s ± 69.2 ms 
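
For anyone who wants to repeat the measurement outside IPython, here is a minimal sketch using the standard timeit module. The helper name time_matmul, the NumPy fallback, and the explicit queue wait are my assumptions and not part of this PR. It reports the best of several runs, whereas %timeit reports mean ± std. dev., so the numbers are not directly comparable with the ones above.

```python
import timeit

try:
    import dpnp as xp  # Data Parallel Extension for NumPy
    HAVE_DPNP = True
except ImportError:
    import numpy as xp  # assumption: fall back to NumPy so the sketch still runs
    HAVE_DPNP = False


def time_matmul(size, order="F", device=None, repeat=5):
    """Return the best wall-clock time (seconds) of one matmul call.

    Hypothetical helper for illustration only; not part of dpnp.
    """
    kwargs = {"order": order}
    if HAVE_DPNP and device is not None:
        kwargs["device"] = device  # e.g. "gpu" or "cpu", as in the snippet above
    a = xp.ones((size, size), **kwargs)
    b = xp.ones((size, size), **kwargs)

    def run():
        out = xp.matmul(a, b)
        # Assumption: if the result exposes a SYCL queue, wait on it so that
        # any asynchronous work is finished before the timer stops.
        queue = getattr(out, "sycl_queue", None)
        if queue is not None:
            queue.wait()
        return out

    run()  # warm-up call, excluded from the timing
    return min(timeit.repeat(run, number=1, repeat=repeat))


if __name__ == "__main__":
    # Compare C-ordered and F-ordered inputs on the default device.
    for order in ("C", "F"):
        best = time_matmul(1024, order=order)
        print(f"order={order}: {best * 1e3:.2f} ms")
```

Passing device="gpu" or device="cpu" selects the device, matching the Iris Xe and Intel Core rows above only if the machine exposes those devices.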

github-actions[bot] commented 1 month ago

View rendered docs @ https://intelpython.github.io/dpnp/index.html