Closed oleksandr-pavlyk closed 1 month ago
Deleted rendered PR docs from intelpython.github.com/dpctl, latest should be updated shortly. :crossed_fingers:
Array API standard conformance tests for dpctl=0.17.0dev0=py310h15de555_355 ran successfully. Passed: 887 Failed: 18 Skipped: 91
53 tests affecting call of gemm
function are failed in dpnp
with that PR.
But I believe this might be due to some unproper internal logic in dpnp
.
The first issue I've found is incorrect result returned below:
import dpnp
from dpnp.dpnp_utils.dpnp_utils_linearalgebra import _define_contig_flag
a = dpnp.arange(3).reshape((1, 3, 1))
_define_contig_flag(a)
# Out: False
but it's expected to return True
, because a
is both C-contiguous or F-contiguous along last two dimensions.
It will result in unexpected extra memory allocation to copy array a
to temporary C-contiguous array.
The second one is somewhere in _gemm_batch
implementation:
import dpctl, dpctl.tensor as dpt
import dpnp.backend.extensions.blas._blas_impl as bi
a = dpt.reshape(dpt.arange(60, dtype='f4'), (5, 4, 3))
b = dpt.reshape(dpt.arange(3, dtype='f4'), (1, 3, 1))
b = dpt.copy(b)
c = dpt.zeros((5, 4, 1))
a.strides, b.strides, c.strides
# Out: ((12, 3, 1), (3, 1, 1), (4, 1, 1))
ev, _, _ = bi._gemm_batch(a.sycl_queue, a, b, c)
ev.wait()
c
# Out:
# usm_ndarray([[[ 5.],
# [14.],
# [23.],
# [32.]],
#
# [[ 0.],
# [ 0.],
# [ 0.],
# [ 0.]],
#
# [[ 0.],
# [ 0.],
# [ 0.],
# [ 0.]],
#
# [[ 0.],
# [ 0.],
# [ 0.],
# [ 0.]],
#
# [[ 0.],
# [ 0.],
# [ 0.],
# [ 0.]]], dtype=float32)
@vtavana , could you please look on that?
53 tests affecting call of
gemm
function are failed indpnp
with that PR. But I believe this might be due to some unproper internal logic indpnp
.
The necessary changes are implemented in dpnp-gh-1828. Relevant tests in dpnp
are now passed with both master branch of dpctl
and this branch.
The first issue I've found is incorrect result returned below:
import dpnp from dpnp.dpnp_utils.dpnp_utils_linearalgebra import _define_contig_flag a = dpnp.arange(3).reshape((1, 3, 1)) _define_contig_flag(a) # Out: False
but it's expected to return
True
, becausea
is both C-contiguous or F-contiguous along last two dimensions. It will result in unexpected extra memory allocation to copy arraya
to temporary C-contiguous array.
First case is working correctly in dpctl
In [1]: import dpctl.tensor as dpt, dpctl, numpy as np
In [2]: x = dpt.reshape(dpt.arange(10), (1, 10, 1))
In [3]: x.flags.contiguous
Out[3]: True
In [4]: x.flags
Out[4]:
C_CONTIGUOUS : True
F_CONTIGUOUS : True
WRITABLE : True
@vtavana does dpnp-gh-1828 address this case too?
First case is working correctly in dpctl
In [1]: import dpctl.tensor as dpt, dpctl, numpy as np In [2]: x = dpt.reshape(dpt.arange(10), (1, 10, 1)) In [3]: x.flags.contiguous Out[3]: True In [4]: x.flags Out[4]: C_CONTIGUOUS : True F_CONTIGUOUS : True WRITABLE : True
@vtavana does dpnp-gh-1828 address this case too?
Yes, both examples provided by @antonwolfy also work fine in dpnp-gh-1828.
As a side note, _define_contig_flag
from dpnp is used in batch calculation. And the goal is to check if each 2D array that forms the N-D array is f-contiguous or c-contiguous. So, we do not use built-in flag
there.
Closes gh-1664.
Improves performance of
dpt.reshape(X, new_shape, order="F")
when copy is needed.