dpt.pow() with dtype=`c16` and scalar on gpu/cpu returns different result

vlad-perevezentsev commented 1 year ago

The example below behaves differently depending on the device:

import dpctl.tensor as dpt

a = dpt.asarray([0], dtype='c16', device='gpu')
dpt.pow(a,1)
# usm_ndarray([0.+0.j])

a = dpt.asarray([0], dtype='c16', device='cpu')
dpt.pow(a,1)
# usm_ndarray([nan+nanj])

With dtype='c8', the result is the same on both devices:

import dpctl.tensor as dpt

a = dpt.asarray([0], dtype='c8', device='gpu')
dpt.pow(a, 1)
# usm_ndarray([0.+0.j], dtype=complex64)

a = dpt.asarray([0], dtype='c8', device='cpu')
dpt.pow(a, 1)
# usm_ndarray([0.+0.j], dtype=complex64)

I also noticed that for dtype c16, dpt.pow works correctly on cpu when the input array has between 2 and 7 elements, but at size 8 the last four elements are nan:

import dpctl.tensor as dpt

a = dpt.zeros((2,), dtype='c16', device='cpu')
dpt.pow(a,1)
# usm_ndarray([0.+0.j, 0.+0.j])

a = dpt.zeros((7,), dtype='c16', device='cpu')
dpt.pow(a,1)
# usm_ndarray([0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j])

a = dpt.zeros((8,), dtype='c16', device='cpu')
dpt.pow(a,1)
# usm_ndarray([ 0. +0.j,  0. +0.j,  0. +0.j,  0. +0.j, nan+nanj, nan+nanj,
#              nan+nanj, nan+nanj])
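
To find every affected size without retyping the snippet, a small sweep helper could be used. This is only a sketch: `has_nan` and `failing_sizes` are hypothetical helper names (not part of dpctl), and the dpctl part assumes `dpt.asnumpy` is available to copy the result to the host.

```python
import cmath


def has_nan(values):
    """Return True if any value has a NaN real or imaginary part."""
    return any(cmath.isnan(complex(v)) for v in values)


def failing_sizes(pow_fn, make_zeros, max_size=16):
    """Return every size in 1..max_size for which pow_fn(zeros, 1) contains NaN.

    pow_fn(array, exponent) must return an iterable of Python complex numbers;
    make_zeros(size) must return the input array for that size.
    """
    bad = []
    for size in range(1, max_size + 1):
        if has_nan(pow_fn(make_zeros(size), 1)):
            bad.append(size)
    return bad


if __name__ == "__main__":
    try:
        import dpctl.tensor as dpt
    except ImportError:
        dpt = None  # dpctl not installed; the helpers above still work with stand-ins

    if dpt is not None:
        sizes = failing_sizes(
            # Assumption: copy the result to the host and convert to Python complex
            lambda a, p: dpt.asnumpy(dpt.pow(a, p)).tolist(),
            lambda n: dpt.zeros((n,), dtype="c16", device="cpu"),
        )
        print("sizes with NaN in the result:", sizes)
```

An empty list means no NaNs showed up for any size in the swept range.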

Besides this, there is an interesting case when x2 (the scalar) has a NumPy dtype: dpt.pow then returns nans for an input array with data type c8 too:

import dpctl.tensor as dpt
import numpy

a = dpt.zeros((8,), dtype='c8', device='cpu')
dpt.pow(a, 1)
# usm_ndarray([0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j,
#              0.+0.j], dtype=complex64)

dpt.pow(a, numpy.int32(1))
# usm_ndarray([ 0. +0.j,  0. +0.j,  0. +0.j,  0. +0.j, nan+nanj, nan+nanj,
#              nan+nanj, nan+nanj])
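
To compare exponent types side by side, something like the following could be run. It is a rough sketch: `nan_report` is a hypothetical helper (not a dpctl function), the dpctl part assumes `dpt.asnumpy` is available for copying the result to the host, and only the two exponent types from this report are included.

```python
import cmath


def nan_report(pow_fn, base, exponents):
    """Map each exponent's type name to True if pow_fn(base, exponent) has a NaN.

    pow_fn(base, exponent) must return an iterable of Python complex numbers.
    """
    report = {}
    for exponent in exponents:
        result = pow_fn(base, exponent)
        report[type(exponent).__name__] = any(
            cmath.isnan(complex(v)) for v in result
        )
    return report


if __name__ == "__main__":
    try:
        import numpy
        import dpctl.tensor as dpt
    except ImportError:
        dpt = None  # dpctl or numpy not installed; nothing to check here

    if dpt is not None:
        a = dpt.zeros((8,), dtype="c8", device="cpu")
        # The two exponent types from this report; other scalar types can be appended
        exponents = [1, numpy.int32(1)]
        print(nan_report(
            lambda x, p: dpt.asnumpy(dpt.pow(x, p)).tolist(), a, exponents
        ))
```

A `True` entry marks an exponent type for which the result contained at least one NaN.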

ndgrigorian commented 11 months ago

@vlad-perevezentsev These discrepancies seem to have been resolved recently.

In [1]: import dpctl.tensor as dpt, numpy as np

In [2]: a = dpt.asarray([0], dtype='c16', device='cpu')

In [3]: dpt.pow(a,1)
Out[3]: usm_ndarray([0.+0.j])

In [4]: a = dpt.zeros((2,), dtype='c16', device='cpu')

In [5]: dpt.pow(a,1)
Out[5]: usm_ndarray([0.+0.j, 0.+0.j])

In [6]: a = dpt.zeros((8,), dtype='c16', device='cpu')

In [7]: dpt.pow(a,1)
Out[7]:
usm_ndarray([0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j,
             0.+0.j])

For the NumPy dtype case:

In [1]: import dpctl.tensor as dpt, numpy as np

In [2]: a = dpt.zeros((8,), dtype='c8', device='cpu')

In [3]: dpt.pow(a, np.int32(1))
Out[3]:
usm_ndarray([0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j,
             0.+0.j])

It's hard to tell whether this is a result of #1411 or of the change in compiler version.

Either way, if you can confirm that these issues are resolved for you as well, we can consider this issue resolved.

oleksandr-pavlyk commented 9 months ago

@vlad-perevezentsev I think this issue is ready to be resolved.

IntelPython / dpctl

dpt.pow() with dtype=`c16` and scalar on gpu/cpu returns different result #1378