data-apis / array-api

RFC document, tooling and other content related to the array API standard
https://data-apis.github.io/array-api/latest/
MIT License
212 stars 44 forks source link

Add `__complex__` to array object #486

Closed honno closed 1 year ago

honno commented 1 year ago

We currently have respective dunder methods for bool/ints/floats, so if just for consistency x.__complex__() seems nice. Personally I've found the other methods hella useful for array-api-tests and Hypothesis, so imagine this will prove useful too.

I think we'd want to specify this just for complex-valued floating-point arrays, and adopters can choose to support real-valud float arrays if-and-however they so wish.

oleksandr-pavlyk commented 1 year ago

It should be specified for all types where casting to complexes is defined:

In [4]: complex(np.array(2, dtype='int32'))
Out[4]: (2+0j)

In [5]: complex(np.array(2, dtype='single'))
Out[5]: (2+0j)
honno commented 1 year ago

It should be specified for all types where casting to complexes is defined

Yeah that makes sense, my only concern is it might end up looking slightly inconsistent. Currently:

  1. the proposed complex-y function real() in #427 only takes complex numbers
  2. other dunder type array methods are only specified to work on their respective data-types
leofang commented 1 year ago

I think the story for __complex__ is different. In other places, we strive to avoid complex-to-real conversions, but here with __complex__ the direction is different (whatever-to-complex), so I am less concerned.

asmeurer commented 1 year ago

Right now float() and int() are only defined for floating-point and integer dtype arrays (they say "must" but it probably should be "should"). Maybe complex() is a little different though since at least for float the conversion would be lossless.

jakirkham commented 1 year ago

It is worth noting that all of these methods coerce an Array to a Python object, which could mean there is an expensive conversion step (for example synchronization, blocking computation, etc.) involved. If we are including these methods, maybe we should caution users that they may be slow.

leofang commented 1 year ago

As @honno noted we do already have __int__ etc, so yes we need to make it clear that it could be expansive in some libraries, but we should add it just for completeness.

rgommers commented 1 year ago

+1 for adding __complex__ - it's simply for design consistency.

In other places, we strive to avoid complex-to-real conversions, but here with __complex__ the direction is different (whatever-to-complex), so I am less concerned.

This we don't have to decide here. The spec should simply say that it must work for a 0-D complex array, and follow the type promotion rules for other dtypes. Whether real is going to be required to work is discussed in gh-478, so I suggest not to have that discussion here.

seberg commented 1 year ago

IMO it must work for all 0-D inputs covered by the standard (numerical and boolean). Two reasons:

  1. That is how Python does it.
  2. This would be related to casting and thus: real_arr.astype(complex_dtype). That I think must work? (This is not promotion: there are no mixed dtypes involved.)

I could see a point against boolean, but it seems no worse than for int/float, so unless that is exclused there, it should not be here I think.

seberg commented 1 year ago

OK, I guess the standard bails on defining float(int_arr) currently, so I have no opinion on what is done here (but I do have the opinion that it shouldn't bail on that conversion).

rgommers commented 1 year ago

Indeed, it should be consistent with float(int_arr) or have some design rule to point to. Whether float(int_arr) is working as desired I'm not quite sure.

  1. This would be related to casting and thus: real_arr.astype(complex_dtype).

That's another reasonable choice I'd say. I think I agree that this is an improvement over what the spec currently says. It does mean that x = array(1.5); int(x) should work as well (and return 1). Probably okay indeed, since the user explicitly asks for an int. I checked TensorFlow, and even there this works.

jakirkham commented 1 year ago

Why does it need to be limited to 0-D? We could allow this for any array of size=1 independent of rank. In fact some libraries already do this now.

asmeurer commented 1 year ago

NumPy is planning to deprecate scalar conversions for higher rank arrays. See https://github.com/numpy/numpy/issues/10404 for a discussion on why it's problematic.

honno commented 1 year ago

So I wrote #497 to add what I think is obvious and uncontroversial:

  1. __complex__() for complex arrays.
  2. unrestrict casting with a different kind for all respective casting methods, e.g. using __complex__() for an integer array. This means an adopter may implement different-kind casting, whereas before the language restricted casting methods to only arrays of the same kind.

I don't require support for cross-kind casting for any of these casting methods in the PR, although ready to follow-up. It's just a bit annoying to document as there are numerous casting scenarios that need noting (i.e. note allowing bool casting non-bools and require raises for casting unpresentable values). I think it's best to just get the easy stuff in first, so it's simpler to deliberate on awkward things afterwards.

honno commented 1 year ago

On the 20th Oct meeting, no one opposed to going beyond unrestricted casting and explicitly support (most) cross-kind casting, emulating how Python's own builtins' cast to different kinds just like NumPy already does. So I'll try summarizing the full extent of Python's cross-kind casting behaviour (I'll write a dirty script to verify edge cases), and update #497 accordingly.

Interestingly Athan made the point that the philosophy at least early on is to avoid undefined behaviour and then expand as-and-when necessary, relevant here by usage of "Must" as opposed to "Should" for same-kind inputs.