data-apis / array-api

RFC document, tooling and other content related to the array API standard
https://data-apis.github.io/array-api/latest/
MIT License

RFC: add `count_nonzero` for counting the number of "non-zero" values #794

Open kgryte opened 2 months ago

kgryte commented 2 months ago

This RFC proposes a new addition to the array API specification for counting the number of "non-zero" (i.e., truthy) values in an array.

Overview

Based on array comparison data, count_nonzero is available across all major array libraries in the PyData ecosystem.

count_nonzero was originally identified in https://github.com/data-apis/array-api/issues/187 as a potential standardization candidate and has usage within downstream libraries (e.g., sklearn, SciPy).

Prior art

Proposal

def count_nonzero(x: array, /, *, axis: Optional[Union[int, Tuple[int, ...]]] = None, keepdims: bool = False) -> array
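To illustrate the intended semantics of the signature above, here is a minimal sketch that uses NumPy's existing `np.count_nonzero` as a stand-in. Choosing NumPy and the sample data is an assumption for illustration only; the proposal does not name a reference implementation.

```python
import numpy as np

# Sample data chosen for illustration only.
x = np.asarray([[0, 1, 2],
                [0, 0, 3]])

# axis=None counts over the whole array.
# Note: NumPy returns a Python int here, whereas the proposed
# signature returns an array.
total = np.count_nonzero(x)

# Reduce along axis 0: one count per column.
per_column = np.count_nonzero(x, axis=0)

# Reduce along axis 1, keeping the reduced axis as size 1.
per_row_kept = np.count_nonzero(x, axis=1, keepdims=True)

print(total)               # 3
print(per_column)          # [0 1 2]
print(per_row_kept.shape)  # (2, 1)
```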

Questions

rgommers commented 2 months ago

@asmeurer can you tell us if it's easy to work around a missing keepdims keyword in array-api-compat?

rgommers commented 2 months ago

One other nice thing is that unlike nonzero, this function does not have a data-dependent output shape. So aside from performance, it can be supported by implementations that may not support nonzero.

rgommers commented 2 months ago

The keepdims argument was added fairly late (2020) in numpy: https://github.com/numpy/numpy/pull/15870. So it may have simply been overlooked by other libraries. Probably just a low-prio feature (also no usages in scipy at all).

asmeurer commented 2 months ago

I think so. Isn't it just a matter of calling expand_dims? Maybe https://github.com/data-apis/array-api/issues/760 would help.
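A rough sketch of that workaround, written against NumPy rather than array-api-compat itself. The helper name `count_nonzero_keepdims` is hypothetical, and the sketch relies on `np.expand_dims` accepting a tuple of axes, which may not hold for every library being wrapped.

```python
import numpy as np

def count_nonzero_keepdims(x, axis=None):
    """Hypothetical helper: emulate keepdims=True for a count_nonzero
    implementation that lacks the keyword."""
    # Deliberately call count_nonzero without keepdims.
    reduced = np.count_nonzero(x, axis=axis)
    if axis is None:
        # Every axis was reduced, so restore them all as size 1.
        return np.reshape(np.asarray(reduced), (1,) * x.ndim)
    # Re-insert each reduced axis as a size-1 dimension.
    return np.expand_dims(reduced, axis=axis)

x = np.asarray([[0, 1, 2],
                [0, 0, 3]])
print(count_nonzero_keepdims(x, axis=1).shape)       # (2, 1)
print(count_nonzero_keepdims(x, axis=(0, 1)).shape)  # (1, 1)
```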

asmeurer commented 2 months ago

To reiterate what I said at the meeting today, count_nonzero is nice because the standard doesn't support calling sum() on a boolean array, so count_nonzero is the idiomatic way to get the number of True elements in a bool array.
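For comparison, a sketch of the two spellings using NumPy as a stand-in (an assumption). NumPy itself accepts `sum()` on a boolean array, so the explicit cast below is only there to mirror what code restricted to the standard would need.

```python
import numpy as np

mask = np.asarray([True, False, True, True])

# Without count_nonzero: cast the booleans to an integer dtype, then sum.
via_cast = np.sum(mask.astype(np.int64))

# With count_nonzero: no cast needed.
via_count = np.count_nonzero(mask)

print(int(via_cast), int(via_count))  # 3 3
```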

rgommers commented 2 months ago

Thanks! SGTM then to add count_nonzero. And add keepdims for design consistency with other reductions.

kgryte commented 2 months ago

PR is up: https://github.com/data-apis/array-api/pull/803