data-apis / array-api

RFC document, tooling and other content related to the array API standard
https://data-apis.github.io/array-api/latest/
MIT License

Consider adding the one-hot array creation function #787

Open NeilGirdhar opened 2 months ago

NeilGirdhar commented 2 months ago

One-hot is a very common array creation function in machine learning. It might be worth considering its addition.

Various implementations have different semantics:

From an organization standpoint, I think it probably would belong alongside other creation functions like eye.

Alternatively, one-hot could be generalized to a broadcastable unit impulse, which already exists in SciPy (as scipy.signal.unit_impulse). Whereas one-hot chooses one element in a vector to be on, unit impulse chooses one element in an array to be on.
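To make the one-hot versus unit-impulse distinction concrete, here is a minimal NumPy sketch. The names unit_impulse_sketch and basis_vector_sketch are hypothetical helpers written for illustration only; they are not part of the array API standard or of any library, and they do not attempt the broadcasting behaviour discussed in this issue.

```python
import numpy as np


def unit_impulse_sketch(shape, idx, dtype=float):
    """Hypothetical helper: an array of zeros with a single 1 at position idx."""
    out = np.zeros(shape, dtype=dtype)
    out[idx] = 1  # idx is a tuple with one entry per axis, so one element is set
    return out


def basis_vector_sketch(index, size, dtype=float):
    """Hypothetical helper: the 1-D special case, i.e. a single one-hot vector."""
    return unit_impulse_sketch((size,), (index,), dtype=dtype)


# One element "on" in a 2-D array (unit impulse).
impulse = unit_impulse_sketch((2, 3), (1, 2))

# One element "on" in a vector (one-hot without broadcasting).
vector = basis_vector_sketch(1, 3)
```

In this sketch the one-hot vector is just the unit impulse restricted to one dimension, which is the sense in which unit impulse is the more general operation.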

One-hot is a generalization of the standard (elementary) basis vector that is sometimes requested; it is a generalization because it supports broadcasting.
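As a rough illustration of what "supports broadcasting" could mean here, the sketch below builds one-hot arrays in NumPy by comparing the indices against a range of class labels. The function name one_hot_sketch and its num_classes and dtype parameters are assumptions made for this example, not a proposed signature.

```python
import numpy as np


def one_hot_sketch(indices, num_classes, dtype=float):
    """Hypothetical helper: append a trailing axis of length num_classes
    holding a 1 at each given index and 0 elsewhere."""
    indices = np.asarray(indices)
    # Broadcasting (..., 1) against (num_classes,) yields (..., num_classes).
    return (indices[..., np.newaxis] == np.arange(num_classes)).astype(dtype)


# A scalar index gives the standard basis vector, i.e. a row of the identity.
e2 = one_hot_sketch(2, 4)
same_as_eye_row = np.array_equal(e2, np.eye(4)[2])

# An array of indices gives one basis vector per index.
batch = one_hot_sketch([[0, 1], [2, 3]], 4)
```

With a scalar index the result matches a row of eye, and with an array of indices the leading shape is preserved. In this particular sketch an out-of-range index produces an all-zero vector; as noted above, real implementations differ in their semantics.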

kgryte commented 2 months ago

Thanks, @NeilGirdhar, for opening this issue. This might be another candidate for a "deep learning" extension (ref: https://github.com/data-apis/array-api/issues/158). Both PyTorch and JAX, which have similar APIs, place it in their nn namespaces. While one-hot creation is certainly common, we'd probably need to see a bit more ecosystem uptake and alignment (e.g., CuPy, Dask, NumPy) before considering it.