Open mdhaber opened 6 months ago
Really happy to see the discussion get started. One point I'll add is that, of course, there are many other special functions that are widely used. Indeed, many of the ones important to me (like the Bessel functions) are not covered by the above list. Why? We started with a minimal set of functions that are easily implementable everywhere. It's not super helpful to propose a standard for a special function that other array libraries will not implement because it's too much effort.
This is why the task of converting SciPy's internal special function implementations into C++, see https://github.com/scipy/scipy/issues/19404, is relevant and important.
Thanks @mdhaber! Very nice write-up. Looking forward to the discussion.
Thanks for all the hard work on this @mdhaber, @izaid and @steppi! I'll add a few initial thoughts:
- `/` symbols, I assume by accident. All one-letter `x`/`y`/`z`/`n` should be positional-only. Other than that it seems fairly straightforward, aside from keyword-only `a`/`b`, that's too non-descriptive I think.
- `axis` keywords in the reduction-like functions (`logsumexp`, `softmax`, `log_softmax`); PyTorch requires the user to specify it, while JAX isn't consistent with SciPy/CuPy (`-1` vs. `None`). That will be a problem, since the semantics are going to be different between them in a way that's not easily resolvable with a deprecation.
- `ndtr` renaming is 👍🏼, that name is too awful to standardize.
- `expinti`/`expintv`, those are the only ones that are pretty unreadable imho.
- `gamma`: collapsing 3 functions into one seems to mean that the default `gamma(x)` now needs to specify integration bounds? It's not entirely clear - the long explanation in point (5) in your write-up suggests that there may not be much gained from this, compared to staying with existing APIs.
- `logsumexp` is pretty heavily used, and `log_sum_exp` is following a consistent naming scheme but probably not actually more readable (functions with 2 underscores tend to be slightly awkward - goes for `log_abs_xxx` too).

> The binomial coefficient function (`binom`) does not seem to be implemented for PyTorch, CuPy, or JAX arrays, but the need is so fundamental that we wish to include it in the standard.
Such "not implemented" status has typically been a blocker for inclusion. For the `linalg` extension we also discussed a preliminary list, something like "if a library adds this, it must be with this signature and semantics". I'm not sure how fundamental `binom` actually is for real-world applications; the feature request for PyTorch was approved 3 years ago for example (https://github.com/pytorch/pytorch/issues/47841), but no one even commented since.
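For context on the `binom` item: the write-up notes that a moderately robust version can be implemented in terms of the log of the gamma function. A minimal scalar sketch of that idea, using only the standard library, might look like the following. The name `binom_via_lgamma` and the restriction to `0 <= k <= n` are assumptions made for illustration; they are not part of the proposal or of any library.

```python
import math


def binom_via_lgamma(n: float, k: float) -> float:
    """Binomial coefficient for real 0 <= k <= n via the log of the gamma function.

    Uses binom(n, k) = exp(lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)).
    Restricting to 0 <= k <= n keeps every gamma value positive, so no sign
    tracking is needed; a full implementation would have to handle the rest.
    """
    if not 0 <= k <= n:
        raise ValueError("this sketch only covers 0 <= k <= n")
    log_value = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(log_value)


print(binom_via_lgamma(5, 2))      # close to 10, up to rounding
print(binom_via_lgamma(5.5, 2.0))  # real arguments are accepted too
```

Working in log space avoids overflow in the intermediate gamma values, but the result is only accurate to rounding error, which is presumably why the write-up calls this "moderately robust".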
Following the numbers used in https://github.com/data-apis/array-api/issues/725#issuecomment-1881598250:
- `polygamma`, `log_multigamma`, and `expintv` positional-only.
- `<library>.special`. All libraries that we've studied except for PyTorch will need to add this namespace, so they get to start the interface from scratch. PyTorch would need to add an argument named `axis` to their existing functions (since they currently use `dim`, which has no default). Even if we choose to add `axis` with a default, this would not necessarily break existing user code which already specifies `dim` (which could take precedence over `axis`). Does that work?
- `expinti`/`expintv` are not ideal names, but they are a little more explicit than `expi` and `expn`. Mathematically, these functions are represented by $Ei$ and $E_n$, so actually, I'm forgetting where `expintv` came from (@steppi @izaid?). One idea inspired by Mathematica and R is to call them `expint_ei` and `expint_en`. Would that be better?
- `gamma` collapsing several functions into one - no, `gamma(z)` would compute the good old gamma function as usual. I don't think the long explanation in my write-up suggests that there is not much to be gained; rather, it suggests that there are still decisions to be made. There is much to be gained, especially the eventual ability to evaluate the gamma function integral between arbitrary lower and upper limits of integration rather than relying on subtraction, which is less readable and can cause catastrophic cancellation.
- `logsumexp` from the perspective that it is no longer used as a description of what the function does (in which case it would follow the convention) but it is the name of the function. `logsumexp` has become the name throughout the scientific Python ecosystem and even in other programming languages. (@steppi @izaid thoughts?)
- `binom` in particular is important for real-world applications, `comb`/`binom` is used in `scipy.interpolate`, `scipy.linalg`, `scipy.signal`, and `scipy.stats`, for example. I am willing to admit that `scipy.stats.ks_2samp` (in which `binom` is used) is not useful for real-world applications[^2], but the rest seem useful to me.

[^1]: For arguments that are allowed to be specified by keyword, we probably need to double-check that the names (`n` vs `x` vs `z`) are appropriate for the dtype we intend compliant implementations to accept. For positional-only arguments, libraries can name them in their documentation according to the input type they accept.
[^2]: Or it probably shouldn't be used given the alternatives available.
@mdhaber Covered it all super well, but I'll chip in just a little.
Yes, I also think the extension should be added. On the names, I think it's important to get this right and modernise names that no longer make sense (looking at you, `ndtr`). For complex dtypes, I don't think we should leave this out.
For `expinti` and `expintv`, I think we were a little stuck. The current names are `expi` and `expn`, and `expi` especially feels like it should be `exp(i x)`, not some sort of integral. The issue is there is not a good abbreviation for "integral". If we use "it", we end up with `expit`, which is already a function (logistic sigmoid). If we use "int", it sounds like something related to "integer". So, for these two, I think we don't know what to do and are open to suggestions. The "v" in `expintv` is much more explainable: we generalised it to support floats and not just integers, and "v" is a usual marker for that.
As for `logsumexp` versus `log_sum_exp`, actually I'm keen to be consistent with the convention. In that particular case, I could live happily, but I would prefer to keep things similar. As for `log_abs_xxx`, consider the current situation in SciPy where `loggamma` is the logarithm of the gamma function and `gammaln` is the logarithm of the absolute value of the gamma function. That really should be changed, it's confusing.
Generally very happy to discuss! Think it's important to get this right.
I would stay close to the Digital Library of Mathematical Functions, https://dlmf.nist.gov/6.2, and perhaps name `expinti` explicitly `exp_integral_ei`, like in Wolfram Language: https://reference.wolfram.com/language/ref/ExpIntegralEi.html
Verbosity is not an issue nowadays with Copilot and IDEs.
Just curious, but would it make sense to copy or move `logaddexp`, `expm1`, and `log1p` to the special function extension? While current numpy users might expect these in the main namespace, new users of numpy would probably find it quirky that these special functions have been "promoted" to the main namespace.
We shouldn't move them. That would be a compatibility break with existing versions of the standard. It wouldn't be a big deal to duplicate them. There's a similar thing for some functions like `matmul` in the linalg extension.
It might also make sense to ask whether there might eventually be a "neural network" extension that reflects the functions in `jax.nn` and `torch.nn.functional`. Would any of the above functions be better suited in such an extension? (I personally don't think so.)
@NeilGirdhar Re: neural network extension. See https://github.com/data-apis/array-api/issues/158, which you previously commented on.
Great additions! My two cents about some of the points:
+1 on supporting both lower and upper bounds of integration. The parameter convention of Python’s `range` function does not feel like the most straightforward choice, but I don’t have a better alternative either.
Are `normcdf` and `erf` linear transforms of each other? If so, should we keep just one of them to keep the interface lean?
Where some implementation does not yet support complex arguments for a function, does it make sense to standardize real arguments first so that all implementations become compliant? Each implementation is then free to support complex arguments.
Regarding the default value of `axis`, what is the default of those in the Array API? Using the same seems natural. If some implementation has a different default, their users just have to reckon that it is different in the Array API.
> Regarding the default value of axis, what is the default of those in the Array API? Using the same seems natural. If some implementation has a different default, their users just have to reckon that it is different in the Array API.

Functions like `sum` default to `axis=None`, which means to reduce over the whole array.
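To make the difference between the two candidate defaults concrete, here is a minimal pure-Python sketch on a nested list. The helpers `logsumexp_1d` and `logsumexp_2d` are made up for illustration and are not part of any library or of the proposal; they only show how `axis=None` and `axis=-1` give results of different shape for the same input.

```python
import math


def logsumexp_1d(values):
    """Numerically stable log(sum(exp(v))) for a flat, non-empty sequence."""
    m = max(values)
    if math.isinf(m):
        # All entries are -inf, or at least one is +inf: the max is the answer.
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def logsumexp_2d(rows, axis=None):
    """Reduce a list of rows either over everything or over the last axis."""
    if axis is None:
        # Reduce over the whole array: a single scalar comes back.
        return logsumexp_1d([v for row in rows for v in row])
    if axis == -1:
        # Reduce over the last axis only: one value per row.
        return [logsumexp_1d(row) for row in rows]
    raise ValueError("this sketch only handles axis=None and axis=-1")


rows = [[0.0, 0.0], [0.0, 0.0]]
print(logsumexp_2d(rows))           # a scalar, log(4) up to rounding
print(logsumexp_2d(rows, axis=-1))  # a list of two values, each log(2)
```

Code written against one default silently changes meaning under the other, which is why the choice cannot simply be fixed later with a deprecation.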
This RFC proposes adding a special function extension to the array API specification.
Overview
Several array libraries have some support for "special" functions (e.g. `gamma`), that is, mathematical functions that are broadly applicable but not considered to be "elementary" (e.g. `sin`). We[^1] propose adding a `special` sub-namespace to the array API specification, which would contain a number of special functions that are already implemented by many array libraries.
Prior Art
We begin with 25 particularly important special functions that are either already available for NumPy, PyTorch, CuPy, and JAX arrays or are easily implemented. Partial information about their signatures in these libraries is included in the table below; parameters that are less commonly supported/used are omitted.
With the exception of log-sum-exp functions, which reduce along an axis, all work elementwise, producing an output that is the broadcasted shape of the arguments. The variable names shown are not necessarily those used by the referenced library; instead they are standardized with `x`/`z`/`n` denoting an argument of real/complex/integer dtype.
Further information about these functions in other languages (C++, Julia, Mathematica, Matlab, and R) is available in this spreadsheet.
Proposal
The Array API specification would include the following functions in a `special` sub-namespace.
log_sum_exp(z, /, *, axis=-1, weights=None)
logit(x, /)
expit(x, /)
log_normcdf(a, b=None, /)
normcdf(a, b=None, /)
normcdf_inv(p, /, *, a=None, b=None)
digamma(z, /)
polygamma(n, x)
log_multigamma(x, n)
log_abs_gamma(z, /, *, a=None, b=None, regularized=None)
gamma(z, /, *, a=None, b=None, regularized=None)
log_abs_beta(x1, x2, /, *, a=None, b=None)
beta(x1, x2, /, *, a=None, b=None)
erf(a, b=None, /)
erf_inv(p, /, *, a=None, b=None)
zeta(x1, x2=None, /)
binom(x1, x2, /)
expinti(x, /)
expintv(n, x)
softmax(z, /)
log_softmax(z, /)
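As an illustration of how the range-style, positional-only signature `normcdf(a, b=None, /)` listed above would read in practice, here is a scalar sketch built on the standard library's `math.erfc`. It is not a reference implementation: a compliant function would operate on arrays, and the choice of branch below is just one assumed way of avoiding subtraction of nearly equal values.

```python
import math

_SQRT2 = math.sqrt(2.0)


def normcdf(a, b=None, /):
    """Standard normal probability, scalar sketch of the range-style signature.

    normcdf(a)    -> integral of the density from -inf to a
    normcdf(a, b) -> integral of the density from a to b
    """
    if b is None:
        return 0.5 * math.erfc(-a / _SQRT2)
    if a > 0:
        # Lower limit in the upper tail: subtract upper-tail masses.
        return 0.5 * (math.erfc(a / _SQRT2) - math.erfc(b / _SQRT2))
    # Otherwise subtract lower-tail masses.
    return 0.5 * (math.erfc(-b / _SQRT2) - math.erfc(-a / _SQRT2))


print(normcdf(0.0))         # mass below 0
print(normcdf(-1.0, 1.0))   # mass within one standard deviation
print(normcdf(10.0, 11.0))  # tiny but nonzero
```

Because both parameters are positional-only, a call such as `normcdf(a=0.0)` raises `TypeError`, which mirrors how `range` sidesteps the question of what to name its first argument.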
A few notes about the interface:
- `ndtr`; we call it `normcdf` (as it is named in Matlab).
- `loggamma` to compute the log of the `gamma` function, `betaln` to compute the log of the `beta` function, and `log_ndtr` to compute the log of the `ndtr` function. For consistency, we form the name of the log-version of a function by prepending `log_` to the original function name.
- `erf` to evaluate a particular definite integral from `-oo` to `x` and `erfc` to evaluate the integral from `y` to `+oo` without the potential for catastrophic cancellation associated with `1 - erf(x)`. A related, unmet need is the ability to evaluate such integrals from `x` to `y` without subtraction, e.g. `erf(y) - erf(x)`. To better meet this need - and to avoid the need for separate "complementary" functions - we provide arguments that allow specification of `a` and `b` limits of integration.
- `ndtri` to compute the inverse of `ndtr` and `erfinv` to compute the inverse of `erf`. For consistency, we form the name of the inverse of a function by appending `_inv` to the name.
- `gamma` for the (unregularized) gamma function and `gammainc` for the regularized incomplete gamma function. In these cases, we have only one function with a keyword argument (e.g. `regularized`). In some cases, this helps to reduce duplication of similar function names and signatures; in others, it allows developers to be more explicit about which variant is being used.

Where applicable, we find that these conventions generalize well to other special functions that might be added in the future.
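The catastrophic cancellation mentioned in the `erf`/`erfc` note is easy to reproduce with the standard library alone. The snippet below is only a demonstration of the problem with scalar `math.erf` and `math.erfc`; the sample points 10 and 11 are arbitrary choices far enough into the tail that `erf` rounds to exactly 1.0 in double precision.

```python
import math

x, y = 10.0, 11.0

# Complement by subtraction: erf(x) has already rounded to 1.0,
# so the tiny upper-tail value is lost entirely.
tail_by_subtraction = 1.0 - math.erf(x)
# Dedicated complementary function: the tail value survives.
tail_direct = math.erfc(x)

# Integral between x and y: subtracting two values that both round to 1.0
# loses everything, while subtracting the two small tail values does not.
between_by_subtraction = math.erf(y) - math.erf(x)
between_from_tails = math.erfc(x) - math.erfc(y)

print(tail_by_subtraction, tail_direct)
print(between_by_subtraction, between_from_tails)
```

In both cases the subtraction-based result is exactly zero while the tail-based result is a small positive number, which is the situation that explicit limits of integration are meant to handle inside the function.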
Other notes about function selection:
- The binomial coefficient function (`binom`) does not seem to be implemented for PyTorch, CuPy, or JAX arrays, but the need is so fundamental that we wish to include it in the standard. A moderately robust version of the function can be implemented in terms of the log of the gamma function until a more robust, custom implementation is available.

Questions / Points of Discussion:
- `z` rather than `x`) even if some libraries are not compliant initially?
- `log_` and `_inv` components in the name, the order of operations is ambiguous. For example, would `log_normcdf_inv` (which would be useful in statistics) be the logarithm of the inverse of `normcdf` or the inverse of the logarithm of `normcdf`?
  - `normcdf`, `normcdf_inv`, `log_normcdf`, and `log_normcdf_inv`, `normcdf` would have attributes `normcdf.log` and `normcdf.inv`, and `normcdf.log` would have an attribute `normcdf.log.inv`.
  - `log_` and `inv_` to both be prefixes. However, `_inv` typically appears as a suffix in existing special function names, perhaps because the superscript $-1$ that denotes inversion often appears after the function symbol, e.g. $f^{-1}(x)$.
  - `_log` and `_inv` to be suffixes. However, `log` typically appears as a prefix in existing function names, perhaps because this is how the function appears when typeset mathematically, e.g. $\log(f(x))$.
- `range` function: it is natural for `range(y)` to denote a range with an upper limit of `y` and for `range(x, y)` to generate a range between `x` and `y`. However, if the arguments were allowed to be specified as keywords, it would be unclear how they should be named. The use `range(y)` suggests that the name of the first argument might be `stop`, but `range(x, y)` suggests that the name of the first argument should be `start`; assigning either name and allowing both positional and keyword specification leads to confusion. To avoid this ambiguity, `range` requires that the arguments be passed as positional-only. We run into a similar situation with our `a` and `b` arguments. After carefully considering many possibilities, we have suggested the following above:
  - `a`/`b` require that these arguments are positional-only.
  - `a`/`b` require that these arguments are keyword-only.

  `a`/`b` is that they are somewhat restrictive. Users cannot call `normcdf(a=x, b=y)` with keywords to be explicit, nor can they call `gamma(z, x, y)` without keywords to be concise. A compromise would be to accept separate positional-only and keyword-only versions of the same argument, and implement logic to resolve the intended use. While this is anticipated to allow for both natural and flexible use, it would be somewhat more cumbersome to document and implement.
- `regularized` argument of `gamma` is challenging to choose.
  - `gamma(z, upper=y)`) will typically be regularized, suggesting that a `regularized=True` default is more appropriate for this use case.
  - `gamma(z)`) is identically 1, suggesting that `regularized=False` is more appropriate for this use case.
  - `regularized=None`. When `gamma` is used as the complete gamma function (without `a/b`), `regularized` would be set to `False`, and when `gamma` is used as the incomplete gamma function (with `a/b`), `regularized` would be set to `True`. However, this is more complex to document than choosing either `True` or `False` as the default.
- `binom` are not interchangeable, suggesting that some users might prefer to pass arguments by keyword. On one hand, `n` and `k` would be reasonable names, since the binomial coefficient is often needed in situations that call for "n choose k". On the other hand, the names `n` and `k` are not entirely universal, and the function is extended for real arguments, whereas names `n` and `k` are suggestive of integer dtypes. Also, while `a` and `b` are concise names that are commonly used for lower and upper limits of integration, they are not as descriptive as `lower`/`upper`, and might be confused with the symbols commonly used for different arguments of the same function (e.g. `beta`). `low`/`high`, `lo/hi`, `ll`/`ul`, `c`/`d` have also been proposed.

[^1]: @steppi, @izaid, @mdhaber, @rgommers