Closed lucascolley closed 2 weeks ago
cov
This PR implements a small subset of numpy.cov
, namely cov(m)
, to replace a private helper in SciPy.
cov
(unbiased estimate)correction
argument like PyTorch and other statistical functions in the standard. Need for that hasn't arisen yet though, so that isn't attempted here.Good to proceed? EDIT: edited to make mean
private.
Why add mean
? It seems from https://github.com/data-apis/array-api/issues/846 that it's only an oversight that complex support wasn't added, and that can be done. I would just finish that discussion, and then likely nothing else has to change (except perhaps deleting a check in array-api-compat
). Moving usages of mean
over to this package and then back a little later sound like quite a bit of churn for no real reason.
We could keep complex mean
private here until there is a reported use case for it, and remove it once it is supported in array-api-compat - I think that sounds like a better idea than exposing it just because we can.
That sounds good to me.
and then likely nothing else has to change (except perhaps deleting a check in array-api-compat)
I guess you mean array-api-strict
. Complex mean
already works with the upstream libraries (at least NumPy and PyTorch, I didn't check the others), and will work with array-api-compat
.
Also, since array-api-strict now supports multiple versions, we can easily add a preliminary 2024.12
version and disable the complex check in mean
for that version, even before that version of the standard is released.
NumPy, JAX and PyTorch all agree on the same default behaviour for single-argument cov (unbiased estimate)
FWIW, TensorFlow disagrees here.
This aside, it is somewhat unfortunate that the default behavior in NumPy for cov
(bias = False
) diverges from the default behavior of var
and std
. The use of bias
appears to be legacy, being subsequently supplanted by ddof
. If we were designing from first principles and considering adding to the standard, I'd argue for being consistent with existing precedent in the standard, rather than having diverging conventions, especially as, while N - 1
is common, it is by no means the universally appropriate normalization factor, and my inclination would be to force users to be explicit about the normalization factor they believe suitable.
If we were designing from first principles and considering adding to the standard, I'd argue for being consistent with existing precedent in the standard
But we aren't here, so I think matching what NumPy, JAX and PyTorch was fine for this PR. Making changes just makes it harder to use, so let's not cross that bridge until we have to for standardization (which may well be never).
But we aren't here... Making changes just makes it harder to use
I'm not sure that is the case. TMK, Array API extras is not restricted to being a compat layer, and as an independent library, could be free to make design choices which (mildly in this case) diverge from existing libraries such as NumPy which carry historical context.
The notion of "harder to use" is somewhat in the eye of the beholder here. Yes, following existing NumPy practice (which is based on a bias
boolean kwarg which would be highly unlikely to be standardized given standardized var
and std
convention) would mean that downstream libraries such as SciPy could simply swap np.cov
for xpx.cov
, but this also means persisting the additional cognitive burden among future code readers in recalling that default conventions differ between the univariate and multivariate cases. Personally, I place greater value on the latter than the former, and more so in this case as Array API extras has some freedom to evolve the story a bit.
but this also means persisting the additional cognitive burden among future code readers in recalling that default conventions differ between the univariate and multivariate cases
I don't think this is an extra burden, I'm reasonably confident that matching cov
in other libraries is more important and less confusing on balance than matching std
/var
would be. You'd be right for a reader that knows the math, doesn't know or use any other array library, and comes at this fresh using only the standard and array-api-extra
- but that will be rare.
This would replace a private utility in SciPy.