Closed groutr closed 2 years ago
@groutr, thanks for the contribution!
I am not sure the _array_attr
function is necessary. np.asarray
does not copy if it receives an input np.array
. I think it would be more clear to just cast to np.asarray
and check for the property in places you are proposing to call this method.
@aaraney _array_attr
is there because these validation functions are most likely to receive pandas objects, not ndarrays. We are hoping to take advantage of pandas exposing the array properties without the unnecessary cast to ndarray. With _array_attr
, checking the attribute is an order of magnitude faster.
# aa = pd.Series(range(1000))
In [93]: %timeit _array_attr(aa, "ndim").ndim
425 ns ± 41.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [94]: %timeit np.asarray(aa).ndim
4.45 µs ± 213 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Ah, I see. That makes a lot of sense. Thanks for clarifying and providing the above example, @groutr.
My only request would be adding some type hints to the function declaration. The return type hint might be difficult to nail down for all cases, so I think the straight path of npt.ArrayLike
should be fine unless you have a better suggestion.
Thanks again, @groutr!
Some minor improvements to _validation.py that center around the usage of
np.asarray
. Usingnp.asarray
overnp.array
can avoid unnecessary copies (ie, when the input is already an array).When comparing the shapes, we don't need to hold all the arrays in memory.