NOAA-OWP / hydrotools

Suite of tools for retrieving USGS NWIS observations and evaluating National Water Model (NWM) data.
Other
53 stars 12 forks source link

Improvements to validation checks #189

Closed groutr closed 2 years ago

groutr commented 2 years ago

Some minor improvements to _validation.py that center around the usage of np.asarray. Using np.asarray over np.array can avoid unnecessary copies (ie, when the input is already an array).

When comparing the shapes, we don't need to hold all the arrays in memory.

aaraney commented 2 years ago

@groutr, thanks for the contribution!

I am not sure the _array_attr function is necessary. np.asarray does not copy if it receives an input np.array. I think it would be more clear to just cast to np.asarray and check for the property in places you are proposing to call this method.

groutr commented 2 years ago

@aaraney _array_attr is there because these validation functions are most likely to receive pandas objects, not ndarrays. We are hoping to take advantage of pandas exposing the array properties without the unnecessary cast to ndarray. With _array_attr, checking the attribute is an order of magnitude faster.

# aa = pd.Series(range(1000))
In [93]: %timeit _array_attr(aa, "ndim").ndim
425 ns ± 41.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [94]: %timeit np.asarray(aa).ndim
4.45 µs ± 213 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
aaraney commented 2 years ago

Ah, I see. That makes a lot of sense. Thanks for clarifying and providing the above example, @groutr.

My only request would be adding some type hints to the function declaration. The return type hint might be difficult to nail down for all cases, so I think the straight path of npt.ArrayLike should be fine unless you have a better suggestion.

aaraney commented 2 years ago

Thanks again, @groutr!