Point72 / csp

csp is a high performance reactive stream processing library, written in C++ and Python
https://github.com/Point72/csp/wiki
Apache License 2.0
166 stars 28 forks source link

Remove Numpy1DArray and NumpyNDArray type distinction #42

Open AdamGlustein opened 6 months ago

AdamGlustein commented 6 months ago

Is your feature request related to a problem? Please describe. The csp.typing Numpy1DArray and NumpyNDArray types are confusing for users and lead to a lot of unexpected typing issues. We should be consistent with Numpy and use the base np.ndarray type for all Numpy time-series objects. With the distinction, you get strange bugs like this:

        inp_arr = np.zeros(shape=(4,))
        @csp.graph
        def arr_1d(x: csp.ts[csp.typing.Numpy1DArray[float]]) -> csp.ts[np.ndarray]:
            return x
        res = csp.run(arr_1d(arr_1d(csp.const(inp_arr))), starttime=st, endtime=datetime(2020, 2, 7, 9, 1)) # should not raise

This raises an error: csp.impl.types.instantiation_type_resolver.TSArgTypeMismatchError: In function accept_arr_1d: Expected ts[csp.typing.Numpy1DArray[float]] for argument 'x', got ts[csp.typing.NumpyNDArray[float]]

Describe the solution you'd like For now, set the Numpy1DArray type to be equal to the NDArray type and make associated changes to the parquet reader/writer such that users need to opt-in to using shape information. By default we will treat the Numpy array as 1-dimensional.

Describe alternatives you've considered Keeping the two types as separate will continue to lead to weird bugs as I showed above.

Additional context This change has come up as a discussion topic before as the type distinction led to bugs in the past.

ptomecek commented 5 months ago

I also wonder if we shouldn't adopt or take inspiration from some of the other libraries out there that define numpy types. In particular: