bodo-ai / Bodo-Pandas-Collaboration

Shared repo used to track Pandas issues noted by Bodo.

Remove the `errors='ignore'` argument to ensure code can be jittable #8

Open ehariri opened 2 years ago

ehariri commented 2 years ago

For several APIs, the `errors` argument controls what happens when an operation does not succeed. For example, `pandas.to_datetime` uses `errors` to decide how a failed conversion is handled (see the `pandas.to_datetime` page of the pandas 1.4.2 documentation). While both `raise` and `coerce` are JIT compatible, the `ignore` option is not, because the type of the result then depends on whether the conversion happens to succeed.
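The type problem can be sketched without pandas using a small standard-library toy model. This is an illustration under stated assumptions, not pandas' implementation: the hypothetical `parse_dates` helper accepts only `month/day/year` strings, mirrors the three `errors` modes, and uses `None` in place of `NaT`.

```python
from datetime import datetime
from typing import List, Optional, Union

# Hypothetical toy model of an `errors=` argument; NOT pandas' implementation.
# Only "month/day/year" strings are accepted; None stands in for NaT.
DATE_FORMAT = "%m/%d/%Y"


def parse_dates(
    values: List[str], errors: str = "raise"
) -> Union[List[Optional[datetime]], List[str]]:
    if errors not in ("raise", "coerce", "ignore"):
        raise ValueError(f"invalid errors mode: {errors!r}")
    parsed: List[Optional[datetime]] = []
    for value in values:
        try:
            parsed.append(datetime.strptime(value, DATE_FORMAT))
        except ValueError:
            if errors == "raise":
                raise  # a failure is an exception; a returned list holds only datetimes
            if errors == "coerce":
                parsed.append(None)  # element type stays Optional[datetime]
            else:
                # "ignore": hand back the input unchanged, so the element
                # type silently becomes str instead of datetime.
                return list(values)
    return parsed


good = ["2/21/2022", "1/1/2011"]
bad = ["2/21/2022", "123"]

# Same input type (a list of str), different output element types:
print(type(parse_dates(good, errors="ignore")[0]).__name__)  # datetime
print(type(parse_dates(bad, errors="ignore")[0]).__name__)   # str

# "coerce" keeps the element type fixed regardless of the values:
print(parse_dates(bad, errors="coerce"))
```

With `raise` or `coerce`, a caller (or a compiler doing type inference) can state the element type from the input type alone; with `ignore`, it depends on whether every value happens to parse.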

```python
>>> import pandas as pd
>>> arr = ["2/21/2022", "123"]
>>> pd.to_datetime(arr, errors="ignore")
Index(['2/21/2022', '123'], dtype='object')

>>> arr = ["2/21/2022", "1/1/2011"]
>>> pd.to_datetime(arr, errors="ignore")
DatetimeIndex(['2022-02-21', '2011-01-01'], dtype='datetime64[ns]', freq=None)
```

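As you can see, the output type depends entirely on the runtime values of `arr`: with the same input type (a list of strings), `errors="ignore"` returns an object `Index` in one case and a `DatetimeIndex` in the other, so the output type cannot be predicted from the input types alone. A JIT compiler needs that prediction to infer types at compile time, so we would like to see this option removed from all Pandas APIs: it fundamentally makes code that uses it impossible to JIT compile.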
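Where code relies on `errors="ignore"` only to avoid an exception, one type-stable alternative that pandas already offers is `errors="coerce"` followed by an explicit check for `NaT`. A minimal sketch, assuming the caller needs the positions that failed rather than the original strings; the helper name `to_datetime_with_mask` is hypothetical:

```python
import pandas as pd


def to_datetime_with_mask(values):
    # Hypothetical helper. errors="coerce" turns unparseable values into NaT,
    # so the result is datetime-typed no matter which values fail.
    result = pd.to_datetime(values, errors="coerce")
    # Boolean mask marking the positions that ended up as NaT.
    failed = result.isna()
    return result, failed


dates, failed = to_datetime_with_mask(["2/21/2022", "123"])
print(dates)
print(failed.tolist())
```

For a list of strings the result is a `DatetimeIndex` either way, so its type is known before the values are seen, and the caller decides what to do with the failed positions. One caveat: inputs that were already missing also show up as `NaT`, so the mask does not distinguish them from parse failures.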