data-apis / dataframe-api

RFC document, tooling and other content related to the dataframe API standard
https://data-apis.org/dataframe-api/draft/index.html
MIT License

Avoiding the "pandas trap" #4

Open TomAugspurger opened 4 years ago

TomAugspurger commented 4 years ago

Split from the discussions in https://github.com/pydata-apis/dataframe-api/issues/2.

To avoid the trap of "let's just match pandas", let's collect a list of specific problems with the pandas API, which we'll intentionally deviate from. To the extent possible, we should limit this discussion to issues with the API, rather than implementation.


devin-petersohn commented 4 years ago

> To avoid the trap of "let's just match pandas", let's collect a list of specific problems with the pandas API, which we'll intentionally deviate from.

I think there are multiple traps here, for example specifically deviating from or removing entire semantics (not APIs) from pandas. As you may guess, I am all for removing duplicated or unused APIs.

I know you are not talking about Modin (or cuDF) specifically with this trap comment, but I want to address this discussion because there are some obvious points of disagreement that some have made in the past. In Modin, we have specifically chosen to be drop-in compatible with the pandas API. The main goal here is to fix the pandas API by slowly moving people away from it, and users have overwhelmingly agreed with this stance.

The argument and disagreement here will no doubt be "should we tell users what they need or should users tell us what they need?". I believe that users do know what they need to do, even if they cannot always accurately describe it in words.

@TomAugspurger are we editing your first comment? I see @wesm did this and I have a few APIs to add to the problem list.

wesm commented 4 years ago

I added bylines to the list, so go right ahead