Closed MarcoGorelli closed 8 months ago
But then how do you use the methods we have that return indices (sorted_indices, unique_indices)?
Exactly, you don't
Unless we accept some level of redesign, starting with https://github.com/data-apis/dataframe-api/issues/346
A DataFrame can have an arbitrary or undefined order, but that doesn't mean it has to. If it has a defined order or an arbitrary order, i.e. someone ran a sort operation against it, or the operations run thus far are defined to be order maintaining, then take is well defined. If someone ran something that makes no ordering guarantees, then the order could be undefined, in which case calling take against it should be able to return an undefined order as well.
The only situation where take is arguably undefined is when the input order is undefined, and that feels like perfectly reasonable behavior to me.
Let's continue the discussion in #346 regarding Expressions, but I don't think take is a problematic operation.
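To make the point above concrete, here is a minimal sketch of how index-returning methods pair with take when row positions are stable. It uses a plain dict of lists as a toy stand-in for a dataframe. The helper functions below are illustrative only and are not the standard's actual API or any library's implementation.

```python
# Toy stand-in for a dataframe: a dict of equal-length columns.
# These helpers are hypothetical and only mimic the idea of sorted_indices / take.

def sorted_indices(column):
    """Return the row positions that would sort the column (stable)."""
    return sorted(range(len(column)), key=column.__getitem__)


def take(frame, indices):
    """Select rows by position from every column."""
    return {name: [values[i] for i in indices] for name, values in frame.items()}


frame = {"a": [3, 1, 2], "b": ["x", "y", "z"]}
idx = sorted_indices(frame["a"])
result = take(frame, idx)
```

The positions in idx only mean something because frame keeps its rows in a fixed order between the two calls. If the order were undefined, the same indices could select different rows.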
How does a user know if a dataframe has input order defined or not?
Some examples:
I think we could also generally specify that operations maintain the input order of the DataFrame unless otherwise noted. I believe we've made sure to add that into the docstring where appropriate, i.e. things like joins, groupbys, getting unique values, etc. are documented to not guarantee a specific output order.
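One way to picture the "order is maintained unless otherwise noted" rule is a frame that carries a flag saying whether its order is currently defined. This is only a sketch under that assumption: the Frame class and its order_defined attribute are hypothetical and not part of the standard or of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    """Hypothetical single-column frame that tracks its ordering guarantee."""
    rows: tuple
    order_defined: bool = True

    def filter(self, predicate):
        # Order-maintaining: the output inherits the input's guarantee.
        kept = tuple(r for r in self.rows if predicate(r))
        return Frame(kept, self.order_defined)

    def unique(self):
        # Documented as not guaranteeing any particular output order.
        return Frame(tuple(set(self.rows)), order_defined=False)

    def sort(self):
        # Sorting (re)establishes a defined order.
        return Frame(tuple(sorted(self.rows)), order_defined=True)

    def take(self, indices):
        # Well defined only to the extent that the input order is.
        return Frame(tuple(self.rows[i] for i in indices), self.order_defined)


df = Frame((3, 1, 2, 1))
filtered = df.filter(lambda r: r > 1)
deduplicated = df.unique()
resorted = deduplicated.sort()
```

Under this sketch, calling take on resorted is well defined, while calling take on deduplicated returns a frame whose order is still flagged as undefined.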
@kkraus14 https://github.com/data-apis/dataframe-api/issues/344#issuecomment-1906851412
If we only have one DataFrame class, and its order is undefined, then DataFrame.take isn't a well-defined operation.
Alternatives
Accept some level of re-design, even if it means extra work. But with the current design, DataFrame.take is undefined, so I suggest we remove it first.