data-apis / dataframe-api

RFC document, tooling and other content related to the dataframe API standard
https://data-apis.org/dataframe-api/draft/index.html
MIT License

rename get_rows to take #339

Closed by MarcoGorelli 11 months ago

MarcoGorelli commented 11 months ago

In pandas, there's .iloc, but the consortium doesn't want that

Polars has .gather

PyTorch also uses gather for this kind of operation:

>>> import torch
>>> t = torch.tensor([1, 2, 3, 4])
>>> torch.gather(t, 0, torch.tensor([3, 2]))
tensor([4, 3])

TensorFlow has gather, which does the same kind of thing: https://www.tensorflow.org/api_docs/python/tf/gather

I've not seen get_rows anywhere; it's definitely not a common name. Personally, I'd rather go with common existing names than create new ones.

I promise I'll stop bikeshedding after February (when we aim to make the first non-beta release), trying to get it all out before then

jorisvandenbossche commented 11 months ago

FWIW, not having been familiar with a gather method before, I don't find that name particularly obvious (I would rather guess it does something like "collect"). If you want a name that is already used by other libraries, take is another obvious candidate: it is used by pandas, numpy and the array API. (I am not sure I would find take any more obvious if I weren't already familiar with it; hard to say, because I am.)
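
For reference, here is a minimal sketch of what take does today in numpy and pandas, using the same positions as the torch example above. The data and variable names are made up for illustration; this shows the existing libraries only, not the dataframe API standard itself.

```python
import numpy as np
import pandas as pd

# Positions to select, in the order they should appear in the result.
indices = [3, 2]

# numpy: take elements by position.
arr = np.array([1, 2, 3, 4])
taken_arr = np.take(arr, indices)

# pandas: take rows by position (the original index labels are kept).
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": ["w", "x", "y", "z"]})
taken_df = df.take(indices)

# For a list of integer positions, .iloc selects the same rows.
same_as_iloc = df.iloc[indices].equals(taken_df)

print(taken_arr.tolist())      # [4, 3]
print(taken_df["a"].tolist())  # [4, 3]
print(same_as_iloc)            # True
```

In both libraries the result follows the order of the indices rather than the original row order, which is the behaviour get_rows is meant to provide.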

MarcoGorelli commented 11 months ago

thanks for your input - I'm OK with either, I think they're both improvements over get_rows

MarcoGorelli commented 11 months ago

looks like R has gather, but it does something completely different http://statseducation.com/Introduction-to-R/modules/tidy%20data/gather/

maybe take is better then

MarcoGorelli commented 11 months ago

@kkraus14 you OK with take too?

MarcoGorelli commented 11 months ago

thanks all, let's get this in then