Closed MarcoGorelli closed 1 year ago
I'd expect the Column to by library-specific, so the extraction would be library-specific too
Current Column
is part of the standard though:
https://github.com/data-apis/dataframe-api/blob/main/spec/API_specification/column_object.rst
It's also part of several Standards' signatures
If we don't agree on what's in a column, or on how to use it, then I don't see how one can hope to write
mask = ... # create a mask
df_standard.get_rows_by_mask(mask)
in a portable manner
Poor wording on my part. I mean the implementation of Column is library-specific, so i expect the implementation of get_rows_by_mask to also be library-specific.
Right, so say we have
class PandasColumn:
def __init__(self, column):
self._series = column
and
class PandasDataFrame:
def get_column_by_name(self, label):
if not isinstance(label, str):
raise TypeError()
return PandasColumn(self.dataframe.loc[:, label])
and then
df # pandas dataframe
df_std = df.__consortium_dataframe__()
mask = df_std.get_column_by_name('a') > 1
df_std.get_rows_by_mask(mask)
How would we implement get_rows_by_mask
? Would we just use private method, like ._series
in the above example, which we know exists for PandasColumn
by might not exist in general?
class PandasDataFrame:
def get_rows_by_mask(self, mask):
return PandasDataFrame(self.dataframe.loc[mask._series])
If so, OK, this sounds fine to me
Is the concern about getting e.g. a VaexColumn instead of a PandasColumn? id be fine ruling out mixing-and-matching like that
This seems addressed, the last implementation in @MarcoGorelli's comment above works, and I agree with @jbrockmendel's answer. Can we close this?
yup! thanks
The current signature is
Let's take pandas as an example: I have a standard DataFrame, which is backed by
df = pd.DataFrame{'a': [1,2,3]})
. I'd like to get rows by the mask[True, False, True]
, which is inside aColumn
mask
.How do I get the values out of
mask
such that I can pass them to the underlying pandas DataFrame and calldf.loc[mask]
?dlpack
was mentioned in the last call, although that doesn't support booleans. So, even if there was aColumn.__dlpack__
method, then something like this wouldn't work:as the first line would error with