apache / arrow

Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
https://arrow.apache.org/
Apache License 2.0

[Python] Allow `rename_columns` to take a mapping #40644

Status: Closed (judahrand closed this issue 5 months ago)

judahrand commented 6 months ago

Describe the enhancement requested

pandas allows a mapping to be passed, as in df.rename(columns={'foo': 'bar'}). This is very useful when you only want to rename a subset of the columns. Currently, rename_columns accepts only a list that contains a new name for every column.
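With the list-only API, renaming a subset means rebuilding the complete list of names yourself. The sketch below shows one way to do that in plain Python. `full_name_list` is a hypothetical helper, not part of pyarrow. Like pandas' default `DataFrame.rename(errors="ignore")`, it silently ignores keys that match no column.

```python
def full_name_list(names, mapping):
    """Hypothetical helper: expand a partial {old: new} mapping into a full list of names."""
    # Names absent from the mapping are kept unchanged. Keys that match no column
    # are silently ignored, which mirrors pandas' default errors="ignore".
    return [mapping.get(name, name) for name in names]


current = ["n_legs", "animals"]
print(full_name_list(current, {"animals": "name"}))  # ['n_legs', 'name']
```

The result has exactly one entry per column, so it could be passed to the existing list form, for example `batch.rename_columns(full_name_list(batch.schema.names, {"animals": "name"}))`.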

I propose that rename_columns be extended to also accept a Mapping[str, str]. Every column whose name matches a key of the mapping will be renamed to the corresponding value. If any key does not match at least one column, a KeyError should be raised.
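The proposed semantics can be sketched in plain Python as a function that maps the current column names to the new ones. This is an illustration of the behaviour described above, not pyarrow's implementation. `resolve_new_names` is a hypothetical name, and the exact KeyError message is an assumption.

```python
from collections.abc import Mapping, Sequence


def resolve_new_names(names: Sequence[str], mapping: Mapping[str, str]) -> list[str]:
    """Hypothetical helper applying the proposed {old: new} semantics to column names.

    Every column whose name equals a key is renamed, so duplicated column names are
    all renamed. A key that matches no column raises KeyError (strict, unlike pandas'
    default errors="ignore").
    """
    present = set(names)
    missing = [key for key in mapping if key not in present]
    if missing:
        # Assumed message format; only the exception type comes from the proposal.
        raise KeyError(f"column(s) not found: {missing}")
    return [mapping.get(name, name) for name in names]


names = ["n_legs", "animals", "n_legs"]
print(resolve_new_names(names, {"n_legs": "n"}))  # ['n', 'animals', 'n']
```

All unmatched keys are checked before any renaming happens, so the caller either gets a complete new list or an error. A partially renamed result is never produced.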

Example:

>>> import pyarrow as pa
>>> import pandas as pd
>>> df = pd.DataFrame({'n_legs': [2, 4, 5, 100],
...                    'animals': ["Flamingo", "Horse", "Brittle stars", "Centipede"]})
>>> batch = pa.RecordBatch.from_pandas(df)
>>> new_names = {"n_legs": "n", "animals": "name"}
>>> batch.rename_columns(new_names)
pyarrow.RecordBatch
n: int64
name: string
----
n: [2,4,5,100]
name: ["Flamingo","Horse","Brittle stars","Centipede"]

Component(s)

Python

AlenkaF commented 5 months ago

Issue resolved by pull request 40645 https://github.com/apache/arrow/pull/40645