d6t / d6tflow

Python library for building highly effective data science workflows
https://d6tflow.readthedocs.io/en/latest/
MIT License

pyarrow ValueError: Duplicate column names found #28

Closed: citynorman closed this issue 3 years ago

citynorman commented 3 years ago

pyarrow can't save a DataFrame that has duplicate column names; it raises ValueError: Duplicate column names found.

d6tdev commented 3 years ago

To find out which columns are duplicated:

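# column labels that repeat an earlier label
df.columns[df.columns.duplicated()]
# or count occurrences per label and show the five most frequent
from collections import Counter
Counter(df.columns).most_common(5)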
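
For anyone who wants to try the two checks above on something small, here is a minimal sketch. The frame and its column names are made up for illustration and are not from the original report; it only assumes pandas is installed.

```python
import pandas as pd
from collections import Counter

# Hypothetical example frame; the column names are made up for illustration.
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["a", "b", "a"])

# Labels that repeat an earlier label (the first occurrence is not flagged).
repeated = df.columns[df.columns.duplicated()].tolist()

# Every column position that shares its label with another column.
positions = [i for i, dup in enumerate(df.columns.duplicated(keep=False)) if dup]

# Occurrence count per label, keeping only labels seen more than once.
counts = {name: n for name, n in Counter(df.columns).items() if n > 1}

print(repeated)   # ['a']
print(positions)  # [0, 2]
print(counts)     # {'a': 2}
```

Passing keep=False is handy when you need the positions of all offending columns rather than only the later repeats.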

To fix it: