d6t / d6tstack

Quickly ingest messy CSV and XLS files. Export to clean pandas, SQL, parquet
MIT License
195 stars 45 forks source link

postgres and Primary Key #32

Closed marksparrish closed 2 years ago

marksparrish commented 2 years ago

I have searched but no luck. I maybe missing the right terms. I have a pre-existing table with a primary key.

class District(Base):
    __tablename__ = 'districts'

    id = Column(Integer, primary_key=True, autoincrement=True)

    district_county = Column(String(30))
    district_type = Column(String(100))
    district_name = Column(String(100))

    file_date = Column(Date)

I process a bunch of csv files with pandas and then pass it to a function

    def _write_df(self, df, engine, table_name) -> None:
        # df.to_sql(table_name, engine, if_exists='append', index=False, chunksize=100000)
        # add null id column
        # df.insert(0,'id','')
        d6tstack.utils.pd_to_psql(df, str(engine.url), table_name, if_exists='append', sep='\t')
    return None

But I get this error,

    File "/usr/local/lib/python3.8/site-packages/d6tstack/utils.py", line 107, in pd_to_psql
    cursor.copy_from(fbuf, table_name, sep=sep, null='')
    psycopg2.errors.InvalidTextRepresentation: invalid input syntax for type integer: "BAKER"
    CONTEXT:  COPY districts, line 1, column id: "BAKER"

I tried adding an 'id' column with df.insert(0,'id','') but then I get this error.

psycopg2.errors.NotNullViolation: null value in column "id" of relation "districts" violates not-null constraint

What am I doing wrong? In using pandas.to_sql, I don't need an 'id' column and I use index=false in the call.

marksparrish commented 2 years ago

Found the solution - added reset_index() to the df

d6tstack.utils.pd_to_psql(df.reset_index(), str(engine.url), table_name, if_exists='append',sep='\t')
marksparrish commented 2 years ago

One more thing - I did not have the order correct in my pandas df. Make sure you order your columns correctly or you will get strange results.