deployment-gap-model-education-fund / deployment-gap-model

ETL code for the Deployment Gap Model Education Fund
https://www.deploymentgap.fund/
MIT License

Use copy CSV instead of insert for pandas to postgres #297

Closed by bendnorman 11 months ago

bendnorman commented 1 year ago

This PR:

On my computer, make etl_local went from 6m39s to 2m58s.

TrentonBush commented 12 months ago

Does this work if the df columns are in a different order than the metadata definition? I just noticed that the CSV is written separately from the COPY statement that defines the headers.

bendnorman commented 11 months ago

Good question! I tested mixing up the column order, and df.to_sql() still inserted the data in the correct column order.
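
For reference, here is a minimal sketch of why column order can be preserved. It assumes the insertion method follows the COPY-based pattern from the pandas documentation, which may differ from the code in this PR. The helper name build_copy_payload is hypothetical and only exists so the logic can be checked without a database. pandas passes the same keys list both to the CSV rows and to the COPY column list, so the two stay aligned regardless of how the table metadata orders its columns.

```python
import csv
from io import StringIO


def build_copy_payload(table_name, keys, data_iter, schema=None):
    """Hypothetical helper: build the COPY statement and the CSV buffer.

    `keys` is the list of column names in the order of the DataFrame,
    and `data_iter` yields rows in that same order.
    """
    buf = StringIO()
    csv.writer(buf).writerows(data_iter)
    buf.seek(0)

    # Name the columns explicitly so Postgres maps each CSV field by name
    # order given here, not by the table's own column order.
    columns = ", ".join(f'"{k}"' for k in keys)
    qualified = f"{schema}.{table_name}" if schema else table_name
    sql = f"COPY {qualified} ({columns}) FROM STDIN WITH CSV"
    return sql, buf


def psql_insert_copy(table, conn, keys, data_iter):
    """Callable for df.to_sql(..., method=psql_insert_copy).

    Assumes a psycopg2 connection underneath the SQLAlchemy connection.
    """
    sql, buf = build_copy_payload(table.name, keys, data_iter, table.schema)
    dbapi_conn = conn.connection
    with dbapi_conn.cursor() as cur:
        cur.copy_expert(sql=sql, file=buf)
```

With this shape, a DataFrame whose columns are ordered (b, a) produces a statement like COPY my_table ("b", "a") FROM STDIN WITH CSV, so the data lands in the right columns even if the table defines them as (a, b).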