MattTriano / analytics_data_where_house

An analytics engineering sandbox focusing on real estates prices in Cook County, IL
https://docs.analytics-data-where-house.dev/
GNU Affero General Public License v3.0
9 stars 0 forks source link

dbt passes through a pg error when trying to read a column with a hyphen in the name #99

Closed MattTriano closed 1 year ago

MattTriano commented 1 year ago

The issue appeared when a new pipeline attempted to load a table from temp_ into its data_raw version, and the complaint was about a column named "ease-ment" at the source.

INFO - [0m16:15:00  Database Error in model nyc_parcel_sales (models/staging/nyc_parcel_sales.sql)
INFO - [0m16:15:00    column "ease" does not exist
INFO - [0m16:15:00    LINE 22: ...ing_class_category,tax_class_at_present,block,lot,ease-ment,...
INFO - [0m16:15:00                                                                  ^
INFO - [0m16:15:00    compiled Code at target/run/re_dbt/models/staging/nyc_parcel_sales.sql
...

I assume postgres complained because a hyphen is indistinguishable from a minus sign under my current workflow. Keeping the hyphen would involve adding double-quotes around the column name whenever referencing it to postgres, and that's just not viable.

I already enforce some rules about column names here (which is only called here so far), so it might just be a matter of integrating a step to strip out hyphens (or replace them with underscores) in that step.

Or even better, make another function for executing this hyphen-scrubbing step and then encapsulate all column-name cleaning steps into a function standardize_column_names() to avoid needing to modify things downstream as new invalid col-name cases emerge.