MattTriano / analytics_data_where_house

An analytics engineering sandbox focusing on real estate prices in Cook County, IL
https://docs.analytics-data-where-house.dev/
GNU Affero General Public License v3.0

A failed typecasting prevents CTA bus stop data standardization and cleaning #188

Closed MattTriano closed 10 months ago

MattTriano commented 10 months ago

The PK column (systemstop) should be of type smallint, but it looks like its values are being ingested with a .0 tacked on (presumably by pandas/geopandas). None of the records have a null value in that column (nulls being the usual reason an integer column gets read as float), so I'm not sure why pandas/geopandas would treat these as decimal values, but it does.

Anyway, the fix is pretty simple. In the relevant {dataset_name}_standardized.sql, split the value at the decimal point and cast the first part to smallint:

split_part(systemstop, '.', 1)::smallint 

Then run `dbt` from that pipeline stage onward.
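
For a quick sanity check outside the database, the same split-then-cast logic can be sketched in plain Python. This is only an illustration: `to_smallint` is a hypothetical helper (not part of this repo), the sample values are made up, and the range check assumes PostgreSQL's smallint bounds of -32768 to 32767.

```python
# Hypothetical helper mirroring split_part(systemstop, '.', 1)::smallint.
# Assumes PostgreSQL smallint bounds; not part of the actual pipeline.
SMALLINT_MIN, SMALLINT_MAX = -32768, 32767


def to_smallint(raw: str) -> int:
    """Keep the text before the first '.' and convert it to a smallint-sized int."""
    whole = raw.split(".")[0]  # same idea as split_part(raw, '.', 1)
    value = int(whole)  # raises ValueError for non-numeric text
    if not SMALLINT_MIN <= value <= SMALLINT_MAX:
        raise OverflowError(f"{value} does not fit in smallint")
    return value


# Made-up stop ids rendered the way a float column would stringify them.
sample = [str(float(n)) for n in (1, 250, 17999)]
print(sample)
print([to_smallint(s) for s in sample])
```

If any stop id fell outside the smallint range, the cast in the SQL model would fail as well, so a check like this can flag whether a wider integer type is needed before rerunning the models.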