MattTriano / analytics_data_where_house

An analytics engineering sandbox focusing on real estates prices in Cook County, IL
https://docs.analytics-data-where-house.dev/
GNU Affero General Public License v3.0
9 stars 0 forks source link

Remove the `report` schema and its models #122

Closed MattTriano closed 1 year ago

MattTriano commented 1 year ago

This adds some clutter to the repo and I'd rather just generate reports via superset dashboards (which can be periodically updated and emailed or sent to slack per these instructions).

MattTriano commented 1 year ago

This isn't a problem caused by the report schema, but the long names of resources in the report schema caused this issue to emerge. I included a heavily truncated subset of the error message below.

...
START sql table model report.chicago_residential_parcel_sales_per_quarter_and_cpd_beat  [RUN]
START sql table model report.chicago_residential_parcel_sales_per_quarter_and_cpd_district  [RUN]
ERROR creating sql table model report.chicago_residential_parcel_sales_per_quarter_and_cpd_district  [ERROR [0m in 7.44s]]
OK created sql table model report.chicago_residential_parcel_sales_per_quarter_and_cpd_beat  [SELECT [0m in 7.54s]]

Finished running 2 view models, 21 table models in 0 hours 18 minutes and 45.97 seconds (1125.97s).
Completed with 1 error and 0 warnings:
Internal Error
Cache inconsistency detected: in rename, new key _ReferenceKey(database='re_dwh', schema='report', identifier='chicago_residential_parcel_sales_per_quarter_and_cp__dbt_backup') already in cache: [_ReferenceKeyschema='report', identifier='chicago_residential_parcel_sales_per_quarter_and_cpd_district'),
_ReferenceKey(schema='report', identifier='chicago_residential_parcel_sales_per_quarter_and_cp__dbt_backup')]

Name collisions can occur when the length of two models' names approach your database's builtin limit. Try restructuring your project such that no two models share the prefix 'chicago_residential_parcel_sales_per_quarter_and_cp'. Then, clean your warehouse of any removed models.

Done. PASS=22 WARN=0 ERROR=1 SKIP=0 TOTAL=23

Apparently postgres's default max identifier length is 63 bytes, and it seems dbt creates backup tables with the suffix __dbt_backup (and this behavior may be vestigial), which reduces the length limit for identifiers down to 51 characters, as dbt will just truncate identifiers longer than 51 characters. This is the root of my issue, as two of my models have names with the same first 51 characters, so truncating to 51 and tacking on a constant caused a naming collision. This issue was astutely anticipated by this comment.

In any case, I'll just make a note to limit table_names to 50 characters. Note to future me: if you had to look this up again, make an issue and take the time to programmatically enforce the name limit.