Type string macro handling all cases

mweso-softserve commented 4 months ago

Creates a macro that redefines type_string for the models in the dbt_project_evaluator package. When using the package, there is no longer any need to override it for all models in a project. This attempt uses api.Column.string_type(600) for all databases except BigQuery.

This is a:

[x] bug fix PR with no breaking changes
[ ] new functionality

Link to Issue

Closes #469

Description & motivation

This implementation replaces the type_string() macro, which doesn't work on Redshift for tables defined the way dbt-project-evaluator defines them, with api.Column.string_type(600).
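
For illustration, a minimal sketch of what such a package-scoped macro could look like. It assumes dbt's standard adapter.dispatch mechanism and, for BigQuery, a fallback to the built-in dbt.type_string(); the macro bodies are an assumption based on the description above, and the actual code in the PR may differ.

```sql
{# Entry point, called as dbt_project_evaluator.type_string(). #}
{# Dispatching in the package's own namespace leaves dbt.type_string() untouched. #}
{% macro type_string() %}
  {{ return(adapter.dispatch('type_string', 'dbt_project_evaluator')()) }}
{% endmacro %}

{# Default for every adapter without a more specific implementation: #}
{# a sized string type, as described in this PR. #}
{% macro default__type_string() %}
  {{ return(api.Column.string_type(600)) }}
{% endmacro %}

{# Assumption: BigQuery keeps the built-in behaviour. #}
{% macro bigquery__type_string() %}
  {{ return(dbt.type_string()) }}
{% endmacro %}
```

A package model would then use `cast(some_column as {{ dbt_project_evaluator.type_string() }})` (some_column is a placeholder), while models elsewhere in the project that call `{{ dbt.type_string() }}` keep their existing type.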

Checklist

[ ] I have verified that these changes work locally on the following warehouses (Note: it's okay if you do not have access to all warehouses, this helps us understand what has been covered)
- [ ] BigQuery
- [ ] Postgres
- [x] Redshift
- [ ] Snowflake
- [ ] Databricks
- [ ] DuckDB
- [ ] Trino/Starburst
[ ] I have updated the README.md (if applicable)
[ ] I have added tests & descriptions to my models (and macros if applicable)

I tested this on a local copy of the package without any dispatch configuration.

b-per commented 3 months ago

I had a chat with the team and we don't really want to introduce a dbt_project_evaluator version of type_string().

We understand that with https://github.com/dbt-labs/dbt-project-evaluator/issues/469, the first time CI runs after the package is added, it might pick up more models than needed, but this should happen only once and could be handled manually.

I am going on leave for a couple of weeks and will want to revisit the few different issues we have around strings with Redshift but this particular PR is likely not one we would want to merge to this repo.

mweso-softserve commented 3 months ago

@b-per

> I had a chat with the team and we don't really want to introduce a dbt_project_evaluator version of type_string().
>
> We understand that with #469, the first time CI runs after the package is added, it might pick up more models than needed, but this should happen only once and could be handled manually.
>
> I am going on leave for a couple of weeks and will want to revisit the few different issues we have around strings with Redshift but this particular PR is likely not one we would want to merge to this repo.

dbt_project_evaluator has already introduced its own version of type_string(); however, the way it did so changes the type definition for all Redshift models. I'm not saying this PR is definitely the way to go, but its benefit is that it redefines type_string() for dbt_project_evaluator's models only, not for the entire project. Let's imagine I want to use two different dbt packages that handle this kind of problem in two different ways. Both would try to redefine type_string() for the entire project, and depending on the dispatch config one of them would always win, potentially breaking the functionality of the other.

It's not only the issue of models being marked as changed the first time it runs, but of ultimately redefining all existing models wherever the string type is used. I don't agree that any package should modify the definitions of types in existing models simply by being added to the configuration, unless that's the sole purpose of the package. Please revisit the approach.

glsdown commented 3 months ago

I have to agree with @mweso-softserve here. In your current approach you have overwritten a core macro, type_string, that is used across every model in every Redshift dbt project. This PR instead namespaces that usage so it only applies to project evaluator models.

b-per commented 3 months ago

Thanks for the feedback. I was out of office for a bit but I will try to talk again with the team by next week.

b-per commented 2 months ago

We had a chat internally today.

As both of you raise the same point, we are going forward with this approach, creating a dbt_project_evaluator.type_string() macro.

Hopefully it will help people loading strings in Redshift, about which a few issues were raised.
