dbt-labs / dbt-core

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
https://getdbt.com
Apache License 2.0
9.63k stars 1.59k forks source link

[CT-3245] [Feature] expected_references_catalog assumes column indexing starts at 1 #8879

Open benc-db opened 10 months ago

benc-db commented 10 months ago

Is this a new bug in dbt-core?

Current Behavior

I'm working on the 1.7.0 release for the Databricks adapter and I'm failing tests because expected_references_catalog assumes column indexing starts at 1, but in Databricks it starts at 0. In base_expected_catalog, in contrast, index is set to AnyInteger(). I will end up overriding all of this method (which is fairly long), just to accept 0-based indexing.

Expected Behavior

I can inherit the testing behavior without having the override this method to account for indexing differences.

Steps To Reproduce

Try to run an inheritor of BaseDocsGenReferences against a warehouse that uses 0-based indexing and watch it fail (or just look at the code and see that the indices are hard-coded).

Relevant log output

No response

Environment

- dbt: 1.7.0rc1

Which database adapter are you using with dbt?

other (mention it in "Additional Context")

Additional Context

No response

dbeatty10 commented 10 months ago

Thanks for raising this issue that you ran into during the 1.7.0 release @benc-db!

I will end up overriding all of this method (which is fairly long), just to accept 0-based indexing.

Yep, this makes sense, especially since we'd be unlikely to make a change prior to 1.7.0 being released.

Technically, there's nothing erroneous about the implementation of the test cases here. Rather, it's an ergonomic issue, so I'm going to re-label this as a feature request to rather than a bug.

Going forward, here's some ideas of how we could support adapters that have 0-based column indexes without needing so many changes.

Easiest approach

Probably the easiest approach within dbt-core would be to add a parameter to expected_references_catalog named something like initial_index and have it default to initial_index=1.

Then within dbt-databricks, you'd be able override expected_catalog like this:

   @pytest.fixture(scope="class")
    def expected_catalog(self, project):
        return expected_references_catalog(
            project,
            role=AnyString(),
            id_type="bigint",
            text_type="string",
            time_type="timestamp",
            bigint_type="bigint",
            view_type="view",
            table_type="table",
            model_stats=_StatsLikeDict(),
            initial_index=0,
        )

Alternative approach

Alternatively, we could instead define it as a fixture within dbt-core, like:

    @pytest.fixture(scope="class")
    def initial_index(self):
        return 1

Most controversial approach

dbt-postgres has a macro named postgres__get_catalog_relations. Let's pretend for a moment that it used 0-based indexing rather than 1-based indexing. Then I could convert it to be 1-based by updating this line like this instead:

        col.attnum + 1 as column_index,

Most comprehensive approach

Lastly, we could define an adapter-wide interface in which the initial index could be defined and accessed.

Putting it all together

In all of the scenarios above (except for the "controversial" one), the code here would be re-written to be something like:

        "first_name": {
            "name": "first_name",
            "index": initial_index+1,
            "type": text_type,
            "comment": None,
        },

    summary_columns = {
        "first_name": {
            "name": "first_name",
            "index": initial_index + 0,
            "type": text_type,
            "comment": None,
        },
        "ct": {
            "name": "ct",
            "index": initial_index + 1,
            "type": bigint_type,
            "comment": None,
        },
    }

Summary

In the short-term, I'd advocate for the "easiest approach" above (or the fixture-based alternative).

benc-db commented 10 months ago

Agreed with either non-controversial approach. I started by doing the controversial approach honestly, then was uncertain about if there were other places that were going to break as a result :P.