dbt-msft / dbt-sqlserver

dbt adapter for SQL Server and Azure SQL
MIT License
212 stars 99 forks source link

Incorporate 1.7.2 changes #461

Closed cody-scott closed 9 months ago

cody-scott commented 10 months ago

Updates adapter to 1.7.2

cody-scott commented 10 months ago

One piece here about two tests related to the full refresh of seeds.

It doesn’t seem to want to drop the table to pass that materializations test.

Not sure if tests want to be revisited to see if there is any tests that are covered in the parent adapter. Kept the existing tests here and they passed (calling the fabric macros where not otherwise defined).

ericmuijsvanoord commented 10 months ago

@cody-scott : Thanks for the hard work! What is pending the merge of the code? Reviews? Fixing the full refresh of seeds? Additional tests?

cody-scott commented 9 months ago

I believe we are waiting on the repo maintainers to take a look.

Further work is needed to add the full suite of tests that DBT provides, and to see if that full-refresh failure can be corrected (or ignored) for now.

ericmuijsvanoord commented 9 months ago

@cody-scott : can you tag the maintainers as reviewers or is that not possible ?

schlich commented 9 months ago

Hey @cody-scott !!! Can't believe i missed this PR. I believe I have maintainer status on the repo -- I volunteered a couple months ago to help this along. really happy to see an attempt here -- i've been hacking away at my own fixes but it seems like we can start from yours!!

Leaving comments in a "review" note below.

also, my dev container crashed on startup so i might be suggesting some changes there. nvm, didnt realize you were using docker-in-docker so it was just codespaces not working

Thanks for the help!

schlich commented 9 months ago

one more thing... on my machine i'm getting failures for the 3 test_docs tests in addition to the two i believe you mentined for test_seed... dont have time to dig in tonight but i'm attaching test report logs:

test_results.json

JCZuurmond commented 9 months ago

@sdebruyn and @dataders : Could you have a look? We are waiting for this update

schlich commented 9 months ago

@sdebruyn and @dataders : Could you have a look? We are waiting for this update

@JCZuurmond sdebruyn is no longer maintaining this repo and I believe dataders is on parental leave until march... I believe I am your point of contact for now :)

cody-scott commented 9 months ago

Ok, i'm seeing 5 failed @schlich from your json file.

In terms of some of the failing azure tests, I think most of that should pull from fabric, but not positive here (might need to look closer). We aren't using azure so i don't have a reliable way to test on my side.

Is there an appetite to add some of the other tests that dbt has introduced since 1.4? I believe there is a number more in their suite now.

Next steps i'm seeing:

  1. Sort out why the full refresh seeds aren't working
  2. See if there is a link to the docs and full-refresh issue; if not look at that
  3. Sort out the azure auth component for those using service principals/entra
cody-scott commented 9 months ago

Two questions here:

  1. The behaviour of the default seed full refresh test seems to be expecting the table view to not exist. This is not the case as its dropping + creating right away in the log file for that step. The default implementation then won't pass.
  2. In the fabric adapter they have over-riden most (or all, i haven't checked the full set) of the tests to have it expecting the table to exist https://github.com/microsoft/dbt-fabric/blob/main/tests/functional/adapter/test_seed.py#L599

The fabric adapter seems to have re-written

many of the tests and not implemented the default test suite from dbt; same problem they're having?

I am puzzled here; it seems like we should be expecting the table to exist (like what the fabric one does) The default implementation doesn't seem to have changed, so unless something has changed on the seed/full refresh side then the way forward would be updating those tests that are failing to pass, as the full refresh is recreating the tables.

@schlich what are you thinking here?

second step of test dbt.log

Log file for only the second step (run_dbt seed full refresh) where its pretty clear its dropping + creating.

Edit:

See my latest comment. Default behaviour is drop cascade in postgres + others. Fabric doesn't drop cascade, only drop, hence the fail.

cody-scott commented 9 months ago

One approach here would be to re-align with the same test set of fabric or try and get the tests in line with the current dbt suite + any one-off modifications as required.

cody-scott commented 9 months ago

Ok, i think i've got the issue; the downstream view isn't being dropped as part of the full refresh. Not sure exactly what changed but that seems to be the issue here. Maybe its something fabric is doing that isn't aligning...

cody-scott commented 9 months ago

Alright I think this is the state.

The base tests are expecting that it performs like postgres and should perform a 'drop view/table cascade', which would explain why the test is expected to have no view on a full refresh.

Since sql server doesn't support this I see two options

  1. Implement a similar cascade drop by determining the downstream views with this sys.sql_expression_dependencies. This would be the closest representation to the expected DBT behaviour. 2. Ignore this and pass the tests with it expecting the tables, don't drop the views until that step is run in the build.

Upside of the first is its closest to the expected DBT, downside is it's going to drop all downstream tables and views. If you have anything that relies on them in build then it's going to remove them until it hits that step. relevant discussion.

Other piece is non dbt controlled views will also get dropped.

Fabric is going for option 2.

@schlich What's your thoughts? It's tied to full refresh so that's probably why those particular items are failing (I think).

schlich commented 9 months ago

Amazing legwork! Based on that bug report you linked, which seems like its a completely valid concern, I'd say option 2 is the way to go. Let me know if you need help with the code; I'm totally swamped of late so i appreciate you taking this up.

Where are we at on the CLI login you think?

@cody-scott

prescode commented 9 months ago

I also would like to volunteer to help if I can. Give me a task and I'll make a run at it. Thanks so much for your work on this @cody-scott !

prescode commented 9 months ago

I am making an effort at getting the functional tests running. It may take me some time to get up to speed on the code.

schlich commented 9 months ago

@prescode FWIW with the whole test ecosystem in the state its in i genuinely don't think its worth your mental effort to wrap your mind around the architecture. All you really need to know is that we opt-in to tests by subclassing them in a tests file and then the good lord knows what happens during test prep and execution while god knows what is being tested. Its a really peculiar structure that i have a really long forum post halfway typed up about. The whole suite needs to be burned down and implemented with proper pytest fixtures. (all the love to open source maintainers everywhere but i'd really like to know what inspired this design... ANYWAYS...)

The Make instructions should be fairly simple i think. having just boostrapped my own env i just had to make a virtual environment, pip install . pip install -r dev-requirements.txt, copy the example .env to a live copy, and then make server and make test. Let me know if that helps.

prescode commented 9 months ago

Test environment is up and I'm able to run the test suite. I've made some progress but I need some sleep. I'll try finish up the tests tomorrow.

prescode commented 9 months ago

Ok, i'm seeing 5 failed @schlich from your json file.

  • TestBasicSeedTestsSQLServer.test_simple_seed_full_refresh_flag
  • TestBasicSeedTestsSQLServer.test_simple_seed_full_refresh_flag
  • TestDocsGenReferencesSQLServer.test_references
  • TestDocsGenerateSQLServer.test_run_and_generate
  • TestDocsGenerateSQLServer.test_run_and_generate_no_compile

In terms of some of the failing azure tests, I think most of that should pull from fabric, but not positive here (might need to look closer). We aren't using azure so i don't have a reliable way to test on my side.

Is there an appetite to add some of the other tests that dbt has introduced since 1.4? I believe there is a number more in their suite now.

Next steps i'm seeing:

  1. Sort out why the full refresh seeds aren't working
  2. See if there is a link to the docs and full-refresh issue; if not look at that
  3. Sort out the azure auth component for those using service principals/entra

The doc errors

  • TestDocsGenerateSQLServer.test_run_and_generate
  • TestDocsGenerateSQLServer.test_run_and_generate_no_compile

are related to the time_type data type in the test fixture:

    @pytest.fixture(scope="class")
    def expected_catalog(self, project):
        return base_expected_catalog(
            project,
            role=os.getenv("DBT_TEST_USER_1"),
            id_type="int",
            text_type="varchar",
            time_type="datetime",
            view_type="VIEW",
            table_type="BASE TABLE",
            model_stats=no_stats(),
        )

The fixture in dbt-fabric uses datetime2. Updating to datetime2 results in a pass

    @pytest.fixture(scope="class")
    def expected_catalog(self, project):
        return base_expected_catalog(
            project,
            role=os.getenv("DBT_TEST_USER_1"),
            id_type="int",
            text_type="varchar",
            time_type="datetime2",
            view_type="VIEW",
            table_type="BASE TABLE",
            model_stats=no_stats(),
        )

I'm fairly certain this is the correct fix but being unfamiliar with the code I'm looking for confirmation. Any insight into this update?

prescode commented 9 months ago

I had hoped to get further along but I'm struggling to figure out this testing scheme and trace the tests. I've overridden some of the test fixtures but I'm still getting failures. I'll not have much time in the next couple of days to continue but I'll pick it up again on the weekend if needed.

schlich commented 9 months ago

The doc errors

  • TestDocsGenerateSQLServer.test_run_and_generate
  • TestDocsGenerateSQLServer.test_run_and_generate_no_compile

are related to the time_type data type in the test fixture:

    @pytest.fixture(scope="class")
    def expected_catalog(self, project):
        return base_expected_catalog(
            project,
            role=os.getenv("DBT_TEST_USER_1"),
            id_type="int",
            text_type="varchar",
            time_type="datetime",
            view_type="VIEW",
            table_type="BASE TABLE",
            model_stats=no_stats(),
        )

The fixture in dbt-fabric uses datetime2. Updating to datetime2 results in a pass

    @pytest.fixture(scope="class")
    def expected_catalog(self, project):
        return base_expected_catalog(
            project,
            role=os.getenv("DBT_TEST_USER_1"),
            id_type="int",
            text_type="varchar",
            time_type="datetime2",
            view_type="VIEW",
            table_type="BASE TABLE",
            model_stats=no_stats(),
        )

I'm fairly certain this is the correct fix but being unfamiliar with the code I'm looking for confirmation. Any insight into this update?

I've run into the same error in other contexts, I can confirm this fix.

Can you push up a PR of your WIP against this branch so i can examine and potentially merge?

prescode commented 9 months ago

@schlich it looks like you beat me to it. We don't need to merge my changes anymore, correct? I also made a change to the devcontainer.json file to update the path to the test.env file based on our above conversation. Let me know if there is anything else I can do to help.

schlich commented 9 months ago

@mikaelene SOS if you are around. We need an organization owner to remove the dependency on Python 3.7 in the required checks. Will see if anyone has a hotline to @dataders otherwise, or we will have to wait until he returns from parental leave and explore an unofficial pypi release in the meantime. Unless anyone else is seeing anything i'm not!

But we're almost there!!!!

b-per commented 9 months ago

Hi @schlich , I might be able to help while Anders is out. Can you tell me where you see the dependency on Python 3.7 in the required checks?

schlich commented 9 months ago

Hi @b-per it was a busy day yesterday! I was able to obtain admin perms to the repo to remove the 3.7 check myself. Today we are stuck with the pyPI release. Do you have access there?

b-per commented 9 months ago

No, I don't have access to this one. Looking at the git blame the release action was originally set by @sdebruyn and there a few maintainers on PyPi but I am not in the list

sdebruyn commented 9 months ago

I updated the token in the GH secrets, I don't have access to add other folks myself.