getindata / dbt-flink-adapter

Adapter for dbt that executes dbt pipelines on Apache Flink
Apache License 2.0

Create Flink SQL gateway mock for component tests #34

Open gliter opened 1 year ago

gliter commented 1 year ago

We should have a mock for the SQL Gateway (either a mock client or a mock web server) with utility methods, so that we can write component tests.

It would also be worth making it easy to switch from the mock to a real instance, so that with some simple tags we could mix component tests against the mock with e2e tests against a real instance.
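One way such a switch could look is sketched below. This is only an illustration: the environment variable names (FLINK_TEST_MODE, FLINK_GATEWAY_HOST, FLINK_GATEWAY_PORT) and both client classes are hypothetical and are not part of the adapter.

```python
import os
from dataclasses import dataclass, field
from typing import List, Mapping


@dataclass
class RecordingGatewayClient:
    """Hypothetical mock: records statements instead of sending them."""
    statements: List[str] = field(default_factory=list)

    def execute(self, sql: str) -> None:
        self.statements.append(sql)


@dataclass
class HttpGatewayClient:
    """Hypothetical placeholder for a client that talks to a real SQL Gateway."""
    host: str
    port: int

    def execute(self, sql: str) -> None:
        raise NotImplementedError("wire this to the real gateway client")


def make_client(env: Mapping[str, str] = os.environ):
    """Pick the client from an (assumed) FLINK_TEST_MODE setting."""
    mode = env.get("FLINK_TEST_MODE", "mock")
    if mode == "mock":
        return RecordingGatewayClient()
    if mode == "e2e":
        return HttpGatewayClient(
            host=env.get("FLINK_GATEWAY_HOST", "127.0.0.1"),
            port=int(env.get("FLINK_GATEWAY_PORT", "8083")),
        )
    raise ValueError(f"unknown FLINK_TEST_MODE: {mode!r}")
```

With something like this, the same test body could run against either client, and only the environment of the test run would decide which one is used.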

zqWu commented 1 year ago

I'd like to do some work on this issue. This is what I have in mind:

Plan 1. Find a Python package that runs a Flink standalone cluster plus SQL Gateway. I haven't found one yet.

Plan 2. Use httpretty as a mock SQL Gateway, plus an in-memory SQLite database to mock SQL operations. This is functionally limited, as it can't mock streaming operations. It has one more limit: table operations must not contain a catalog and database, e.g. select * from tbl1 is OK, but select * from xx_cat.xx_db.tbl1 will fail, since SQLite does not support it.

initial design is:

   httpretty -------------------------------------- sqlite3
mock sql-gateway                               process table sql
process schema/database sql

Catalog and database SQL, such as:

  show catalogs / show current catalog / use catalog xx_catalog
  show databases / show current database / use xx_database

Other SQL on tables will be passed to SQLite and executed there.
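A minimal sketch of that routing, leaving out the httpretty layer, might look like the following. The class name, the default catalog and database names, and the return shape (a list of tuples) are assumptions made for illustration only.

```python
import re
import sqlite3


class SqliteBackedGateway:
    """Hypothetical mock: answers catalog/database SQL itself and
    forwards everything else to an in-memory SQLite database."""

    def __init__(self):
        self._db = sqlite3.connect(":memory:")
        self.catalog = "default_catalog"    # assumed default name
        self.database = "default_database"  # assumed default name

    def execute(self, sql):
        stmt = sql.strip().rstrip(";")
        normalized = " ".join(stmt.lower().split())

        # Catalog and database statements never reach SQLite.
        if normalized in ("show catalogs", "show current catalog"):
            return [(self.catalog,)]
        if normalized in ("show databases", "show current database"):
            return [(self.database,)]
        match = re.fullmatch(r"use\s+catalog\s+(\w+)", stmt, re.IGNORECASE)
        if match:
            self.catalog = match.group(1)
            return []
        match = re.fullmatch(r"use\s+(\w+)", stmt, re.IGNORECASE)
        if match:
            self.database = match.group(1)
            return []

        # Everything else is treated as table SQL and run by SQLite.
        return self._db.execute(stmt).fetchall()
```

Because the table SQL goes straight to SQLite, the limits mentioned above still apply: there is no streaming, and catalog-qualified table names are not understood.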

gliter commented 1 year ago

For this task I would go for something much simpler. Just use a standard mock in place of the Flink SQL client (alternatively an HTTP mock) so we can write tests like:

  1. write a dbt model
  2. run the test
  3. assert that the request sent to the mocked client is what we expect.
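The three steps above can be sketched with Python's unittest.mock. In this sketch, run_model and execute_statement are hypothetical stand-ins for the adapter code path and the client method. The real names in the adapter may differ.

```python
from unittest.mock import MagicMock, call


def run_model(client, model_sql):
    """Hypothetical stand-in for the adapter code that a dbt run triggers."""
    client.execute_statement("SET 'execution.runtime-mode' = 'batch'")
    client.execute_statement(model_sql)


def test_model_sends_expected_statements():
    # 1. "write a dbt model": here just a SQL string
    model_sql = "select * from input2"

    # 2. run the code under test against a standard mock
    client = MagicMock()
    run_model(client, model_sql)

    # 3. assert that the mocked client received exactly what we expect
    assert client.execute_statement.call_args_list == [
        call("SET 'execution.runtime-mode' = 'batch'"),
        call(model_sql),
    ]
```

No Flink instance or HTTP server is involved here. The test only checks which statements were produced.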

I would not use any SQL database in place of Flink, as databases work vastly differently from it. For full E2E I would use testcontainers and run an actual Flink instance.

So in the end we should have three layers of tests:

  1. unit tests for the logic of single methods and classes
  2. component tests with a mocked Flink client, so we can check that our macros and code convert a dbt model into what we expect
  3. e2e tests with an actual Flink instance in testcontainers

zqWu commented 1 year ago

that's sensible.

zqWu commented 1 year ago

Is it something like this?

def test_tmp(self):
    # MockFlinkSqlGatewayClient is the proposed mock; it records every statement it receives
    session = MockFlinkSqlGatewayClient.create_session(
        host="127.0.0.1",
        port=8083,
        session_name="some_session",
    )
    cursor = FlinkCursor(session)
    sql = "select * /** fetch_max(10) fetch_mode('streaming') fetch_timeout_ms(5000) */ from input2"
    cursor.execute(sql)
    # check the SQL received by the mock
    stats = MockFlinkSqlGatewayClient.all_statements(session)
    # assertEqual compares the values; assertTrue(a, b) would only check that a is truthy
    self.assertEqual("SET 'execution.runtime-mode' = 'batch'", stats[0])
    self.assertEqual(sql, stats[1])

gliter commented 1 year ago

Yes, I just wonder about the naming and the function calls, because create_session is something that the adapter does internally. How would you use it in the context of tests/functional/adapter/test_seeds.py, where we create a dbt model and execute the run_dbt function? How would you extract this session handler from the adapter?
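One possible direction, sketched here under assumptions, is to patch the session factory so the test never has to pull the session out of the adapter. GatewayClient and run_pipeline below are hypothetical stand-ins for the adapter's client class and for run_dbt. Only unittest.mock.patch.object is a real API.

```python
from unittest.mock import patch


class GatewayClient:
    """Hypothetical client; the adapter's real class and methods may differ."""

    @staticmethod
    def create_session(host, port, session_name):
        raise ConnectionError("no gateway available in component tests")


def run_pipeline(sql):
    """Hypothetical stand-in for run_dbt: the session is created internally."""
    session = GatewayClient.create_session("127.0.0.1", 8083, "some_session")
    session.execute_statement(sql)


def test_session_is_captured_by_patch():
    # While patched, create_session returns a MagicMock that the test can reach
    # through the patch object, even though run_pipeline never exposes it.
    with patch.object(GatewayClient, "create_session") as create_session:
        run_pipeline("select * from input2")

    create_session.assert_called_once_with("127.0.0.1", 8083, "some_session")
    session = create_session.return_value
    session.execute_statement.assert_called_once_with("select * from input2")
```

Whether this fits depends on where the adapter actually creates its session, which this sketch does not assume to know.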