dbt-labs / dbt-external-tables

dbt macros to stage external sources
https://hub.getdbt.com/dbt-labs/dbt_external_tables/latest/
Apache License 2.0

dbt-athena-community support #203

Open brabster opened 1 year ago

brabster commented 1 year ago

Description & motivation

resolves #274

PR based on https://github.com/dbt-labs/dbt-external-tables/pull/133

Checklist

brabster commented 1 year ago

Local run for info:

$ ./run_test.sh athena
Setting up virtual environment
Changing working directory: integration_tests
Starting integration tests
15:59:49  Running with dbt=1.4.6
15:59:50  Installing ../
15:59:50    Installed from <local @ ../>
15:59:50  Installing dbt-labs/dbt_utils
15:59:50    Installed from version 0.8.0
15:59:50    Updated version available: 1.1.0
15:59:50  
15:59:50  Updates available for packages: ['dbt-labs/dbt_utils']                 
Update your versions in packages.yml, then run dbt deps
15:59:52  Running with dbt=1.4.6
15:59:52  Found 0 models, 2 tests, 0 snapshots, 0 analyses, 560 macros, 0 operations, 1 seed file, 5 sources, 0 exposures, 0 metrics
15:59:52  
15:59:54  Concurrency: 1 threads (target='athena')
15:59:54  
15:59:54  1 of 1 START seed file dbt_external_tables_integration_tests_athena.people ..... [RUN]
16:00:04  1 of 1 OK loaded seed file dbt_external_tables_integration_tests_athena.people . [CREATE 200 in 10.79s]
16:00:04  
16:00:04  Finished running 1 seed in 0 hours 0 minutes and 12.15 seconds (12.15s).
16:00:04  
16:00:04  Completed successfully
16:00:04  
16:00:04  Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1
16:00:07  Running with dbt=1.4.6
16:00:07  No prep necessary, skipping
16:00:09  Running with dbt=1.4.6
16:00:09  Unable to do partial parsing because config vars, config profile, or config target have changed
16:00:11  1 of 4 START external source dbt_external_tables_integration_tests_athena.people_csv_unpartitioned
16:00:12  1 of 4 (1) drop table if exists `awsdatacatalog`.`dbt_external_tables_integration_tests_ath...  
16:00:13  1 of 4 (1) OK -1
16:00:13  1 of 4 (2) create external table `awsdatacatalog`.`dbt_external_tables_integration_tests_at...  
16:00:15  1 of 4 (2) OK -1
16:00:15  2 of 4 START external source dbt_external_tables_integration_tests_athena.people_csv_partitioned
16:00:17  2 of 4 (1) drop table if exists `awsdatacatalog`.`dbt_external_tables_integration_tests_ath...  
16:00:18  2 of 4 (1) OK -1
16:00:18  2 of 4 (2) create external table `awsdatacatalog`.`dbt_external_tables_integration_tests_at...  
16:00:20  2 of 4 (2) OK -1
16:00:20  2 of 4 (3) alter table `awsdatacatalog`.`dbt_external_tables_integration_tests_athena`.`peo...  
16:00:23  2 of 4 (3) OK -1
16:00:23  3 of 4 START external source dbt_external_tables_integration_tests_athena.people_csv_multipartitioned
16:00:26  3 of 4 (1) drop table if exists `awsdatacatalog`.`dbt_external_tables_integration_tests_ath...  
16:00:27  3 of 4 (1) OK -1
16:00:27  3 of 4 (2) create external table `awsdatacatalog`.`dbt_external_tables_integration_tests_at...  
16:00:29  3 of 4 (2) OK -1
16:00:29  3 of 4 (3) alter table `awsdatacatalog`.`dbt_external_tables_integration_tests_athena`.`peo...  
16:01:06  3 of 4 (3) OK -1
16:01:06  3 of 4 (4) alter table `awsdatacatalog`.`dbt_external_tables_integration_tests_athena`.`peo...  
16:01:42  3 of 4 (4) OK -1
16:01:42  3 of 4 (5) alter table `awsdatacatalog`.`dbt_external_tables_integration_tests_athena`.`peo...  
16:02:16  3 of 4 (5) OK -1
16:02:16  4 of 4 START external source dbt_external_tables_integration_tests_athena.people_csv_multipartitioned_hive_compatible
16:02:17  4 of 4 (1) drop table if exists `awsdatacatalog`.`dbt_external_tables_integration_tests_ath...  
16:02:19  4 of 4 (1) OK -1
16:02:19  4 of 4 (2) create external table `awsdatacatalog`.`dbt_external_tables_integration_tests_at...  
16:02:20  4 of 4 (2) OK -1
16:02:20  4 of 4 (3) msck repair table `awsdatacatalog`.`dbt_external_tables_integration_tests_athena...  
16:02:25  4 of 4 (3) OK -1
16:02:27  Running with dbt=1.4.6
16:02:27  Unable to do partial parsing because config vars, config profile, or config target have changed
16:02:29  1 of 4 START external source dbt_external_tables_integration_tests_athena.people_csv_unpartitioned
16:02:30  1 of 4 SKIP
16:02:30  2 of 4 START external source dbt_external_tables_integration_tests_athena.people_csv_partitioned
16:02:33  2 of 4 (1) alter table `awsdatacatalog`.`dbt_external_tables_integration_tests_athena`.`peo...  
16:02:36  2 of 4 (1) OK -1
16:02:36  3 of 4 START external source dbt_external_tables_integration_tests_athena.people_csv_multipartitioned
16:02:39  3 of 4 (1) alter table `awsdatacatalog`.`dbt_external_tables_integration_tests_athena`.`peo...  
16:03:15  3 of 4 (1) OK -1
16:03:15  3 of 4 (2) alter table `awsdatacatalog`.`dbt_external_tables_integration_tests_athena`.`peo...  
16:03:51  3 of 4 (2) OK -1
16:03:51  3 of 4 (3) alter table `awsdatacatalog`.`dbt_external_tables_integration_tests_athena`.`peo...  
16:04:25  3 of 4 (3) OK -1
16:04:25  4 of 4 START external source dbt_external_tables_integration_tests_athena.people_csv_multipartitioned_hive_compatible
16:04:26  4 of 4 (1) msck repair table `awsdatacatalog`.`dbt_external_tables_integration_tests_athena...  
16:04:31  4 of 4 (1) OK -1
16:04:33  Running with dbt=1.4.6
16:04:33  Found 0 models, 2 tests, 0 snapshots, 0 analyses, 560 macros, 0 operations, 1 seed file, 5 sources, 0 exposures, 0 metrics
16:04:33  
16:04:34  Concurrency: 1 threads (target='athena')
16:04:34  
16:04:34  1 of 2 START test dbt_utils_source_equality_athena_external_people_csv_partitioned_id__first_name__last_name__email__ref_people_  [RUN]
16:04:38  1 of 2 PASS dbt_utils_source_equality_athena_external_people_csv_partitioned_id__first_name__last_name__email__ref_people_  [PASS in 3.84s]
16:04:38  2 of 2 START test dbt_utils_source_equality_athena_external_people_csv_unpartitioned_id__first_name__last_name__email__ref_people_  [RUN]
16:04:41  2 of 2 PASS dbt_utils_source_equality_athena_external_people_csv_unpartitioned_id__first_name__last_name__email__ref_people_  [PASS in 2.77s]
16:04:41  
16:04:41  Finished running 2 tests in 0 hours 0 minutes and 7.58 seconds (7.58s).
16:04:41  
16:04:41  Completed successfully
16:04:41  
16:04:41  Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2

brabster commented 1 year ago

Re: https://github.com/dbt-labs/dbt-external-tables/pull/133#discussion_r811471521 (the need for a quote_comment: key to get around invalid comment chars): this fix doesn't seem to work, at least in Athena engine v3. The whole query gets commented out:

16:09:31  1 of 4 (2) create external table `awsdatacatalog`.`dbt_external_tables_integration_tests_at...  
16:10:52  Encountered an error while running operation: Runtime Error
  Runtime Error
    [ErrorCategory:USER_ERROR, ErrorCode:DDL_FAILED], Detail:FAILED: ParseException line 1:489 cannot recognize input near '<EOF>' '<EOF>' '<EOF>'

brabster commented 1 year ago

Note - https://github.com/dbt-athena/dbt-athena/pull/161 effectively added a large subset of external tables functionality in dbt-athena itself. Might be worth trying to refactor that and utilise it to cut down on the duplicated logic in here

aidan-o-boyle-kroo commented 11 months ago

@brabster what's needed to get this PR approved? I'm happy to contribute.

brabster commented 11 months ago

@aidan-o-boyle-kroo hi there! I've just pulled this, it is still working on dbt-athena-community 1.4.6 and works against latest 1.6.1 too.

$ ATHENA_TEST_DBNAME=AwsDataCatalog AWS_REGION=eu-west-2 ATHENA_TEST_BUCKET=my-redacted_bucket ATHENA_TEST_WORKGROUP=primary ./run_test.sh athena
Setting up virtual environment for dbt-athena
Changing working directory: integration_tests
Starting integration tests
19:25:28  Running with dbt=1.6.3
19:25:29  Installing ../
19:25:29  Installed from <local @ ../>
19:25:29  Installing dbt-labs/dbt_utils
19:25:29  Installed from version 0.8.0
19:25:29  Updated version available: 1.1.1
19:25:29  
19:25:29  Updates available for packages: ['dbt-labs/dbt_utils']                 
Update your versions in packages.yml, then run dbt deps
19:25:32  Running with dbt=1.6.3
19:25:32  Registered adapter: athena=1.6.1
19:25:32  Unable to do partial parsing because config vars, config profile, or config target have changed
19:25:34  Found 1 seed, 2 tests, 5 sources, 0 exposures, 0 metrics, 619 macros, 0 groups, 0 semantic models
19:25:34  
19:25:37  Concurrency: 1 threads (target='athena')
19:25:37  
19:25:37  1 of 1 START seed file dbt_external_tables_integration_tests_athena.people ..... [RUN]
19:25:48  1 of 1 OK loaded seed file dbt_external_tables_integration_tests_athena.people . [CREATE 200 in 11.03s]
19:25:48  
19:25:48  Finished running 1 seed in 0 hours 0 minutes and 14.18 seconds (14.18s).
19:25:48  
19:25:48  Completed successfully
19:25:48  
19:25:48  Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1
19:25:51  Running with dbt=1.6.3
19:25:51  Registered adapter: athena=1.6.1
19:25:51  Found 1 seed, 2 tests, 5 sources, 0 exposures, 0 metrics, 619 macros, 0 groups, 0 semantic models
19:25:51  No prep necessary, skipping
19:25:54  Running with dbt=1.6.3
19:25:54  Registered adapter: athena=1.6.1
19:25:54  Unable to do partial parsing because config vars, config profile, or config target have changed
19:25:57  Found 1 seed, 2 tests, 5 sources, 0 exposures, 0 metrics, 619 macros, 0 groups, 0 semantic models
19:25:57  1 of 4 START external source dbt_external_tables_integration_tests_athena.people_csv_unpartitioned
19:25:58  1 of 4 (1) drop table if exists `AwsDataCatalog`.`dbt_external_tables_integration_tests_ath...  
19:25:59  1 of 4 (1) OK -1
19:25:59  1 of 4 (2) create external table `AwsDataCatalog`.`dbt_external_tables_integration_tests_at...  
19:26:01  1 of 4 (2) OK -1
19:26:01  2 of 4 START external source dbt_external_tables_integration_tests_athena.people_csv_partitioned
19:26:02  2 of 4 (1) drop table if exists `AwsDataCatalog`.`dbt_external_tables_integration_tests_ath...  
19:26:03  2 of 4 (1) OK -1
19:26:03  2 of 4 (2) create external table `AwsDataCatalog`.`dbt_external_tables_integration_tests_at...  
19:26:05  2 of 4 (2) OK -1
19:26:05  2 of 4 (3) alter table `AwsDataCatalog`.`dbt_external_tables_integration_tests_athena`.`peo...  
19:26:08  2 of 4 (3) OK -1
19:26:08  3 of 4 START external source dbt_external_tables_integration_tests_athena.people_csv_multipartitioned
19:26:10  3 of 4 (1) drop table if exists `AwsDataCatalog`.`dbt_external_tables_integration_tests_ath...  
19:26:11  3 of 4 (1) OK -1
19:26:11  3 of 4 (2) create external table `AwsDataCatalog`.`dbt_external_tables_integration_tests_at...  
19:26:12  3 of 4 (2) OK -1
19:26:12  3 of 4 (3) alter table `AwsDataCatalog`.`dbt_external_tables_integration_tests_athena`.`peo...  
19:26:49  3 of 4 (3) OK -1
19:26:49  3 of 4 (4) alter table `AwsDataCatalog`.`dbt_external_tables_integration_tests_athena`.`peo...  
19:27:25  3 of 4 (4) OK -1
19:27:25  3 of 4 (5) alter table `AwsDataCatalog`.`dbt_external_tables_integration_tests_athena`.`peo...  
19:27:59  3 of 4 (5) OK -1
19:27:59  4 of 4 START external source dbt_external_tables_integration_tests_athena.people_csv_multipartitioned_hive_compatible
19:27:59  4 of 4 (1) drop table if exists `AwsDataCatalog`.`dbt_external_tables_integration_tests_ath...  
19:28:00  4 of 4 (1) OK -1
19:28:00  4 of 4 (2) create external table `AwsDataCatalog`.`dbt_external_tables_integration_tests_at...  
19:28:01  4 of 4 (2) OK -1
19:28:01  4 of 4 (3) msck repair table `AwsDataCatalog`.`dbt_external_tables_integration_tests_athena...  
19:28:04  4 of 4 (3) OK -1
19:28:07  Running with dbt=1.6.3
19:28:07  Registered adapter: athena=1.6.1
19:28:07  Unable to do partial parsing because config vars, config profile, or config target have changed
19:28:09  Found 1 seed, 2 tests, 5 sources, 0 exposures, 0 metrics, 619 macros, 0 groups, 0 semantic models
19:28:09  1 of 4 START external source dbt_external_tables_integration_tests_athena.people_csv_unpartitioned
19:28:10  1 of 4 SKIP
19:28:10  2 of 4 START external source dbt_external_tables_integration_tests_athena.people_csv_partitioned
19:28:12  2 of 4 (1) alter table `AwsDataCatalog`.`dbt_external_tables_integration_tests_athena`.`peo...  
19:28:16  2 of 4 (1) OK -1
19:28:16  3 of 4 START external source dbt_external_tables_integration_tests_athena.people_csv_multipartitioned
19:28:17  3 of 4 (1) alter table `AwsDataCatalog`.`dbt_external_tables_integration_tests_athena`.`peo...  
19:28:53  3 of 4 (1) OK -1
19:28:53  3 of 4 (2) alter table `AwsDataCatalog`.`dbt_external_tables_integration_tests_athena`.`peo...  
19:29:29  3 of 4 (2) OK -1
19:29:29  3 of 4 (3) alter table `AwsDataCatalog`.`dbt_external_tables_integration_tests_athena`.`peo...  
19:30:03  3 of 4 (3) OK -1
19:30:03  4 of 4 START external source dbt_external_tables_integration_tests_athena.people_csv_multipartitioned_hive_compatible
19:30:04  4 of 4 (1) msck repair table `AwsDataCatalog`.`dbt_external_tables_integration_tests_athena...  
19:30:06  4 of 4 (1) OK -1
19:30:09  Running with dbt=1.6.3
19:30:09  Registered adapter: athena=1.6.1
19:30:09  Found 1 seed, 2 tests, 5 sources, 0 exposures, 0 metrics, 619 macros, 0 groups, 0 semantic models
19:30:09  
19:30:10  Concurrency: 1 threads (target='athena')
19:30:10  
19:30:11  1 of 2 START test dbt_utils_source_equality_athena_external_people_csv_partitioned_id__first_name__last_name__email__ref_people_  [RUN]
19:30:14  1 of 2 PASS dbt_utils_source_equality_athena_external_people_csv_partitioned_id__first_name__last_name__email__ref_people_  [PASS in 3.83s]
19:30:14  2 of 2 START test dbt_utils_source_equality_athena_external_people_csv_unpartitioned_id__first_name__last_name__email__ref_people_  [RUN]
19:30:17  2 of 2 PASS dbt_utils_source_equality_athena_external_people_csv_unpartitioned_id__first_name__last_name__email__ref_people_  [PASS in 2.77s]
19:30:17  
19:30:17  Finished running 2 tests in 0 hours 0 minutes and 7.89 seconds (7.89s).
19:30:17  
19:30:17  Completed successfully
19:30:17  
19:30:17  Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2

I'd love to get it merged, will remove the draft label. Main concerns would be:

I am depending on my fork for multiple projects now - you can kick tyres and check it's working for you that way I guess.

aidan-o-boyle-kroo commented 11 months ago

Could we mock Athena? https://github.com/getmoto/moto/blob/master/IMPLEMENTATION_COVERAGE.md#athena

brabster commented 11 months ago

We could - I'm not sure how effective a test that would be, and I'm not sure what the maintainers need in order to merge the PR. @jeremyyeo can you advise on what we'd need to do to get this PR merged in? :bowing_man:
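
To illustrate the doubt about how effective a mocked test would be, here is a minimal sketch using only the standard library's `unittest.mock` in place of a real client. The `run_ddl` helper, the table names, and the deliberately malformed second statement are hypothetical and not taken from this PR; only `start_query_execution` with `QueryString` and `WorkGroup` follows the boto3 Athena client's call shape.

```python
from unittest.mock import MagicMock


def run_ddl(client, statements, workgroup="primary"):
    """Submit each DDL statement through an Athena-style client (hypothetical helper)."""
    for sql in statements:
        client.start_query_execution(QueryString=sql, WorkGroup=workgroup)


# Stand-in for a boto3 Athena client; no AWS calls are made.
fake_athena = MagicMock()
fake_athena.start_query_execution.return_value = {"QueryExecutionId": "fake-id"}

ddl = [
    "drop table if exists `db`.`people_csv_unpartitioned`",
    # Deliberately malformed: the comment string literal is never closed.
    "create external table `db`.`people_csv_unpartitioned` (id int) comment 'oops",
]
run_ddl(fake_athena, ddl)

# The stub only records what was submitted; it never parses the SQL.
submitted = [
    call.kwargs["QueryString"]
    for call in fake_athena.start_query_execution.call_args_list
]
```

The stub accepts both statements without complaint, so a test built this way can check which statements are sent and in what order, but it would not surface a parse failure like the `ParseException` reported earlier in this thread.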

github-actions[bot] commented 5 months ago

This PR has been marked as Stale because it has been open with no activity as of late. If you would like the PR to remain open, please comment on the PR or else it will be closed in 7 days.

Avinash-1394 commented 4 months ago

I'd also like to contribute whatever it takes to get this merged. This would be really helpful for our team.

nicor88 commented 4 months ago

@dataders who should we add as a reviewer to merge this one? 🙏🏻 Quite a few folks from the community have mentioned dbt-external-tables on a few occasions.

brabster commented 4 months ago

I've just set it up again with latest dbt-athena-community against my personal AWS account. All appears to still be working fine, integration tests run and pass. I've added an example of minimal IAM permissions and defaulted a config value to assist with any future test automation setup. Also checked that the implementation does its own drop-if logic and so doesn't appear to inherit any inappropriate housekeeping behaviour from the adapter.

(venv) @brabster ➜ /workspaces/dbt-external-tables/integration_tests (dbt-athena-community-support) $ dbt test --target athena
16:59:28  Running with dbt=1.7.13
16:59:29  Registered adapter: athena=1.7.2
16:59:29  Found 1 seed, 2 tests, 5 sources, 0 exposures, 0 metrics, 683 macros, 0 groups, 0 semantic models
16:59:29  
16:59:30  Concurrency: 1 threads (target='athena')
16:59:30  
16:59:30  1 of 2 START test dbt_utils_source_equality_athena_external_people_csv_partitioned_id__first_name__last_name__email__ref_people_  [RUN]
16:59:32  1 of 2 PASS dbt_utils_source_equality_athena_external_people_csv_partitioned_id__first_name__last_name__email__ref_people_  [PASS in 2.59s]
16:59:32  2 of 2 START test dbt_utils_source_equality_athena_external_people_csv_unpartitioned_id__first_name__last_name__email__ref_people_  [RUN]
16:59:35  2 of 2 PASS dbt_utils_source_equality_athena_external_people_csv_unpartitioned_id__first_name__last_name__email__ref_people_  [PASS in 2.46s]
16:59:35  
16:59:35  Finished running 2 tests in 0 hours 0 minutes and 5.65 seconds (5.65s).
16:59:35  
16:59:35  Completed successfully
16:59:35  
16:59:35  Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2
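
For reference, the statement order visible in the earlier integration-test logs (drop-if-exists and create on the first pass, then only partition refreshes on the second) can be sketched as below. This is a reading of the log output, not the package's macro code: `plan_statements`, its parameters, and the condition that triggers a recreate are assumptions, and the statement text is reduced to its leading keywords because the logs truncate the rest.

```python
def plan_statements(recreate, alter_count=0, hive_compatible=False):
    """Return the statement kinds issued for one external source (hypothetical helper).

    recreate        -- True when the table is dropped and created again
    alter_count     -- number of `alter table` statements seen for the source
    hive_compatible -- True when partitions are refreshed with `msck repair table`
    """
    steps = []
    if recreate:
        steps += ["drop table if exists", "create external table"]
    if hive_compatible:
        steps.append("msck repair table")
    else:
        steps += ["alter table"] * alter_count
    # A source with nothing to do is reported as SKIP in the logs.
    return steps or ["SKIP"]


# First pass and second pass for the four sources in the integration tests.
sources = {
    "people_csv_unpartitioned": {},
    "people_csv_partitioned": {"alter_count": 1},
    "people_csv_multipartitioned": {"alter_count": 3},
    "people_csv_multipartitioned_hive_compatible": {"hive_compatible": True},
}
for name, options in sources.items():
    print(name, plan_statements(True, **options), plan_statements(False, **options))
```

The counts passed in (`1` and `3` alter statements) are copied from the numbered steps in the logs; the sketch does not attempt to say what each `alter table` statement contains.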