Tomme / dbt-athena

The athena adapter plugin for dbt (https://getdbt.com)
Apache License 2.0

Upgrade dbt-core to 1.0.1 #52

Closed courentin closed 2 years ago

courentin commented 2 years ago

PR to fix #51

For beta testers, you can test it out by:

If you encounter any issue like this one: https://github.com/Tomme/dbt-athena/pull/52#issuecomment-1003027561, try to uninstall dbt and dbt-athena first and reinstall dbt-athena.

5 users have been able to install and use this version successfully.

courentin commented 2 years ago

@Tomme This PR is ready for review (you can review it as a whole or commit by commit). It can be tested by installing dbt-athena from my branch: pip install git+https://github.com/courentin/dbt-athena.git@upgrade-dbt-1.0.0 and by following the 1.0.0 migration guide.

BeantownData commented 2 years ago

Hi! I installed this and dbt test ran flawlessly. When I tried dbt run I got an error:

An error occurred (InvalidRequestException) when calling the StartQueryExecution operation: line 2:74: mismatched input '"my_model_name__dbt_backup"' expecting {'SELECT', 'FROM', etc.

I looked at the target/run code and saw that it was creating a my_model_name__dbt_tmp table but there was no mention of a __dbt_backup table. This is a regular table model.

courentin commented 2 years ago

@BeantownData thank you for testing it!

Concerning the __dbt_backup table, it seems dbt is creating it from the current table, see https://github.com/dbt-labs/dbt-core/issues/3453#issuecomment-861525096

On my side I can run dbt run flawlessly, could you share what the generated code from my_model_name looks like?

[UPDATE] I have been able to reproduce it

courentin commented 2 years ago

I was able to run this version of dbt-athena successfully on a pretty big project!

@BeantownData could you try to install the latest version with pip install --force-reinstall git+https://github.com/courentin/dbt-athena.git@618d8d8e829d6ad7323f72e85b23bc1ba3f32a4b and try again?

BeantownData commented 2 years ago

Sadly nope.

I created a new test model for this:

select 'foo' as bar

I get this output:

dbt run -m test_model
13:16:17 [WARNING]: Deprecated functionality The source-paths config has been renamed to model-paths. Please update your dbt_project.yml configuration to reflect this change.
13:16:17 [WARNING]: Deprecated functionality The data-paths config has been renamed to seed-paths. Please update your dbt_project.yml configuration to reflect this change.
13:16:17 Running with dbt=1.0.0
13:16:18 Found 219 models, 823 tests, 0 snapshots, 0 analyses, 357 macros, 0 operations, 0 seed files, 212 sources, 0 exposures, 0 metrics
13:16:18
13:16:45 Concurrency: 20 threads (target='jalbro')
13:16:45
13:16:45 1 of 1 START table model jalbro_stage.test_model................................ [RUN]
Failed to execute query.
Traceback (most recent call last):
  File "c:\python38\lib\site-packages\pyathena\common.py", line 307, in _execute
    query_id = retry_api_call(
  File "c:\python38\lib\site-packages\pyathena\util.py", line 84, in retry_api_call
    return retry(func, *args, **kwargs)
  File "c:\python38\lib\site-packages\tenacity\__init__.py", line 423, in __call__
    do = self.iter(retry_state=retry_state)
  File "c:\python38\lib\site-packages\tenacity\__init__.py", line 360, in iter
    return fut.result()
  File "c:\python38\lib\concurrent\futures\_base.py", line 432, in result
    return self.__get_result()
  File "c:\python38\lib\concurrent\futures\_base.py", line 388, in __get_result
    raise self._exception
  File "c:\python38\lib\site-packages\tenacity\__init__.py", line 426, in __call__
    result = fn(*args, **kwargs)
  File "c:\python38\lib\site-packages\botocore\client.py", line 276, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "c:\python38\lib\site-packages\botocore\client.py", line 586, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.errorfactory.InvalidRequestException: An error occurred (InvalidRequestException) when calling the StartQueryExecution operation: line 2:56: mismatched input '"test_model"' expecting {'SELECT', 'FROM', 'ADD', 'AS', 'ALL', 
'DISTINCT', 'WHERE', 'GROUP', 'BY', 'GROUPING', 'SETS', 'CUBE', 'ROLLUP', 'ORDER', 'HAVING', 'LIMIT', 'AT', 'OR', 'AND', 'IN', NOT, 'NO', 'EXISTS', 'BETWEEN', 'LIKE', RLIKE, 'IS', 'NULL', 'TRUE', 'FALSE', 'NULLS', 'ASC', 'DESC', 'FOR', 'INTERVAL', 'CASE', 'WHEN', 'THEN', 'ELSE', 'END', 'JOIN', 'CROSS', 'OUTER', 'INNER', 'LEFT', 'SEMI', 'RIGHT', 'FULL', 'NATURAL', 'ON', 'LATERAL', 'WINDOW', 'OVER', 'PARTITION', 'RANGE', 'ROWS', 'UNBOUNDED', 'PRECEDING', 'FOLLOWING', 'CURRENT', 'ROW', 'WITH', 'VALUES', 'CREATE', 'TABLE', 'VIEW', 'REPLACE', 'INSERT', 'DELETE', 'INTO', 'DESCRIBE', 'EXPLAIN', 'FORMAT', 'LOGICAL', 'CODEGEN', 'CAST', 'SHOW', 'TABLES', 'COLUMNS', 'COLUMN', 'USE', 'PARTITIONS', 'FUNCTIONS', 'DROP', 'UNION', 'EXCEPT', 'INTERSECT', 'TO', 'TABLESAMPLE', 'STRATIFY', 'ALTER', 'RENAME', 'ARRAY', 'MAP', 'STRUCT', 'COMMENT', 'SET', 'RESET', 'DATA', 'START', 'TRANSACTION', 'COMMIT', 'ROLLBACK', 'MACRO', 'IF', 'DIV', 'PERCENT', 'BUCKET', 'OUT', 'OF', 'SORT', 'CLUSTER', 'DISTRIBUTE', 'OVERWRITE', 'TRANSFORM', 'REDUCE', 'USING', 'SERDE', 'SERDEPROPERTIES', 'RECORDREADER', 'RECORDWRITER', 'DELIMITED', 'FIELDS', 'TERMINATED', 'COLLECTION', 'ITEMS', 'KEYS', 'ESCAPED', 'LINES', 'SEPARATED', 'FUNCTION', 'EXTENDED', 'REFRESH', 'CLEAR', 'CACHE', 'UNCACHE', 'LAZY', 'FORMATTED', TEMPORARY, 'OPTIONS', 'UNSET', 'TBLPROPERTIES', 'DBPROPERTIES', 'BUCKETS', 'SKEWED', 'STORED', 'DIRECTORIES', 'LOCATION', 'EXCHANGE', 'ARCHIVE', 'UNARCHIVE', 'FILEFORMAT', 'TOUCH', 'COMPACT', 'CONCATENATE', 'CHANGE', 'CASCADE', 'RESTRICT', 'CLUSTERED', 'SORTED', 'PURGE', 'INPUTFORMAT', 'OUTPUTFORMAT', DATABASE, DATABASES, 'DFS', 'TRUNCATE', 'ANALYZE', 'COMPUTE', 'LIST', 'STATISTICS', 'PARTITIONED', 'EXTERNAL', 'DEFINED', 'REVOKE', 'GRANT', 'LOCK', 'UNLOCK', 'MSCK', 'REPAIR', 'EXPORT', 'IMPORT', 'LOAD', 'ROLE', 'ROLES', 'COMPACTIONS', 'PRINCIPALS', 'TRANSACTIONS', 'INDEX', 'INDEXES', 'LOCKS', 'OPTION', 'ANTI', 'LOCAL', 'INPATH', IDENTIFIER, BACKQUOTED_IDENTIFIER} Failed to execute query.

The run version of the query is:

create table awsdatacatalog.jalbro_stage.test_model__dbt_tmp as ( select 'foo' as bar );

When I tried to run the command manually in the athena console I found that the test_model__dbt_tmp table had been created by dbt.

Hope this helps! Thanks!
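
As an aside, the two deprecation warnings in the log above only concern renamed keys in dbt_project.yml (source-paths to model-paths, data-paths to seed-paths) and appear to be separate from the query failure. A minimal Python sketch of that rename follows; the function name rename_deprecated_keys is hypothetical, and it assumes the keys sit unindented at the top level of the file. It edits text only and does not parse YAML.

```python
import re

# Key renames reported by the deprecation warnings in the log above.
RENAMED_KEYS = {"source-paths": "model-paths", "data-paths": "seed-paths"}

# Match a deprecated key only at the very start of a line (top-level key),
# followed by optional whitespace and a colon.
_PATTERN = re.compile(
    r"^(" + "|".join(re.escape(key) for key in RENAMED_KEYS) + r")(\s*:)",
    flags=re.MULTILINE,
)


def rename_deprecated_keys(project_yaml: str) -> str:
    """Return the dbt_project.yml text with deprecated top-level keys renamed.

    Hypothetical helper: indented (nested) occurrences are left untouched.
    """
    return _PATTERN.sub(
        lambda match: RENAMED_KEYS[match.group(1)] + match.group(2),
        project_yaml,
    )
```

Reading dbt_project.yml, passing its text through rename_deprecated_keys, and writing the result back should silence both warnings; values such as the path lists are left exactly as they were.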

BeantownData commented 2 years ago

Is there a way I can find the transaction wrapper code for this? I would have thought it would be part of the run code.

BeantownData commented 2 years ago

I'm wondering if this could possibly be a permissions issue? I could verify if I had the code that was actually run to swap the temp table, backup table, and real table.

courentin commented 2 years ago

Actually this error comes from the fact that dbt macros were moved to some subdirectories. When installing this adapter, these macros were not copied to the dbt package properly; https://github.com/Tomme/dbt-athena/pull/52/commits/097cb655dfff4589850102ec656b464cb79597f6 fixes that.

I think the command I gave you to reinstall dbt-athena is not enough.

Can you tell me if the result of:

ls -l $(pip show dbt-core | grep Location | sed 's/Location: //g')/dbt/include/athena/macros

shows two directories, adapters and materializations, and no yaml files?

If you have other things than these two folders, it probably means that you need to uninstall both dbt-core and dbt-athena before reinstalling them.

BeantownData commented 2 years ago

I couldn't run that snippet as I'm on a Windows box.

But I uninstalled dbt-core and dbt-athena and it worked! 218 tables created and 823 tests passed. Thanks!
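
For anyone else who cannot run the shell snippet, the same check can be sketched in Python, which works on Windows too. This is only an illustration of the check described above (two directories, adapters and materializations, and nothing else); the function names are hypothetical, and the path is derived from the dbt-core install location just as the shell version does.

```python
from importlib import metadata
from pathlib import Path

# The two directories a clean install is expected to contain (per the thread).
EXPECTED_DIRS = {"adapters", "materializations"}


def inspect_macros_dir(macros_dir: Path):
    """Return (subdirectory names, other entry names) found in macros_dir."""
    entries = sorted(macros_dir.iterdir())
    dirs = [entry.name for entry in entries if entry.is_dir()]
    others = [entry.name for entry in entries if not entry.is_dir()]
    return dirs, others


def is_clean_install(macros_dir: Path) -> bool:
    """True if macros_dir holds exactly the expected directories and no files."""
    dirs, others = inspect_macros_dir(macros_dir)
    return set(dirs) == EXPECTED_DIRS and not others


def find_macros_dir() -> Path:
    """Locate dbt/include/athena/macros next to the installed dbt-core."""
    dist = metadata.distribution("dbt-core")
    return Path(dist.locate_file("dbt/include/athena/macros"))


if __name__ == "__main__":
    try:
        path = find_macros_dir()
    except metadata.PackageNotFoundError:
        print("dbt-core is not installed in this environment")
    else:
        if not path.is_dir():
            print(f"{path} does not exist; is dbt-athena installed?")
        elif is_clean_install(path):
            print(f"{path} looks clean")
        else:
            print(f"{path} has unexpected contents: {inspect_macros_dir(path)}")
            print("Uninstall dbt-core and dbt-athena, then reinstall.")
```

If the script reports unexpected contents, such as leftover yaml files, that matches the situation described above where both packages need to be uninstalled before reinstalling.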

BeantownData commented 2 years ago

Do we need more testers for this? Should we post on the dbt slack athena channel to get more?

courentin commented 2 years ago

Hello @Tomme I hope you're doing well!

Would you have the bandwidth to review this PR in the near future? If not, can someone else do it on your behalf?

Thank you :)

gvillafanetapia commented 2 years ago

@courentin I tested this and it worked great! No problems upgrading from a project previously using dbt 0.20 🎉🎉🎉🚀🚀🚀

I'm wondering about schema evolution BTW... is this supported? For example, when I do a full-refresh, instead of deleting the Glue table and creating a new one, would it create a new schema version? 🤔

courentin commented 2 years ago

I might be wrong as I don't use this feature, but it does not seem to be supported by the adapter; see https://github.com/Tomme/dbt-athena/issues/47

Antauri commented 2 years ago

Great work guys! Can this become a release soon? The reason we're choosing dbt, beyond its other features, is the Athena plugin, which has a central role in our ecosystem. Would be lovely to be aligned with the latest dbt 1.x when possible.

Antauri commented 2 years ago

🥳 👍 💯 yeah!