dbt-labs / dbt-athena

The athena adapter plugin for dbt (https://getdbt.com)
https://dbt-athena.github.io
Apache License 2.0
228 stars 100 forks source link

fix: Fixed table_type for GOVERNED tables #661

Closed svdimchenko closed 5 months ago

svdimchenko commented 5 months ago

Description

with following table

{'Name': 'test', 'DatabaseName': 'raw', 'Owner': 'hadoop', 'CreateTime': datetime.datetime(2022, 3, 8, 15, 49, 45, tzinfo=tzlocal()), 'UpdateTime': datetime.datetime(2022, 3, 8, 15, 49, 45, tzinfo=tzlocal()), 'LastAccessTime': datetime.datetime(1970, 1, 1, 1, 0, tzinfo=tzlocal()), 'Retention': 0, 'StorageDescriptor': {'Columns': [{'Name': 'brand', 'Type': 'string'}, {'Name': 'dt', 'Type': 'timestamp'}], 'Location': 's3://stage-dwh-transformed/brand_governed', 'InputFormat': 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat', 'OutputFormat': 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat', 'Compressed': False, 'NumberOfBuckets': -1, 'SerdeInfo': {'SerializationLibrary': 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe', 'Parameters': {'serialization.format': '1'}}, 'BucketColumns': [], 'SortColumns': [], 'Parameters': {}, 'SkewedInfo': {'SkewedColumnNames': [], 'SkewedColumnValues': [], 'SkewedColumnValueLocationMaps': {}}, 'StoredAsSubDirectories': False}, 'PartitionKeys': [], 'TableType': 'GOVERNED', 'Parameters': {'transient_lastDdlTime': '1646750985', 'classification': 'parquet', 'table_type': 'LAKEFORMATION_GOVERNED'}, 'CreatedBy': 'xxx', 'IsRegisteredWithLakeFormation': False, 'CatalogId': 'xxx', 'IsMultiDialectView': False}

I've got an error on dbt docs generate:

Encountered an error while generating catalog: 'GOVERNED'

I've added GOVERNED table type and improved logging as well

Checklist

nicor88 commented 5 months ago

@svdimchenko have a look at the CI, there are some issues that needs to be addressed.

svdimchenko commented 5 months ago

@svdimchenko have a look at the CI, there are some issues that needs to be addressed.

I see there is problem with https://github.com/dbt-athena/dbt-athena/blob/49896c21361ea74bb28aa1ee184c19f01d746776/tests/unit/test_adapter.py#L417-L417

do we have a use case when we need to create table without type ?

nicor88 commented 5 months ago

As far as I remember there are specific scenarios (like external catalogs) where you can have tables without type, so yes, we should cover that scenarios.

svdimchenko commented 5 months ago

As far as I remember there are specific scenarios (like external catalogs) where you can have tables without type, so yes, we should cover that scenarios.

I just see that this may conflict with get_table_type method. Which table_type should we get for such case ?

nicor88 commented 5 months ago

I just see that this may conflict with get_table_type method. Which table_type should we get for such case ?

We can create UNKNOWN or something like that for case where the table type is None, alternatively we map to table even for None cases... Alternatively, we can also change the unit tests to assert the value error on those cases ;) - I honestly don't recall the edge case.

nicor88 commented 5 months ago

@svdimchenko final comment, https://github.com/dbt-athena/dbt-athena/pull/661#discussion_r1617360774 the rest looks good, thanks for the refactor.