Tomme / dbt-athena

The athena adapter plugin for dbt (https://getdbt.com)
Apache License 2.0
140 stars, 79 forks

Compression algorithm '' is not supported for file format 'parquet' #93

Closed julianste closed 2 years ago

julianste commented 2 years ago

We're getting an InvalidRequestException when specifying format 'parquet' without specifying a compression algorithm. In that case the default compression algorithm should be used (gzip in the case of Parquet).

Exception:

botocore.errorfactory.InvalidRequestException: An error occurred (InvalidRequestException) when calling the StartQueryExecution operation: Compression algorithm '' is not supported for file format 'parquet'.

It seems that create_table_as.sql defines no default for write_compression, so an empty string '' is passed instead of write_compression being left out entirely, which would result in the default (gzip) being used. This should be fixed by this PR.
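
To illustrate the intended behaviour, here is a minimal Python sketch of building the property list for the table's WITH clause so that write_compression is left out when it is unset. The helper name table_properties is hypothetical and only models the idea; it is not the adapter's actual macro.

```python
from typing import Optional


def table_properties(file_format: str, write_compression: Optional[str] = None) -> str:
    """Build the property list for a CTAS WITH (...) clause.

    Hypothetical helper for illustration only; not the adapter's macro.
    """
    props = [f"format='{file_format}'"]
    # Treat None and '' alike: leave the property out entirely so that
    # Athena applies its own default for the chosen file format.
    if write_compression:
        props.append(f"write_compression='{write_compression}'")
    return ", ".join(props)


print(table_properties("parquet"))
print(table_properties("parquet", ""))
print(table_properties("orc", "zlib"))
```

With this shape, an unset or empty value never reaches Athena as write_compression=''.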

Stacktrace:

Failed to execute query.
Traceback (most recent call last):
  File "/Users/juliansteger/.local/share/virtualenvs/dbt-lib-Tz_GiOtG/lib/python3.8/site-packages/pyathena/common.py", line 307, in _execute
    query_id = retry_api_call(
  File "/Users/juliansteger/.local/share/virtualenvs/dbt-lib-Tz_GiOtG/lib/python3.8/site-packages/pyathena/util.py", line 84, in retry_api_call
    return retry(func, *args, **kwargs)
  File "/Users/juliansteger/.local/share/virtualenvs/dbt-lib-Tz_GiOtG/lib/python3.8/site-packages/tenacity/__init__.py", line 409, in call
    do = self.iter(retry_state=retry_state)
  File "/Users/juliansteger/.local/share/virtualenvs/dbt-lib-Tz_GiOtG/lib/python3.8/site-packages/tenacity/__init__.py", line 356, in iter
    return fut.result()
  File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/concurrent/futures/_base.py", line 437, in result
    return self.__get_result()
  File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
  File "/Users/juliansteger/.local/share/virtualenvs/dbt-lib-Tz_GiOtG/lib/python3.8/site-packages/tenacity/__init__.py", line 412, in call
    result = fn(*args, **kwargs)
  File "/Users/juliansteger/.local/share/virtualenvs/dbt-lib-Tz_GiOtG/lib/python3.8/site-packages/botocore/client.py", line 386, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/Users/juliansteger/.local/share/virtualenvs/dbt-lib-Tz_GiOtG/lib/python3.8/site-packages/botocore/client.py", line 705, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.errorfactory.InvalidRequestException: An error occurred (InvalidRequestException) when calling the StartQueryExecution operation: Compression algorithm '' is not supported for file format 'parquet'.

SOVALINUX commented 2 years ago

Yes, it is a bug. Fixed in PR https://github.com/Tomme/dbt-athena/pull/94. And I believe that a default of write_compression = none won't work, since this connector worked perfectly until today. Most likely it is a change on the Athena side for Parquet files.

julianste commented 2 years ago

@SOVALINUX yes, I also think there was a change on the Athena side, since it only started failing last night. Regarding the default for write_compression, I think it should still be none; in that case the Athena default is applied. If you hardcode GZIP, then the Athena default for ORC, which is ZLIB, would be overwritten. It would probably even fail, since GZIP is not supported for ORC.

The bug was caused by the difference in Jinja between a value that is none and one that is not defined.

So I think PR #92 should be sufficient.
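
The none versus not-defined distinction can be modelled in plain Python. In Jinja an undefined variable is a special object rather than none, and by default it renders as an empty string. The Undefined class and render function below are stand-ins written for this sketch, and the `is not none` style guard is an assumption about the template, not a quote from it.

```python
class Undefined:
    """Stand-in for Jinja's undefined value: it is not None and prints as ''."""

    def __str__(self) -> str:
        return ""


def render(write_compression) -> str:
    # Models a template guard of the form `is not none`
    # (assumed for illustration, not taken from the macro).
    if write_compression is not None:
        return f"write_compression='{write_compression}'"
    return ""


print(repr(render(None)))         # guard fails, property is omitted
print(repr(render(Undefined())))  # guard passes, empty value is emitted
print(repr(render("snappy")))     # explicit value is emitted as given
```

Under these assumptions, only an explicit none causes the property to be omitted, which is why a default of none lets the Athena default apply.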

Tomme commented 2 years ago

Thank you @julianste for the fix, closing now.