apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0
36.22k stars 14.06k forks source link

AttributeError: 'ExaStatement' object has no attribute 'description' #28731

Closed mkalinna closed 1 year ago

mkalinna commented 1 year ago

Apache Airflow Provider(s)

exasol

Versions of Apache Airflow Providers

apache-airflow-providers-common-sql==1.3.1 apache-airflow-providers-exasol==4.1.1

Apache Airflow version

2.5.0

Operating System

Rocky Linux 8.7 (like RHEL 8.7)

Deployment

Other Docker-based deployment

Deployment details

What happened

After upgrading from Airflow 2.4.3 to 2.5.0, the ExasolOperator stopped working even when executing simple SQL Statements. See log snippet below for details.

It looks like the Exasol Hook fails due to a missing attribute. It seems likely the issue was introduced in a refactoring of the Exasol Hook to use common DBApiHook https://github.com/apache/airflow/pull/28009/commits

What you think should happen instead

No response

How to reproduce

Any execution of ExasolOperator in Airflow built with the mentioned constraints should show the issue.

Anything else


[2023-01-04, 15:31:33 CET] {exasol.py:176} INFO - Running statement: EXECUTE SCRIPT mohn_fw.update_select_to_date_for_area('CORE'), parameters: None
[2023-01-04, 15:31:33 CET] {taskinstance.py:1772} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/airflow/providers/common/sql/operators/sql.py", line 255, in execute
    output = hook.run(
  File "/usr/local/lib/python3.9/site-packages/airflow/providers/exasol/hooks/exasol.py", line 178, in run
    result = handler(cur)
  File "/usr/local/lib/python3.9/site-packages/airflow/providers/common/sql/hooks/sql.py", line 62, in fetch_all_handler
    if cursor.description is not None:
AttributeError: 'ExaStatement' object has no attribute 'description'
[2023-01-04, 15:31:33 CET] {taskinstance.py:1322} INFO - Marking task as UP_FOR_RETRY. dag_id=MOH_DWH_DAILY_CORE, task_id=update_select_to_date_for_area, execution_date=20221225T210000, start_date=20230104T143132, end_date=20230104T143133
[2023-01-04, 15:31:33 CET] {standard_task_runner.py:100} ERROR - Failed to execute job 46137 for task update_select_to_date_for_area ('ExaStatement' object has no attribute 'description'; 7245)

Are you willing to submit PR?

Code of Conduct

boring-cyborg[bot] commented 1 year ago

Thanks for opening your first issue here! Be sure to follow the issue template!

Taragolis commented 1 year ago

It seems likely the issue was introduced in a refactoring of the Exasol Hook to use common DBApiHook https://github.com/apache/airflow/pull/28009/commits

This one include in 4.1.2: https://airflow.apache.org/docs/apache-airflow-providers-exasol/stable/index.html#changelog

Taragolis commented 1 year ago

AttributeError: 'ExaStatement' object has no attribute 'description'

Something here went wrong. Because description property is a mandatory in DB-API 2 but seems like Exasol Hook do not use DB-API 2 wrapper when create connection and use native connection instead

potiuk commented 1 year ago

Thanks for reporitng @mkalinna BTW. For the moment, the easy workaround for now is to downgrade Exasol provider to previous working version (4.1.1)

potiuk commented 1 year ago

@Taragolis ->seems the DB-API-2 wrapper is not used because it does not support placeholders /parameters.

Taragolis commented 1 year ago

Yep, for some reason it does not propagate parameters to the method which supports parameters.

Maybe we do not need to use DbApiHook for databases which do not have fully compatible DB-API 2 libraries? 🤔🤔🤔

mkalinna commented 1 year ago

Thanks for jumping at this topic so quickly! It turned out the issue was introduced in apache-airflow-providers-exasol==4.1.0 already, but going back to 4.0.1 works for now.

potiuk commented 1 year ago

Yeah. I think we should make it back to use exasol's native select @Taragolis but I will see if we can still make it DBHook compliant. This has some profound effects for Open Lineage integration for one. CC: @kazanzhy @eladkal - you might also be interested

potiuk commented 1 year ago

Proposal for a fix in https://github.com/apache/airflow/pull/28744 . @mkalinna - would it be possible that you test that fix (install latest provider and apply the patch manually aftet it is installed?)

mkalinna commented 1 year ago

Hi @potiuk, I understand there are still some discussions going on how to fix the issue best. However, I applied the current patch to apache-airflow-providers==4.1.2 and come across a different issue now:

[2023-01-05, 17:17:26 CET] {exasol.py:180} INFO - Running statement: EXECUTE SCRIPT mohn_fw.update_select_to_date_for_area('CORE'), parameters: None
[2023-01-05, 17:17:26 CET] {taskinstance.py:1772} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/airflow/providers/common/sql/operators/sql.py", line 255, in execute
    output = hook.run(
  File "/usr/local/lib/python3.9/site-packages/airflow/providers/exasol/hooks/exasol.py", line 182, in run
    result = handler(cur)
  File "/usr/local/lib/python3.9/site-packages/airflow/providers/exasol/operators/exasol.py", line 26, in exasol_fetch_all_handler
    return cursor.fetchall()
  File "/usr/local/lib/python3.9/site-packages/pyexasol/statement.py", line 96, in fetchall
    return [row for row in self]
  File "/usr/local/lib/python3.9/site-packages/pyexasol/statement.py", line 96, in <listcomp>
    return [row for row in self]
  File "/usr/local/lib/python3.9/site-packages/pyexasol/statement.py", line 63, in __next__
    raise ExaRuntimeError(self.connection, 'Attempt to fetch from statement without result set')
pyexasol.exceptions.ExaRuntimeError: 
(
    message     =>  Attempt to fetch from statement without result set
    dsn         =>  xxx.bpghub.de:8563
    user        =>  xxx
    schema      =>  xxx
    session_id  =>  1754200364326350346
)

[2023-01-05, 17:17:26 CET] {taskinstance.py:1322} INFO - Marking task as UP_FOR_RETRY. dag_id=MOH_DWH_DAILY_CORE, task_id=update_select_to_date_for_area, execution_date=20221225T210000, start_date=20230105T161724, end_date=20230105T161726
[2023-01-05, 17:17:26 CET] {standard_task_runner.py:100} ERROR - Failed to execute job 46523 for task update_select_to_date_for_area (
(
    message     =>  Attempt to fetch from statement without result set
    dsn         =>  xxx.bpghub.de:8563
    user        =>  xxx
    schema      =>  xxx
    session_id  =>  1754200364326350346
)
; 1542)
potiuk commented 1 year ago

Ah yeah. Right. so this makes it exasol-specific handler @eladkal and we cannot really make it generic. Fix is coming.

potiuk commented 1 year ago

Right - I think the new version should work better @mkalinna :) https://github.com/apache/airflow/pull/28744

mkalinna commented 1 year ago

Now it fails in another line in the exasol hook when looking up the description. It think it would fail in line 188, too.

[2023-01-07, 14:47:43 CET] {exasol.py:180} INFO - Running statement: EXECUTE SCRIPT mohn_fw.update_select_to_date_for_area('CORE'), parameters: None
[2023-01-07, 14:47:43 CET] {taskinstance.py:1772} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/airflow/providers/common/sql/operators/sql.py", line 260, in execute
    output = hook.run(
  File "/usr/local/lib/python3.9/site-packages/airflow/providers/exasol/hooks/exasol.py", line 185, in run
    _last_description = cur.description
AttributeError: 'ExaStatement' object has no attribute 'description'
potiuk commented 1 year ago

Pushed another candidate. (Sorry I have no exasol DB to test it with :)) .

mkalinna commented 1 year ago

Thanks for taking care of the issue, I can confirm it works now:

[2023-01-09, 10:54:48 CET] {exasol.py:203} INFO - Running statement: EXECUTE SCRIPT mohn_fw.update_select_to_date_for_area('CORE'), parameters: None
[2023-01-09, 10:54:48 CET] {exasol.py:212} INFO - Rows affected: <bound method ExaStatement.rowcount of <ExaStatement session_id=1754538679605737261 stmt_idx=1>>
[2023-01-09, 10:54:48 CET] {taskinstance.py:1322} INFO - Marking task as SUCCESS. dag_id=MOH_DWH_DAILY_CORE, task_id=update_select_to_date_for_area, execution_date=20230101T210000, start_date=20230109T095447, end_date=20230109T095448
potiuk commented 1 year ago

Cool. Thanks For helping to verify it 👍