apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0
37.03k stars 14.28k forks source link

OracleHook not able to set TCP keep-alive or re-use OS-level TCP keep-alive #32844

Open liangxibing opened 1 year ago

liangxibing commented 1 year ago

Apache Airflow version

2.6.3

What happened

We were using Airflow Oracle Provider 2.0.1 before and recently upgraded to 3.7.1 along with airflow upgrade, and found a strange issue around tcp keep alive settings since upgrading.

We had setup OS-level tcp keep alive settings long time ago to avoid firewall blocking issues (it will kill connection which is idle for more than 5 mins), and it was working well with old version of Oracle provider (using cx-Oracle).

However we met firewall issues after upgrading, which I think may be related to new version of Oracle provider using oracledb.

https://github.com/oracle/python-oracledb/blob/v1.3.2/src/oracledb/connection.py#L1033

            expire_time: int=0,

https://github.com/oracle/python-oracledb/blob/v1.3.2/src/oracledb/connection.py#L1144-L1146

    - expire_time: an integer indicating the number of minutes between the
      sending of keepalive probes. If this parameter is set to a value greater
      than zero it enables keepalive (default: 0)

Seems default setting will DISABLE tcp keepalive for oracledb, however I didn't find any place from Airflow to overwrite this setting, e.g. pass related settings to conn_config

https://github.com/apache/airflow/blob/providers-oracle/3.7.1/airflow/providers/oracle/hooks/oracle.py#L242

Did I miss anything?

What you think should happen instead

It should be able to turn on tcp keep alive related settings Airflow Oracle Provider.

How to reproduce

Just using below codes:

    oracle_hook = OracleHook(oracle_conn_id=self.conn_id)
    with oracle_hook.get_conn() as src_conn:
      cursor = src_conn.cursor()
      cursor.outputtypehandler = self.output_type_handler
      self.log.info("Querying data from source: %s", self.oracle_conn_id)
      cursor.arraysize = self.cursor_size
      self.log.info("Setting cursor size to {}".format(self.cursor_size))
      self.log.info('Executing {}'.format(self.sql))
      if self.parameters is not None:
        self.log.info(f"Querying data with the parameters:{self.parameters}")
        try:
          cursor.execute(self.sql, self.parameters)
        except Exception as e:
          self.log.error(f"Error message:{e}")
          end_time = datetime.now()
          duration = (end_time - start_time).total_seconds()
          self.log.info(
              f'OracleToS3Operator: Schema: {self.table_config["SCHEMA_CODE"]}, Table: {self.table_config["TABLE_NAME"]}, Duration: {duration} seconds, Status: Failed')
          raise AirflowException(f"The task has experienced a failure.")
        else:
          self.log.info(f"The query has been successfully executed")

Operating System

Debian GNU/Linux 11 (bullseye) which is used by airflow helm chart

Versions of Apache Airflow Providers

apache-airflow 2.6.3 apache-airflow-providers-amazon 8.2.0 apache-airflow-providers-celery 3.2.1 apache-airflow-providers-cncf-kubernetes 7.1.0 apache-airflow-providers-common-sql 1.5.2 apache-airflow-providers-docker 3.7.1 apache-airflow-providers-elasticsearch 4.5.1 apache-airflow-providers-ftp 3.4.2 apache-airflow-providers-google 10.2.0 apache-airflow-providers-grpc 3.2.1 apache-airflow-providers-hashicorp 3.4.1 apache-airflow-providers-http 4.4.2 apache-airflow-providers-imap 3.2.2 apache-airflow-providers-microsoft-azure 6.1.2 apache-airflow-providers-mysql 5.1.1 apache-airflow-providers-odbc 4.0.0 apache-airflow-providers-oracle 3.7.1 apache-airflow-providers-postgres 5.5.1 apache-airflow-providers-redis 3.2.1 apache-airflow-providers-salesforce 5.4.1 apache-airflow-providers-sendgrid 3.2.1 apache-airflow-providers-sftp 4.3.1 apache-airflow-providers-slack 7.3.1 apache-airflow-providers-snowflake 4.2.0 apache-airflow-providers-sqlite 3.4.2 apache-airflow-providers-ssh 3.7.1

Deployment

Official Apache Airflow Helm Chart

Deployment details

Using official airflow helm chart 1.9.0

Anything else

After upgrading, we can see a lot of airflow tasks using new oracle providers are stuck and actually those queries not seen from source Oracle DB, i.e. connections killed by firewall.

Are you willing to submit PR?

Code of Conduct

boring-cyborg[bot] commented 1 year ago

Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.

potiuk commented 1 year ago

Yep. Looks like it could be added

When python-oracledb uses [Oracle Client libraries 19c](https://python-oracledb.readthedocs.io/en/latest/user_guide/introduction.html#thickarchfig), then the [Easy Connect Plus syntax](https://python-oracledb.readthedocs.io/en/latest/user_guide/connection_handling.html#easyconnect) syntax enables some options to be used without needing a sqlnet.ora file. For example, if your firewall times out every 4 minutes, and you cannot alter the firewall settings, then you may decide to use EXPIRE_TIME in your connect string to send a probe every 2 minutes to the database to keep connections ‘alive’:  

I will assign you.

KsM00000 commented 8 months ago

Hi! Has there been any progress on the problem?

potiuk commented 8 months ago

@KsM00000 - maybe you would like to take over? I can assign you. If you are interested in an issue like that, fixing it is way more efffective than asking if there is a progress (when clearly there is not).