apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0

Google provider : Unnecessary imports for CloudSQL operators #40175

Closed bourliam closed 1 month ago

bourliam commented 2 months ago

Apache Airflow Provider(s)

google

Versions of Apache Airflow Providers

google, ssh, slack

Apache Airflow version

2.6.3

Operating System

composer

Deployment

Google Cloud Composer

Deployment details

No response

What happened

When trying to load the dag for a test with a CloudSQLExportInstanceOperator (from airflow.providers.google.cloud.operators.cloud_sql) I have a module not found error (ModuleNotFoundError: No module named 'airflow.providers.mysql').

I do not use MySQL, so I should not have to install this provider.

What you think should happen instead

These dependencies should not be required when they are not actually used. Moreover, the import is only needed for type checking in another operator (CloudSQLExecuteQueryOperator).

How to reproduce

For example, import a CloudSQL operator in an environment without the mysql provider installed.

Anything else

No response

Are you willing to submit PR?

Code of Conduct

boring-cyborg[bot] commented 2 months ago

Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.

shahar1 commented 2 months ago

Thanks for creating the issue, but it seems to be solved in a later version by #33783. As a Cloud Composer user, you might consider upgrading to composer-2.X.X-airflow-2.7.3 to handle this.

bourliam commented 2 months ago

Hello @shahar1, thanks for the reply, but this is not solved (at least not for me):

I have been digging a little more, and the ModuleNotFoundError comes from the file google/cloud/hooks/cloud_sql.py, where the mysql import is "mandatory" even though I do not use MySQL databases.

shahar1 commented 2 months ago

airflow.providers.google.cloud.operators.cloud_sql

Oh, I think that I see it now - I assume that you're referring to these imports in hooks/cloud_sql.py:

from airflow.providers.mysql.hooks.mysql import MySqlHook
from airflow.providers.postgres.hooks.postgres import PostgresHook

They are currently used here for initializing the hook, not only for typing:

    def get_database_hook(self, connection: Connection) -> PostgresHook | MySqlHook:
        """Retrieve database hook.

        This is the actual Postgres or MySQL database hook that uses proxy or
        connects directly to the Google Cloud SQL database.
        """
        if self.database_type == "postgres":
            db_hook: PostgresHook | MySqlHook = PostgresHook(connection=connection, schema=self.database) # <-
        else:
            db_hook = MySqlHook(connection=connection, schema=self.database) # <-
        self.db_hook = db_hook
        return db_hook

Maybe you could import them locally inside the method, so you won't need both providers installed to use this hook. Reopening - feel free to contribute a PR, I'd be happy to review it.

jhongy1994 commented 1 month ago

Hello. I am interested in working on this issue. Can I take it over?

shahar1 commented 1 month ago

> Hello. I am interested in working on this issue. Can I take it over?

I assigned you :)