apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0
37.31k stars 14.34k forks source link

Extra field widgets of custom connections do not properly save data #13597

Closed soltanianalytics closed 3 years ago

soltanianalytics commented 3 years ago

Apache Airflow version: 2.0.0

Environment: Docker image apache/airflow:2.0.0-python3.8 on Win10 with WSL

What happened:

I built a custom provider with a number of custom connections.

This works:

What does not work:

I used the JDBC example to string myself along by copying it and pasting it as a hook into my custom provider package. (I did not install the JDBC provider package, unless it is installed in the image I use - but if I don't add it in my own provider package, I don't have the connection type in the UI, so I assume it is not). Curiously, The JDBC hook works just fine. I then created the following file:

"""
You find two child classes of DbApiHook in here. One is the exact copy of the JDBC
provider hook, minus some irrelevant logic (I only care about the UI stuff here).
The other is the exact same thing, except I added an "x" behind every occurance
of "jdbc" in strings and names.
"""

from typing import Any, Dict, Optional
from airflow.hooks.dbapi import DbApiHook

class JdbcXHook(DbApiHook):
    """
    Copy of JdbcHook below. Added an "x" at various places, including the class name.
    """

    conn_name_attr = 'jdbcx_conn_id'  # added x
    default_conn_name = 'jdbcx_default' # added x
    conn_type = 'jdbcx' # added x
    hook_name = 'JDBCx Connection' # added x
    supports_autocommit = True

    @staticmethod
    def get_connection_form_widgets() -> Dict[str, Any]:
        """Returns connection widgets to add to connection form"""
        from flask_appbuilder.fieldwidgets import BS3TextFieldWidget
        from flask_babel import lazy_gettext
        from wtforms import StringField

        # added an x in the keys
        return {
            "extra__jdbcx__drv_path": StringField(lazy_gettext('Driver Path'), widget=BS3TextFieldWidget()),
            "extra__jdbcx__drv_clsname": StringField(
                lazy_gettext('Driver Class'), widget=BS3TextFieldWidget()
            ),
        }

    @staticmethod
    def get_ui_field_behaviour() -> Dict:
        """Returns custom field behaviour"""
        return {
            "hidden_fields": ['port', 'schema', 'extra'],
            "relabeling": {'host': 'Connection URL'},
        }

class JdbcHook(DbApiHook):
    """
    General hook for jdbc db access.
    JDBC URL, username and password will be taken from the predefined connection.
    Note that the whole JDBC URL must be specified in the "host" field in the DB.
    Raises an airflow error if the given connection id doesn't exist.
    """

    conn_name_attr = 'jdbc_conn_id'
    default_conn_name = 'jdbc_default'
    conn_type = 'jdbc'
    hook_name = 'JDBC Connection plain'
    supports_autocommit = True

    @staticmethod
    def get_connection_form_widgets() -> Dict[str, Any]:
        """Returns connection widgets to add to connection form"""
        from flask_appbuilder.fieldwidgets import BS3TextFieldWidget
        from flask_babel import lazy_gettext
        from wtforms import StringField

        return {
            "extra__jdbc__drv_path": StringField(lazy_gettext('Driver Path'), widget=BS3TextFieldWidget()),
            "extra__jdbc__drv_clsname": StringField(
                lazy_gettext('Driver Class'), widget=BS3TextFieldWidget()
            ),
        }

    @staticmethod
    def get_ui_field_behaviour() -> Dict:
        """Returns custom field behaviour"""
        return {
            "hidden_fields": ['port', 'schema', 'extra'],
            "relabeling": {'host': 'Connection URL'},
        }

What you expected to happen:

After doing the above, I expected

What actually happenes:

Screenshot 1: image

Screenshot 2: image

Screenshot 3: image

Screenshot 4: image

Screenshot 5: image

Screenshot 6 - airflow providers behaviours: image

Screenshot 7 - airflow providers get: image (Note: This error occurs with pre-installed providers as well)

Screenshot 8 - airflow providers hooks: image

Screenshot 9 - aorflow providers list: image

Screenshot 10 - airflow providers widgets: image

How to reproduce it:

potiuk commented 3 years ago

Interesting. The 'version' error has already been fixed in master for the upcoming 2.0.1 (merged 2 days ago).

https://github.com/apache/airflow/commit/f49f36b6a0c1332e141200d1a0c900991b9cdaef#diff-c56df299b30e60d690494057a4e8721d36406c0cca266961ff2ae6504993c8cbL159

I do not think it was causing the problem but maybe installing master version and trying out might be something you could try before I dig deeper @soltanianalytics ?

potiuk commented 3 years ago

Can you try one more thing ? I think I know what could be the problem. The problem is lilkely in the javascript which might not be ready for having two connections where name of one connection (jdbc) is substring of the other (jdbcx). Could you please change the name of the jdbcx to jdxbc everywhere ? That could help us to narrow-down the problem and then we can likely find a fix for that.

potiuk commented 3 years ago

(Note - the javascript was there in 1.10 but we simply did not have such 'overlapping' connection names :)

soltanianalytics commented 3 years ago

The naming change seems to fix the duplicate fields issue: image

However, this does not fix that the extra is an empty string for the "x" version: image

The extra is also not encrypted in the metadata db, if that information helps.

soltanianalytics commented 3 years ago

Can you point me at where the extra is saved to the metadata database? It seems to me that the issue may be arising there. Saving non-extra fields (login, password, ...) works just fine. It's only the extra fields that don't work because for some reason, it saves an empty string instead of the full extra. It seems that when it is working correctly, the extra saved a dict with values for all custom widgets, no matter what the connection is - and this fails for some reason for my custom connection types, except for the copied jdbc one, strangely enough.

potiuk commented 3 years ago

OK. Sorry I have not seen your answer - good that we know about the duplication. That should be easy fix :).

And I think I know where the problem is. There is one more place I missed in views.py:

    def process_form(self, form, is_created):
        """Process form data."""
        formdata = form.data
        if formdata['conn_type'] in ['jdbc', 'google_cloud_platform', 'grpc', 'yandexcloud', 'kubernetes']:
            extra = {key: formdata[key] for key in self.extra_fields if key in formdata}
            form.extra.data = json.dumps(extra)

Both are to be fixed in 2.0.1 then !

potiuk commented 3 years ago

Thanks for being so thorough!. You can test it by locally modifying the airflow/www/views.py and adding the connection_type there !

potiuk commented 3 years ago

BTW. I realized that the original implementation is somewhat wrong as it stores all the extras from all the connection in extra of any connection (even if it is empty). Not nice :).

potiuk commented 3 years ago

Hey @soltanianalytics - I implemented fixes for all the problems in #13640. I tested it for built-in providers and I have a reason to believe they should also work for custom ones.

Would it be possible that you apply those changes in your installation and see if they work es expected ?

I'd greatly appreciate it.