apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0
36.31k stars 14.09k forks source link

JDBC url parsing exception #33442

Open rkontrimas opened 1 year ago

rkontrimas commented 1 year ago

Apache Airflow version

2.6.3

What happened

JDBC test connection fails with message: _The connid t8E1J8DC isn't defined

Sample JDBC URL: jdbc:postgres://10.0.0.1 This issue with any JDBC URL with colon in it, which is a valid character.

What you think should happen instead

URL should be parsed correctly, and the test connection should be executed.

Issue seems to be in code connection module (lib/python3.9/site-packages/airflow/models/connection.py) lines 198-201

uri_splits = rest_of_the_url.split("://", 1)
            if "@" in uri_splits[0] or ":" in uri_splits[0]:
                raise AirflowException(f"Invalid connection string: {uri}.")

How to reproduce

Create new connection with a JDBC URL that has ":" (colon).

Operating System

Centos 7

Versions of Apache Airflow Providers

https://raw.githubusercontent.com/apache/airflow/constraints-2.6.3/constraints-3.9.txt

Deployment

Virtualenv installation

Deployment details

No response

Anything else

No response

Are you willing to submit PR?

Code of Conduct

boring-cyborg[bot] commented 1 year ago

Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.

pankajkoti commented 1 year ago

@hussein-awala it seems to be related to our change in https://github.com/apache/airflow/pull/31465. Could you please check this? Thanks.

hussein-awala commented 1 year ago

@hussein-awala it seems to be related to our change in https://github.com/apache/airflow/pull/31465. Could you please check this?

Probably yes, I'll take a look

uranusjr commented 1 year ago

This issue with any JDBC URL with colon in it, which is a valid character.

According to various standards and proposals, the scheme should not contain a colon (among many other characters). For example, RFC 3986 says:

scheme      = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )

i.e. ASCII alphanumerics, plus, minus, and dot symbols. I would be more happy if we consider this out of scope unless a stronger reason is provided.

potiuk commented 1 year ago

This issue with any JDBC URL with colon in it, which is a valid character.

According to various standards and proposals, the scheme should not contain a colon (among many other characters). For example, RFC 3986 says:

I thought the same initially @uranusjr but I did a bit more research and looking for similar problems raised. Actually what I found is that RFC 3986 says something different - look at my comment https://github.com/apache/airflow/pull/33531#pullrequestreview-1585719212

In this case this is a perfectly valid URI according to RFC 3986 but scheme/path are derived this way:

scheme = jdbc
path = postgres://10.0.0.1

All the other parts of the URI (authority, ~query, fragment~) are missing in such case. This ASCII visualisation from the RFC explains it very nicely:

UPDATE: query and fragments are both ok in such URL, the visual explanation does not show it but if you read the text actually you can specify query and fragments in such URI that has no :// separator and urlsplit correctly parses them.

  The following are two example URIs and their component parts:

         foo://example.com:8042/over/there?name=ferret#nose
         \_/   \______________/\_________/ \_________/ \__/
          |           |            |            |        |
       scheme     authority       path        query   fragment
          |   _____________________|__
         / \ /                        \
         urn:example:animal:ferret:nose

The only confusing (but otherwise perfectly valid according to RFC) thing is that the "path" contains "://" which mitght be mistaken for separator after the scheme, but it is - in fact - part of the path. We are iterating on a solution in #33531.

dstandish commented 10 months ago

It appears this is a conflation of JDBC URI vs Airflow Connection URI

Is it not valid for airflow to require that its connections be encoded with <conn type>://<the rest of the info>?

But when you try to use the one URI for two purposes, then you run into trouble.

There should be a way in in JDBC conn to encode the desired scheme such as with jdbc://<blah>?scheme=<properly encode desired scheme>.

dstandish commented 10 months ago

For this problem, what are the actual connection attrs we are trying to encode?

For example, this code produces no errors:

from airflow.models.connection import Connection
c = Connection(conn_type="jdbc", host="postgres://abc123", schema="my_schema", login="my_login", password="my_pass", port=1234)
uri = c.get_uri()
c2 = Connection(uri=uri)
assert c.conn_type == c2.conn_type
assert c.host == c2.host
assert c.login == c2.login
assert c.password == c2.password
assert c.port == c2.port