Open dungdm93 opened 4 years ago
Thanks for opening your first issue here! Be sure to follow the issue template!
Yeah @dungdm93 - this is a known problem, It's not a bug per-se, more of a naming problem but it's hard to solve in a backward-compatible way. Would you like to attempt to solve it this way that the previous usages will be not affected ? It might be temporary (with some deprecation notice) but we have to make sure that at least when people migrate to 2.0, their code still works and at most they receive a deprecation warning.
And it would be great to solve it BTW. it's pretty anoying.
Btw, this also affects some other protocols, so far I've identified HTTP/HTPS and SPARK:
Sample URI's that I have in my Secrets Manager :
http://https%3A%2F%2Fexample.com
spark://spark%3A%2F%2Fspark-master-0.spark-master.spark.svc.cluster.local:7077
I've also just added a piece of documentation that explains the strange URI's required. We should keep it until this current Issue taken care of... (but I recall from my conversation in the past that it was not easy to fix this).
The above doesn't work for me, so I came up with the following that works in my case for HTTP:
AIRFLOW_CONN_API_AUTH=http://app_auth:8000/http
So I added http
at the end and later use relative path in SimpleHttpOperator's endpoint argument.
I know it's not a bug per-se, but maybe a path forward would be to introduce new http and https hook and name it differently (RestHook
or httpApiHook
?) and then deprecate the current one, not breaking anyones code/pipelines behavior?
Why not. Maybe you would like to attempt to do it? PRs are welcome :)
I would like to revive this thread because recently I've spend a lot of time when I was trying to set up an HTTPS connection because of how confusing connection URI is. Currently if I want to do it then the URI has to be either http://example.com/https
OR http://https://example.com
but the most intuitive one: https://example.com
doesn't work.
I also noticed that there was more work done on supporting the second pattern: https://github.com/apache/airflow/pull/31465 which makes it seem like it is now the recommended solution rather than a workaround.
I feel like there is a design decision needed here. Either Airflow committers agree that the current URI syntax: https://airflow.apache.org/docs/apache-airflow/stable/howto/connection.html#uri-format is the final one and in that case this issue should be closed.
Alternatively (which is in my opinion better) we could try to aim at changing the URI syntax so that only the 3rd pattern is valid and others are deprecated. E.g. schema (actually 'scheme') and protocol could be considered as one and the same thing in the connection model (in future, now it would be a breaking change). And connection type can be derived from the scheme in code (e.g. if the scheme is HTTPS then conn_type is HTTP, in almost all other cases conn_type = scheme). Then the path part is not needed and hostname should not contain protocol because it will be available in scheme.
I'm willing to propose some solution and deprecation path (so that the old patterns are only supported until Airflow 3) if the community sees value in this idea.
I do not think it is final. I think we would be entirely open to get more "intutitive" way of defining and storing the connection as long as it does not introduce backwards compatiblity - then it can even be made "preferred" and the old one "deprecated" - I have completley no problem with that.
Ok, after looking into this issue again I see that parts of my comment were mistaken - I realised that 'schema' in connection is not 'scheme' with a typo but a database schema. This is because these terms were mixed in HttpHook implementation. I think my PR (https://github.com/apache/airflow/pull/35712) should fix this issue. But encoding protocol in the hostname is a separate thing so I will leave it for now.
Apache Airflow version: 1.10.11
Environment:
uname -a
): 5.4.0-47-genericWhat happened: I store my connection into Airflow using this command
And when It's used by hooks, I realize that HTTP connection parse path as schema but use as protocol:
For now, I could workaround by store hold URI in host field like
But you could see, it's absolutely bad solution.
So my suggestion is the connection should be sanitize follow the standard URL syntax:
This mean that:
host
part should only contain domain name, hostname or IP addressschema
filed topath
for better naming. This filed store HTTP path or SQL database namescheme
orprotocol
filed for store acture connection protocol, e.g:conn_type=http
,protocol=https
conn_type=mysql
,protocol=jdbc:mysql