Closed potiuk closed 3 years ago
For the Docker provider (apache-airflow-providers-docker==2.1.0rc1), it is still failing for Docker-in-Docker. I tried with Airflow 2.1.2:
```python
t3 = DockerOperator(
    api_version='auto',
    docker_url="unix://var/run/docker.sock",
    command='/bin/sleep 30',
    image='centos:latest',
    network_mode='bridge',
    task_id='docker_op_tester',
    dag=dag,
)
```

gives:
```
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/docker/api/client.py", line 268, in _raise_for_status
    response.raise_for_status()
  File "/usr/local/lib/python3.9/site-packages/requests/models.py", line 953, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: http+docker://localhost/v1.41/containers/create

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/airflow/providers/docker/operators/docker.py", line 253, in _run_image
    return self._run_image_with_mounts(self.mounts + [tmp_mount], add_tmp_variable=True)
  File "/usr/local/lib/python3.9/site-packages/airflow/providers/docker/operators/docker.py", line 272, in _run_image_with_mounts
    self.container = self.cli.create_container(
  File "/usr/local/lib/python3.9/site-packages/docker/api/container.py", line 430, in create_container
    return self.create_container_from_config(config, name)
  File "/usr/local/lib/python3.9/site-packages/docker/api/container.py", line 441, in create_container_from_config
    return self._result(res, True)
  File "/usr/local/lib/python3.9/site-packages/docker/api/client.py", line 274, in _result
    self._raise_for_status(response)
  File "/usr/local/lib/python3.9/site-packages/docker/api/client.py", line 270, in _raise_for_status
    raise create_api_error_from_http_exception(e)
  File "/usr/local/lib/python3.9/site-packages/docker/errors.py", line 31, in create_api_error_from_http_exception
    raise cls(e, response=response, explanation=explanation)
docker.errors.APIError: 400 Client Error for http+docker://localhost/v1.41/containers/create: Bad Request ("invalid mount config for type "bind": bind source path does not exist: /tmp/airflowtmp3d8fxnte")

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 1157, in _run_raw_task
    self._prepare_and_execute_task_with_callbacks(context, task)
  File "/usr/local/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 1331, in _prepare_and_execute_task_with_callbacks
    result = self._execute_task(context, task_copy)
  File "/usr/local/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 1361, in _execute_task
    result = task_copy.execute(context=context)
  File "/usr/local/lib/python3.9/site-packages/airflow/providers/docker/operators/docker.py", line 346, in execute
    return self._run_image()
  File "/usr/local/lib/python3.9/site-packages/airflow/providers/docker/operators/docker.py", line 255, in _run_image
    if self.host_tmp_dir in str(e):
TypeError: 'in <string>' requires string as left operand, not NoneType
```
The line `if self.host_tmp_dir in str(e):` is buggy.
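The crash occurs because `host_tmp_dir` can be `None` here, and `None in str(e)` raises the `TypeError` seen above. A minimal sketch of a null-safe guard (illustrative only, with a hypothetical helper name; not the actual provider patch):

```python
def is_tmp_dir_error(host_tmp_dir, error):
    """Return True only if a configured host_tmp_dir appears in the error text.

    Guards against host_tmp_dir being None, which would otherwise make the
    `in` check raise: 'in <string>' requires string as left operand.
    """
    return host_tmp_dir is not None and host_tmp_dir in str(error)

# Buggy pattern:   if self.host_tmp_dir in str(e): ...
# Guarded pattern: if is_tmp_dir_error(self.host_tmp_dir, e): ...
```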
For the cncf.kubernetes provider (apache-airflow-providers-cncf-kubernetes==2.0.1rc1), it is still failing with an arg containing ".json". I tried with Airflow 2.1.2:
```python
k = KubernetesPodOperator(
    namespace=namespace,
    image="hello-world",
    labels={"foo": "bar"},
    arguments=["vivi.json"],
    name="airflow-test-pod",
    task_id="task-one",
    ...
)
```

gives:
```
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 1157, in _run_raw_task
    self._prepare_and_execute_task_with_callbacks(context, task)
  File "/usr/local/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 1294, in _prepare_and_execute_task_with_callbacks
    self.render_templates(context=context)
  File "/usr/local/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 1793, in render_templates
    self.task.render_template_fields(context)
  File "/usr/local/lib/python3.9/site-packages/airflow/models/baseoperator.py", line 992, in render_template_fields
    self._do_render_template_fields(self, self.template_fields, context, jinja_env, set())
  File "/usr/local/lib/python3.9/site-packages/airflow/models/baseoperator.py", line 1005, in _do_render_template_fields
    rendered_content = self.render_template(content, context, jinja_env, seen_oids)
  File "/usr/local/lib/python3.9/site-packages/airflow/models/baseoperator.py", line 1056, in render_template
    return [self.render_template(element, context, jinja_env) for element in content]
  File "/usr/local/lib/python3.9/site-packages/airflow/models/baseoperator.py", line 1056, in <listcomp>
    return [self.render_template(element, context, jinja_env) for element in content]
  File "/usr/local/lib/python3.9/site-packages/airflow/models/baseoperator.py", line 1040, in render_template
    return jinja_env.get_template(content).render(**context)
  File "/usr/local/lib/python3.9/site-packages/jinja2/environment.py", line 883, in get_template
    return self._load_template(name, self.make_globals(globals))
  File "/usr/local/lib/python3.9/site-packages/jinja2/environment.py", line 857, in _load_template
    template = self.loader.load(self, name, globals)
  File "/usr/local/lib/python3.9/site-packages/jinja2/loaders.py", line 115, in load
    source, filename, uptodate = self.get_source(environment, name)
  File "/usr/local/lib/python3.9/site-packages/jinja2/loaders.py", line 197, in get_source
    raise TemplateNotFound(template)
jinja2.exceptions.TemplateNotFound: vivi.json
```
> `if self.host_tmp_dir in str(e):`
Indeed - I will remove the docker provider from this round and prepare the next RC. My mistake :)
> `jinja2.exceptions.TemplateNotFound: vivi.json`
@raphaelauv Are you sure you had a `vivi.json` file locally? To me that error looks like the file simply was not available locally.
There is no `vivi.json` file locally; it's just an argument for the pod.
The issue is https://github.com/apache/airflow/issues/16922 -> using the string ".json" in a DAG makes the KubernetesPodOperator worker unable to trigger the pod. It's still the case even with this PR: https://github.com/apache/airflow/pull/16930
Not really. With ".json", this is expected behaviour. As I understand it, the previous problem was that "json" could appear somewhat randomly (for example, the argument could be "aaaajson"). When a templated field value ends with one of the template_extensions, all operators (including KPO) will try to read the file with this extension, process it with the template engine, and fail when it does not exist (which is precisely what we see). Do you think it is a problem? And how do you think it should work (while keeping backwards compatibility)?
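The dispatch can be illustrated with a standalone, stdlib-only sketch (`render_value` and `TEMPLATE_EXT` are illustrative names, not Airflow's actual implementation): a value ending in a template extension is treated as a file name to load, while anything else is rendered inline.

```python
import os
import string

# Illustrative subset of an operator's template_ext
TEMPLATE_EXT = ('.json', '.yaml', '.yml')

def render_value(value, context):
    """If the value ends with a template extension, treat it as a template
    file name and load it from disk; otherwise render it as an inline string."""
    if value.endswith(TEMPLATE_EXT):
        if not os.path.exists(value):
            # Analogous to the jinja2.TemplateNotFound for "vivi.json" above
            raise FileNotFoundError(f"template not found: {value}")
        with open(value) as f:
            value = f.read()
    return string.Template(value).substitute(context)

print(render_value("vivijson", {}))  # rendered inline, no file lookup
```

So "vivijson" passes straight through, while "vivi.json" triggers a file lookup and fails when no such file exists.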
With ".json", this is expected behaviour
Then it may need a new option letting the user disable templating on the arguments field, or something else. But the issue should not be considered done, since the OP asked about ".json".
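One shape such an opt-out could take (purely a sketch: the base class below is a stand-in, since the real KubernetesPodOperator lives in the provider package) is clearing `template_ext` on a subclass, so that no argument value is ever treated as a template file name:

```python
class OperatorStub:
    """Stand-in for an Airflow operator: values ending in one of
    template_ext are treated as template file names during rendering."""
    template_ext = ('.json', '.yaml', '.yml')

    def treats_as_template_file(self, value):
        # str.endswith(()) is always False, so an empty
        # template_ext disables file-based templating entirely.
        return value.endswith(tuple(self.template_ext))

class NoFileTemplatesOperator(OperatorStub):
    # Opt out: ".json" arguments stay plain strings
    template_ext = ()

print(OperatorStub().treats_as_template_file("vivi.json"))             # True
print(NoFileTemplatesOperator().treats_as_template_file("vivi.json"))  # False
```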
Sometimes the description is not precise enough. Actually, what you propose is a new feature, not a bug - would you like to propose a PR for that?
I propose re-opening the issue https://github.com/apache/airflow/issues/16922 since it's still happening.
Otherwise,

```python
k = KubernetesPodOperator(
    namespace=namespace,
    image="hello-world",
    labels={"foo": "bar"},
    arguments=["vivijson"],
    ...
)
```

no longer throws an error with apache-airflow-providers-cncf-kubernetes==2.0.1rc1.
I will let @kaxil and @eladkal say what they think here. We are very focused on getting community opinions on issues (committers have binding votes). In the meantime, I strongly encourage you, @raphaelauv, to open a PR with the proposed new feature. Due to backwards compatibility this needs to be discussed anyway, and the opinions of at least a few people have to be taken into account if we are going to implement such a change, as it impacts ~70 providers and ~600 operators, so it is highly unlikely to be resolved (and the issue reopened) via an issue related to one operator.
Our community works on reaching consensus and discussing such things, their consequences, backwards compatibilities etc.
I don't need that feature and am not asking for it; I just care about the person who opened the issue and thinks their problem is solved. I just don't want any miscommunication.
I started a discussion there. Airflow is a community-managed project, and it is quite common that when you "want" something, you are asked to contribute it, especially if it is not a big fix. A lot of the people who use Airflow and comment here also contribute fixes and features back, so it is very common that we ask people to help with what they think is a good feature. If you make a statement that something "should" be done, I think it is only natural that you "want" it to happen, and in the open-source world, the natural consequence might be that you are asked to contribute it.
Again, I don't ask for anything; I'm just proposing to let the person who opened the issue know.
I have probably not been clear in my messages; I just wanted to contribute by helping to check the new RC providers.
It's cool that you want to help. I did not want to discourage it; I just politely asked if you would like to contribute it (which you do not have to). And look what happened - I started the discussion in the original issue https://github.com/apache/airflow/issues/16922 and also asked the author what he thinks :). So I think we are pretty aligned here. My question to you was just the normal thing that happens in open source - we ask people who raise concerns and express opinions to help. This is how our community works :).
Not sure about the frowning faces; I was acting with very good intent here. It's OK to have different opinions and discuss them, and it's also OK to ask people who raise concerns whether they would be willing to follow up and contribute, I think.
> frowning faces

It's the :confused: emoji, because I did not understand why you were explaining all this common sense about OSS to me; let me remove them.
Really, what we are up to here is getting help from everyone who participates, and we gently and politely (I hope) ask people to participate. If my question came across differently, I think that is more a matter of the medium :). Not everyone reads it the same way - apologies if it created confusion.
Hi, in a virtualenv in which I installed RC1, the apache.drill 1.0.0rc1 provider does work. I do have two notes of things I didn't expect:

- There was no predefined `drill_default` connection and I had to define it myself, yet I did add a `drill_default` connection to `airflow/utils/db.py`.
- I noticed that my Drill password was included in the clear in the logging output from `base.py`, example shown below:

```
[2021-07-18 09:15:05,050] {base.py:69} INFO - Using connection to: id: drill_default. Host: localhost, Port: 8047, Schema: , Login: james, Password: **should-be-secret**, extra: {'driver_dialect': 'drill+sadrill', 'storage_plugin': 'dfs'}
```
Thanks for testing @dzamo! And thanks for being so thorough! Really appreciated!
> - There was no predefined `drill_default` connection and I had to define it myself, yet I did add a `drill_default` connection to `airflow/utils/db.py`.
Yep. Totally expected. Default connections are one thing that is not included in provider packages. The default connections are part of the "airflow" package, as they are really only useful during testing and airflow development (so in the `main` branch) and possibly for quick airflow testing with docker-compose etc. The default drill connection is already present in `main` (you can see it with `breeze start-airflow --load-default-connections`, for example). It will be automatically included in Airflow when we release 2.2.
> - I noticed that my Drill password was included in the clear in the logging output from `base.py`, example shown below.
Yeah. This is a known issue that will be addressed in 2.1.3: https://github.com/apache/airflow/pull/16579. It's internal webserver logs only (not task logs visible via the Airflow UI), so while important to fix, it is not critical.
Closed by mistake :).
@pmalafosse @baolsen @hewe @scottypate @codenamestif @ciancolo @kaxil @Daniel-Han-Yang @namjals @malthe @BKronenbitter @oyarushe @LukeHong @saurasingh @freget @ashb @samgans - it would be great if at least some of those changes were tested :). I appreciate your help with that.
Tested the MySQL operator and found an issue; could you please review fix #17080.
Thanks. I will mark it for rc2 as well and release it separately, together with the Docker provider.
Some more tests would be nice :)
Regarding the Amazon provider, it looks like not all `template_fields_renderers` were fixed. Please check my PR #17087.
Sounds like we might want to release all the packages together with rc2 :D
> Not really. With ".json", this is expected behaviour. As I understand it, previously the problem was that json could appear somewhat randomly (for example the argument could be "aaaajson").

Correct, the issue was with `json` randomly appearing at the end of a string, not with the `.json` extension (for KubernetesPodOperator).
Verified #16767 and it is working as expected 😃
There is a problem with #16365. In the case of an `Extra` parameter in the format `{'verify': true}`, the str-to-bool conversion is not necessary. With `Extra` parameters as strings, the PR looks good.
Check #17125 for the bug fix.
```
[2021-07-21, 07:15:33 UTC] {taskinstance.py:1455} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/opt/airflow/airflow/models/taskinstance.py", line 1182, in _run_raw_task
    self._prepare_and_execute_task_with_callbacks(context, task)
  File "/opt/airflow/airflow/models/taskinstance.py", line 1285, in _prepare_and_execute_task_with_callbacks
    result = self._execute_task(context, task_copy)
  File "/opt/airflow/airflow/models/taskinstance.py", line 1315, in _execute_task
    result = task_copy.execute(context=context)
  File "/opt/airflow/airflow/providers/tableau/operators/tableau_refresh_workbook.py", line 70, in execute
    with TableauHook(self.site_id, self.tableau_conn_id) as tableau_hook:
  File "/opt/airflow/airflow/providers/tableau/hooks/tableau.py", line 70, in __init__
    verify = bool(strtobool(verify))
  File "/usr/local/lib/python3.8/distutils/util.py", line 313, in strtobool
    val = val.lower()
AttributeError: 'bool' object has no attribute 'lower'
```
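The failure comes from `strtobool()` being handed a value that is already a `bool` (extras decoded from JSON, e.g. `{'verify': true}`); `strtobool` then calls `.lower()` on it and crashes. A type-aware coercion avoids the double conversion (a sketch with an illustrative helper name, not the actual #17125 patch):

```python
def coerce_verify(value, default=True):
    """Coerce an Extra 'verify' value that may be a bool (JSON extras such as
    {'verify': true}) or a string ('False', 'true', ...) into a plain bool."""
    if value is None:
        return default
    if isinstance(value, bool):
        return value  # already a bool: str-to-bool conversion is unnecessary
    # mirrors distutils.util.strtobool's truthy tokens
    return str(value).strip().lower() in ('y', 'yes', 't', 'true', 'on', '1')

print(coerce_verify(True), coerce_verify('False'))  # True False
```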
Tested #16350 on my local Sqoop env and it works as expected.
There were a few too many problems with this wave; I am going to release a second rc2 wave soon with all the providers that have fixes (plus a few more changes). Closing this one for now.
> Check #17125 for the bug fix.
Cool. Thanks!
There is a bug with https://github.com/apache/airflow/pull/16685; I will create a fix by tomorrow.
Ahh. Seems like a REALLY buggy release candidate set this time :). I will wait for the fix then for rc2, @pmalafosse! Thanks for letting me know!
Thanks! The fix is in this PR: https://github.com/apache/airflow/pull/17141
We have a kind request for all the contributors to the latest provider packages release: could you help us test the RC versions of the providers and let us know in a comment whether the issue is addressed there?
Providers that need testing

These are providers that require testing, as some substantial changes were introduced:
- Provider amazon: 2.1.0rc1
- Provider apache.hive: 2.0.1rc1
- Provider apache.sqoop: 2.0.1rc1
- Provider cncf.kubernetes: 2.0.1rc1
  - [ ] `json` string in template_field fails with K8s Operators (#16930): @kaxil
- ~~Provider docker: 2.1.0rc1~~
  - ~~[Adds option to disable mounting temporary folder in DockerOperator (#16932)](https://github.com/apache/airflow/pull/16932): @potiuk: bug found.~~
- Provider google: 4.1.0rc1
- Provider jenkins: 2.0.1rc1
- Provider microsoft.azure: 3.1.0rc1
- ~~Provider mysql: 2.1.0rc1: Marking for RC2 release~~
  - ~~Added template_fields_renderers for MySQL Operator (#16914): @oyarushe~~
  - ~~Extended template_fields_renderers for MySQL provider (#16987): @oyarushe~~
- Provider postgres: 2.1.0rc1
- Provider sftp: 2.1.0rc1
- Provider snowflake: 2.1.0rc1
- Provider ssh: 2.1.0rc1
- Provider tableau: 2.1.0rc1
New Providers