apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0

Status of testing Providers that were prepared on July 15, 2021 #17037

Closed - potiuk closed this issue 3 years ago

potiuk commented 3 years ago

I have a kind request for all the contributors to the latest provider packages release. Could you help us test the RC versions of the providers and let us know in a comment whether your issue is addressed there?

Providers that need testing

Those are providers that require testing as there were some substantial changes introduced:

Provider amazon: 2.1.0rc1

New Providers

raphaelauv commented 3 years ago

For the docker provider:

apache-airflow-providers-docker==2.1.0rc1

It is still failing for Docker-in-Docker.

I tried with Airflow 2.1.2:

t3 = DockerOperator(
    api_version='auto',
    docker_url="unix://var/run/docker.sock",
    command='/bin/sleep 30',
    image='centos:latest',
    network_mode='bridge',
    task_id='docker_op_tester',
    dag=dag,
)

gives:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/docker/api/client.py", line 268, in _raise_for_status
    response.raise_for_status()
  File "/usr/local/lib/python3.9/site-packages/requests/models.py", line 953, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: http+docker://localhost/v1.41/containers/create

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/airflow/providers/docker/operators/docker.py", line 253, in _run_image
    return self._run_image_with_mounts(self.mounts + [tmp_mount], add_tmp_variable=True)
  File "/usr/local/lib/python3.9/site-packages/airflow/providers/docker/operators/docker.py", line 272, in _run_image_with_mounts
    self.container = self.cli.create_container(
  File "/usr/local/lib/python3.9/site-packages/docker/api/container.py", line 430, in create_container
    return self.create_container_from_config(config, name)
  File "/usr/local/lib/python3.9/site-packages/docker/api/container.py", line 441, in create_container_from_config
    return self._result(res, True)
  File "/usr/local/lib/python3.9/site-packages/docker/api/client.py", line 274, in _result
    self._raise_for_status(response)
  File "/usr/local/lib/python3.9/site-packages/docker/api/client.py", line 270, in _raise_for_status
    raise create_api_error_from_http_exception(e)
  File "/usr/local/lib/python3.9/site-packages/docker/errors.py", line 31, in create_api_error_from_http_exception
    raise cls(e, response=response, explanation=explanation)
docker.errors.APIError: 400 Client Error for http+docker://localhost/v1.41/containers/create: Bad Request ("invalid mount config for type "bind": bind source path does not exist: /tmp/airflowtmp3d8fxnte")

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 1157, in _run_raw_task
    self._prepare_and_execute_task_with_callbacks(context, task)
  File "/usr/local/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 1331, in _prepare_and_execute_task_with_callbacks
    result = self._execute_task(context, task_copy)
  File "/usr/local/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 1361, in _execute_task
    result = task_copy.execute(context=context)
  File "/usr/local/lib/python3.9/site-packages/airflow/providers/docker/operators/docker.py", line 346, in execute
    return self._run_image()
  File "/usr/local/lib/python3.9/site-packages/airflow/providers/docker/operators/docker.py", line 255, in _run_image
    if self.host_tmp_dir in str(e):
TypeError: 'in <string>' requires string as left operand, not NoneType

The line

if self.host_tmp_dir in str(e):

is buggy: self.host_tmp_dir can be None, which triggers the TypeError above.

a fix -> https://github.com/apache/airflow/pull/17061
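That PR aside, a minimal sketch of the kind of guard needed (hypothetical helper name, not the actual #17061 diff):

```python
# Hypothetical helper illustrating the guard: the except branch in
# DockerOperator._run_image compares host_tmp_dir against the error text,
# so a None host_tmp_dir must short-circuit instead of raising TypeError.
def mentions_host_tmp_dir(host_tmp_dir, error):
    """Return True when the Docker API error text refers to the host temp dir."""
    if host_tmp_dir is None:
        return False
    return host_tmp_dir in str(error)
```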

raphaelauv commented 3 years ago

For the cncf.kubernetes provider:

apache-airflow-providers-cncf-kubernetes==2.0.1rc1

It is still failing with an argument containing ".json".

I tried with Airflow 2.1.2:

    k = KubernetesPodOperator(
        namespace=namespace,
        image="hello-world",
        labels={"foo": "bar"},
        arguments=["vivi.json"],
        name="airflow-test-pod",
        task_id="task-one",
        ...
        )

gives:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 1157, in _run_raw_task
    self._prepare_and_execute_task_with_callbacks(context, task)
  File "/usr/local/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 1294, in _prepare_and_execute_task_with_callbacks
    self.render_templates(context=context)
  File "/usr/local/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 1793, in render_templates
    self.task.render_template_fields(context)
  File "/usr/local/lib/python3.9/site-packages/airflow/models/baseoperator.py", line 992, in render_template_fields
    self._do_render_template_fields(self, self.template_fields, context, jinja_env, set())
  File "/usr/local/lib/python3.9/site-packages/airflow/models/baseoperator.py", line 1005, in _do_render_template_fields
    rendered_content = self.render_template(content, context, jinja_env, seen_oids)
  File "/usr/local/lib/python3.9/site-packages/airflow/models/baseoperator.py", line 1056, in render_template
    return [self.render_template(element, context, jinja_env) for element in content]
  File "/usr/local/lib/python3.9/site-packages/airflow/models/baseoperator.py", line 1056, in <listcomp>
    return [self.render_template(element, context, jinja_env) for element in content]
  File "/usr/local/lib/python3.9/site-packages/airflow/models/baseoperator.py", line 1040, in render_template
    return jinja_env.get_template(content).render(**context)
  File "/usr/local/lib/python3.9/site-packages/jinja2/environment.py", line 883, in get_template
    return self._load_template(name, self.make_globals(globals))
  File "/usr/local/lib/python3.9/site-packages/jinja2/environment.py", line 857, in _load_template
    template = self.loader.load(self, name, globals)
  File "/usr/local/lib/python3.9/site-packages/jinja2/loaders.py", line 115, in load
    source, filename, uptodate = self.get_source(environment, name)
  File "/usr/local/lib/python3.9/site-packages/jinja2/loaders.py", line 197, in get_source
    raise TemplateNotFound(template)
jinja2.exceptions.TemplateNotFound: vivi.json

potiuk commented 3 years ago

if self.host_tmp_dir in str(e):

Indeed - I will remove the docker provider from this round and prepare the next RC. My mistake :)

potiuk commented 3 years ago

jinja2.exceptions.TemplateNotFound: vivi.json

@raphaelauv Are you sure you had the vivi.json file locally? To me that error looks like the file simply was not available locally.

raphaelauv commented 3 years ago

There is no vivi.json file locally; it's just an argument for the pod.

the issue is https://github.com/apache/airflow/issues/16922 -> Using the string ".json" in a dag makes KubernetesPodOperator worker unable to trigger the pod

it's still the case even with this PR https://github.com/apache/airflow/pull/16930

potiuk commented 3 years ago

the issue is #16922 -> Using the string ".json" in a dag makes KubernetesPodOperator worker unable to trigger the pod

it's still the case even with this PR #16930

Not really. With ".json", this is expected behaviour. As I understand it, the earlier problem was that "json" could match somewhat randomly (for example, an argument ending in "aaaajson"). When a templated field value ends with one of the operator's template extensions, all operators (including KPO) will try to read the file with that name, process it through the template engine, and fail when it does not exist (which is precisely what we see here). Do you think this is a problem? And how do you think it should work while keeping backwards compatibility?
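The behaviour described here can be illustrated with a few lines of plain Python (a simplification of what Airflow/Jinja do, not their actual code):

```python
# Templated field values are treated as template *files* only when they
# end with one of the operator's template extensions.
TEMPLATE_EXT = (".json",)  # illustrative; KPO lists .json among its extensions

def treated_as_template_file(value, template_ext=TEMPLATE_EXT):
    """Return True when the value would be looked up as a template file."""
    return isinstance(value, str) and value.endswith(template_ext)
```

This is why `arguments=["vivi.json"]` triggers a template-file lookup (and TemplateNotFound when no such file exists), while "vivijson" is passed through unchanged.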

raphaelauv commented 3 years ago

With ".json", this is expected behaviour

Then it may need a new option to let the user disable templating on the arguments field, or something similar.

But the issue should not be considered done, since the OP asked about ".json".

potiuk commented 3 years ago

Sometimes the description is not precise enough. What you propose is actually a new feature, not a bug - would you like to propose a PR for that?

raphaelauv commented 3 years ago

I propose to re-open the issue https://github.com/apache/airflow/issues/16922 since it's still happening.

raphaelauv commented 3 years ago

On the other hand,

    k = KubernetesPodOperator(
        namespace=namespace,
        image="hello-world",
        labels={"foo": "bar"},
        arguments=["vivijson"],
        ...
        )

no longer throws an error with apache-airflow-providers-cncf-kubernetes==2.0.1rc1.

potiuk commented 3 years ago

I will let @kaxil and @eladkal say what they think here. We are very focused on getting the community's opinions on issues (committers have binding votes). In the meantime, I strongly encourage you, @raphaelauv, to open a PR with the proposed new feature. Because of backwards compatibility, this needs to be discussed anyway, and the opinions of at least a few people have to be taken into account before we implement such a change, as it impacts ~70 providers and ~600 operators. So it is highly unlikely to be resolved (and the issue reopened) via an issue related to a single operator.

Our community works by reaching consensus and discussing such things, their consequences, backwards compatibility, etc.

raphaelauv commented 3 years ago

I don't need that feature and am not asking for it; I just care about the person who opened the issue and now thinks their problem is solved. I just don't want miscommunication.

potiuk commented 3 years ago

I don't need that feature and am not asking for it; I just care about the person who opened the issue and now thinks their problem is solved. I just don't want miscommunication.

I started a discussion there. Airflow is a community-managed project, and it is quite common that when you "want" something, you are asked to contribute it, especially if it is not a big fix. A lot of people here who use Airflow and comment here also contribute fixes and features back, so it is very common that we ask people to help with what they think is a good feature. If you state that something "should" be done, it is only natural to assume you "want" it to happen, and in the open-source world the natural consequence may be that you are asked to contribute it.

raphaelauv commented 3 years ago

Again, I'm not asking for anything, just proposing to let the issue author know.

I have probably not been clear in my messages; I just wanted to contribute by helping to check the new RC providers.

potiuk commented 3 years ago

I have probably not been clear in my messages; I just wanted to contribute by helping to check the new RC providers.

It's cool that you want to help, and I did not want to discourage it; I just politely asked if you would like to contribute it (which you do not have to). And look what happened - I started the discussion in the original issue https://github.com/apache/airflow/issues/16922 and also asked the author what he thinks :). So I think we are pretty aligned here. My question to you was just the normal thing that happens in open source - we ask people who raise concerns and express opinions to help. This is how our community works :).

Not sure why the frowning faces - I was acting with very good intent here. It's OK to have different opinions and discuss them, and it's also OK to ask people who raise concerns whether they are willing to follow up and contribute, I think.

raphaelauv commented 3 years ago

frowning faces

It's the "confused" emoji - I used it because I did not understand why you were explaining all this common sense about OSS to me. Let me remove them.

potiuk commented 3 years ago

Really - what we are up to here is getting help from everyone who participates, and we gently and politely (I hope) ask people to participate. If my question came across differently, I think that is more a limitation of the medium :). Not everyone reads it the same way - apologies if it created confusion.

jnturton commented 3 years ago

Hi, in a virtualenv in which I installed RC1, the apache.drill 1.0.0rc1 provider does work. I do have two notes on things I didn't expect.

  1. There was no predefined drill_default connection and I had to define it myself, even though I did add a drill_default connection to airflow/utils/db.py.
  2. I noticed that my Drill password was included in clear text in the logging output from base.py; an example is shown below.
[2021-07-18 09:15:05,050] {base.py:69} INFO - Using connection to: id: drill_default. Host: localhost, Port: 8047, Schema: , Login: james, Password: **should-be-secret**, extra: {'driver_dialect': 'drill+sadrill', 'storage_plugin': 'dfs'}

potiuk commented 3 years ago

Thanks for testing @dzamo! And thanks for being so thorough! Really appreciated!

  1. There was no predefined drill_default connection and I had to define it myself, even though I did add a drill_default connection to airflow/utils/db.py.

Yep, totally expected. Default connections are one thing that is not included in provider packages. The default connections are part of the "airflow" package, as they are really only useful during testing and Airflow development (so in the main branch), and possibly for quick Airflow testing with docker-compose etc. The default Drill connection is already present in main (you can see it with breeze start-airflow --load-default-connections, for example). It will be automatically included in Airflow when we release 2.2.
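As a stop-gap until then, a connection can be supplied manually; one common pattern is the AIRFLOW_CONN_&lt;CONN_ID&gt; environment variable, which Airflow parses as a URI (the host, credentials, and extras below are illustrative values, not defaults):

```python
import os
from urllib.parse import urlencode

# Illustrative values only; Airflow treats the URI query string as
# connection extras, so the extras dict becomes query parameters.
extras = {"driver_dialect": "drill+sadrill", "storage_plugin": "dfs"}
uri = "drill://james:secret@localhost:8047/?" + urlencode(extras)
os.environ["AIRFLOW_CONN_DRILL_DEFAULT"] = uri
```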

  2. I noticed that my Drill password was included in clear text in the logging output from base.py, example shown below.

Yeah, this is a known issue that will be addressed in 2.1.3: https://github.com/apache/airflow/pull/16579. It affects internal webserver logs only (not the task logs visible via the Airflow UI), so while important to fix, it is not critical.

potiuk commented 3 years ago

Closed by mistake :).

@pmalafosse @baolsen @hewe @scottypate @codenamestif @ciancolo @kaxil @Daniel-Han-Yang @namjals @malthe @BKronenbitter @oyarushe @LukeHong @saurasingh @freget @ashb @ciancolo @samgans - it would be great if at least some of these changes were tested :). I appreciate your help with that.

oyarushe commented 3 years ago

Tested the MySQL operator and found an issue; could you please review fix #17080?

potiuk commented 3 years ago

Tested the MySQL operator and found an issue; could you please review fix #17080?

Thanks. I will mark it also for rc2 and release together with Docker Provider separately.

pavelhlushchanka commented 3 years ago

#16820 looks good

potiuk commented 3 years ago

Some more tests would be nice :)

oyarushe commented 3 years ago

Regarding the Amazon provider, it looks like not all template_fields_renderers were fixed. Please check my PR #17087.

potiuk commented 3 years ago

Sounds like we might want to release all the packages together with rc2 :D

kaxil commented 3 years ago

Not really. With ".json", this is expected behaviour. As I understand it, previously the problem was that json could appear somewhat randomly (for example the argument could be "aaaajson".

Correct - the issue was with "json" randomly appearing at the end of a string, not with the .json extension (for KubernetesPodOperator).

hewe commented 3 years ago

Verified #16767 and it is working as expected 😃

ciancolo commented 3 years ago

There is a problem with #16365: when the Extra parameter is in the format {'verify': true}, the str-to-bool conversion is not necessary (the value is already a bool).

With the Extra parameter as a string, the PR looks good.

Check #17125 for the bug fix.

[2021-07-21, 07:15:33 UTC] {taskinstance.py:1455} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/opt/airflow/airflow/models/taskinstance.py", line 1182, in _run_raw_task
    self._prepare_and_execute_task_with_callbacks(context, task)
  File "/opt/airflow/airflow/models/taskinstance.py", line 1285, in _prepare_and_execute_task_with_callbacks
    result = self._execute_task(context, task_copy)
  File "/opt/airflow/airflow/models/taskinstance.py", line 1315, in _execute_task
    result = task_copy.execute(context=context)
  File "/opt/airflow/airflow/providers/tableau/operators/tableau_refresh_workbook.py", line 70, in execute
    with TableauHook(self.site_id, self.tableau_conn_id) as tableau_hook:
  File "/opt/airflow/airflow/providers/tableau/hooks/tableau.py", line 70, in __init__
    verify = bool(strtobool(verify))
  File "/usr/local/lib/python3.8/distutils/util.py", line 313, in strtobool
    val = val.lower()
AttributeError: 'bool' object has no attribute 'lower'
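A minimal sketch of the kind of guard this calls for (a hypothetical helper, not the actual #17125 code): only parse when the "verify" extra arrives as a string, and let real booleans pass through.

```python
def coerce_verify(verify):
    """Accept either a real bool or a distutils-style truthy/falsy string."""
    if isinstance(verify, bool):
        return verify
    # Mirrors the spellings distutils.util.strtobool accepted as true.
    return str(verify).strip().lower() in ("y", "yes", "t", "true", "on", "1")
```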

ciancolo commented 3 years ago

Tested #16350 on my local Sqoop env and it works as expected

potiuk commented 3 years ago

There were quite a few problems with this wave, so I am going to release a second rc2 wave with all the fixed providers soon (plus a few more changes). Closing this one for now.

potiuk commented 3 years ago

Check #17125 for the bug fix.

Cool. Thanks!

pmalafosse commented 3 years ago

There is a bug with https://github.com/apache/airflow/pull/16685; I will create a fix by tomorrow.

potiuk commented 3 years ago

Ahh, it seems like a REALLY buggy release candidate set this time :). I will wait for the fix for rc2 then, @pmalafosse! Thanks for letting me know!

pmalafosse commented 3 years ago

Ahh, it seems like a REALLY buggy release candidate set this time :). I will wait for the fix for rc2 then, @pmalafosse! Thanks for letting me know!

Thanks! The fix is in this PR: https://github.com/apache/airflow/pull/17141