astronomer / astro-cli

CLI that makes it easy to create, test and deploy Airflow DAGs to Astronomer
https://www.astronomer.io

Unable to run DockerOperator #1615

Closed: cmeans closed this issue 3 months ago

cmeans commented 3 months ago

Describe the bug

DockerOperator is unable to reach the local Docker unix socket (/var/run/docker.sock).

*** Found local files:
***   * /usr/local/airflow/logs/dag_id=docker_sample/run_id=scheduled__2024-04-07T19:00:00+00:00/task_id=docker_op_tester/attempt=1.log
[2024-04-07, 19:10:01 UTC] {taskinstance.py:1979} INFO - Dependencies all met for dep_context=non-requeueable deps ti=<TaskInstance: docker_sample.docker_op_tester scheduled__2024-04-07T19:00:00+00:00 [queued]>
[2024-04-07, 19:10:01 UTC] {taskinstance.py:1979} INFO - Dependencies all met for dep_context=requeueable deps ti=<TaskInstance: docker_sample.docker_op_tester scheduled__2024-04-07T19:00:00+00:00 [queued]>
[2024-04-07, 19:10:01 UTC] {taskinstance.py:2193} INFO - Starting attempt 1 of 3
[2024-04-07, 19:10:01 UTC] {taskinstance.py:2217} INFO - Executing <Task(DockerOperator): docker_op_tester> on 2024-04-07 19:00:00+00:00
[2024-04-07, 19:10:01 UTC] {standard_task_runner.py:60} INFO - Started process 250 to run task
[2024-04-07, 19:10:01 UTC] {standard_task_runner.py:87} INFO - Running: ['airflow', 'tasks', 'run', 'docker_sample', 'docker_op_tester', 'scheduled__2024-04-07T19:00:00+00:00', '--job-id', '252', '--raw', '--subdir', 'DAGS_FOLDER/docker_operator_dag.py', '--cfg-path', '/tmp/tmpy1758vqm']
[2024-04-07, 19:10:01 UTC] {standard_task_runner.py:88} INFO - Job 252: Subtask docker_op_tester
[2024-04-07, 19:10:01 UTC] {task_command.py:423} INFO - Running <TaskInstance: docker_sample.docker_op_tester scheduled__2024-04-07T19:00:00+00:00 [running]> on host ae794fccc156
[2024-04-07, 19:10:02 UTC] {taskinstance.py:2513} INFO - Exporting env vars: AIRFLOW_CTX_DAG_OWNER='airflow' AIRFLOW_CTX_DAG_ID='docker_sample' AIRFLOW_CTX_TASK_ID='docker_op_tester' AIRFLOW_CTX_EXECUTION_DATE='2024-04-07T19:00:00+00:00' AIRFLOW_CTX_TRY_NUMBER='1' AIRFLOW_CTX_DAG_RUN_ID='scheduled__2024-04-07T19:00:00+00:00'
[2024-04-07, 19:10:02 UTC] {taskinstance.py:2731} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 791, in urlopen
    response = self._make_request(
               ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 497, in _make_request
    conn.request(
  File "/usr/local/lib/python3.11/site-packages/urllib3/connection.py", line 395, in request
    self.endheaders()
  File "/usr/local/lib/python3.11/http/client.py", line 1293, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/local/lib/python3.11/http/client.py", line 1052, in _send_output
    self.send(msg)
  File "/usr/local/lib/python3.11/http/client.py", line 990, in send
    self.connect()
  File "/usr/local/lib/python3.11/site-packages/docker/transport/unixconn.py", line 27, in connect
    sock.connect(self.unix_socket)
FileNotFoundError: [Errno 2] No such file or directory
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/requests/adapters.py", line 486, in send
    resp = conn.urlopen(
           ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 845, in urlopen
    retries = retries.increment(
              ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/urllib3/util/retry.py", line 470, in increment
    raise reraise(type(error), error, _stacktrace)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/urllib3/util/util.py", line 38, in reraise
    raise value.with_traceback(tb)
  File "/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 791, in urlopen
    response = self._make_request(
               ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 497, in _make_request
    conn.request(
  File "/usr/local/lib/python3.11/site-packages/urllib3/connection.py", line 395, in request
    self.endheaders()
  File "/usr/local/lib/python3.11/http/client.py", line 1293, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/local/lib/python3.11/http/client.py", line 1052, in _send_output
    self.send(msg)
  File "/usr/local/lib/python3.11/http/client.py", line 990, in send
    self.connect()
  File "/usr/local/lib/python3.11/site-packages/docker/transport/unixconn.py", line 27, in connect
    sock.connect(self.unix_socket)
urllib3.exceptions.ProtocolError: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/docker/api/client.py", line 213, in _retrieve_server_version
    return self.version(api_version=False)["ApiVersion"]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/docker/api/daemon.py", line 181, in version
    return self._result(self._get(url), json=True)
                        ^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/docker/utils/decorators.py", line 44, in inner
    return f(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/docker/api/client.py", line 236, in _get
    return self.get(url, **self._set_request_timeout(kwargs))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/requests/sessions.py", line 602, in get
    return self.request("GET", url, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/requests/adapters.py", line 501, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/airflow/models/taskinstance.py", line 444, in _execute_task
    result = _execute_callable(context=context, **execute_callable_kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/airflow/models/taskinstance.py", line 414, in _execute_callable
    return execute_callable(context=context, **execute_callable_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/airflow/providers/docker/operators/docker.py", line 485, in execute
    if self.force_pull or not self.cli.images(name=self.image):
                              ^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/airflow/providers/docker/operators/docker.py", line 355, in cli
    return self.hook.api_client
           ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/functools.py", line 1001, in __get__
    val = self.func(instance)
          ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/airflow/providers/docker/hooks/docker.py", line 149, in api_client
    client = APIClient(
             ^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/docker/api/client.py", line 197, in __init__
    self._version = self._retrieve_server_version()
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/docker/api/client.py", line 220, in _retrieve_server_version
    raise DockerException(
docker.errors.DockerException: Error while fetching server API version: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))
[2024-04-07, 19:10:02 UTC] {taskinstance.py:1149} INFO - Marking task as UP_FOR_RETRY. dag_id=docker_sample, task_id=docker_op_tester, execution_date=20240407T190000, start_date=20240407T191001, end_date=20240407T191002
[2024-04-07, 19:10:02 UTC] {standard_task_runner.py:107} ERROR - Failed to execute job 252 for task docker_op_tester (Error while fetching server API version: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory')); 250)
[2024-04-07, 19:10:02 UTC] {local_task_job_runner.py:234} INFO - Task exited with return code 1
[2024-04-07, 19:10:02 UTC] {taskinstance.py:3312} INFO - 0 downstream tasks scheduled from follow-on schedule check
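
The FileNotFoundError at the bottom of each traceback means docker-py could not even find the unix socket file inside the scheduler container. A quick, hedged way to confirm this (run inside the container, e.g. via astro dev bash or docker exec; the path is the Docker client's default, not something shown verbatim in the logs) is a plain-stdlib check:

```python
import os
import socket


def docker_socket_reachable(path="/var/run/docker.sock"):
    """Return True if the Docker unix socket exists and accepts connections.

    If the file is absent, connecting raises the same FileNotFoundError
    that DockerOperator surfaces in the traceback above.
    """
    if not os.path.exists(path):
        return False
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        s.connect(path)
        return True
    except OSError:
        return False
    finally:
        s.close()
```

In the failing setup this returns False because astro dev start does not mount the host's Docker socket into the Airflow containers.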

In what CLI version did you experience this bug? Astro CLI Version: 1.25.0

This CLI bug is related to which Astronomer Platform?

What Operating System is the above CLI installed on? MacBook Pro (M3) Sonoma 14.4.1 (23E224)

🪜 Steps To Reproduce

astro dev start

Run a DAG that uses DockerOperator.

cmeans commented 3 months ago

Found a workaround.

Ran:

astro dev object export --compose

Added the following volume entry:

      - /var/run/docker.sock:/var/run/docker.sock:z

to both the scheduler and triggerer services.
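
The relevant portion of the edited compose file would look roughly like this. Service names and all other keys are assumptions based on a typical astro dev object export output and will vary by project; the docker.sock volume lines are the only actual change:

```yaml
# Sketch only: service names and surrounding keys come from your exported
# compose file. The change is mounting the host's Docker socket so the
# containers can talk to the host Docker daemon.
services:
  scheduler:
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:z
  triggerer:
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:z
```

Note that mounting the host socket means DockerOperator launches sibling containers on the host daemon, not containers nested inside the scheduler.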

Then:

docker compose up -d

DAG ran just fine:

*** Found local files:
***   * /usr/local/airflow/logs/dag_id=docker_sample/run_id=scheduled__2024-04-07T19:10:00+00:00/task_id=docker_op_tester/attempt=1.log
[2024-04-07, 19:23:34 UTC] {taskinstance.py:1979} INFO - Dependencies all met for dep_context=non-requeueable deps ti=<TaskInstance: docker_sample.docker_op_tester scheduled__2024-04-07T19:10:00+00:00 [queued]>
[2024-04-07, 19:23:34 UTC] {taskinstance.py:1979} INFO - Dependencies all met for dep_context=requeueable deps ti=<TaskInstance: docker_sample.docker_op_tester scheduled__2024-04-07T19:10:00+00:00 [queued]>
[2024-04-07, 19:23:34 UTC] {taskinstance.py:2193} INFO - Starting attempt 1 of 3
[2024-04-07, 19:23:34 UTC] {taskinstance.py:2217} INFO - Executing <Task(DockerOperator): docker_op_tester> on 2024-04-07 19:10:00+00:00
[2024-04-07, 19:23:34 UTC] {standard_task_runner.py:60} INFO - Started process 200 to run task
[2024-04-07, 19:23:34 UTC] {standard_task_runner.py:87} INFO - Running: ['airflow', 'tasks', 'run', 'docker_sample', 'docker_op_tester', 'scheduled__2024-04-07T19:10:00+00:00', '--job-id', '4', '--raw', '--subdir', 'DAGS_FOLDER/docker_operator_dag.py', '--cfg-path', '/tmp/tmpehzi7c_b']
[2024-04-07, 19:23:34 UTC] {standard_task_runner.py:88} INFO - Job 4: Subtask docker_op_tester
[2024-04-07, 19:23:34 UTC] {task_command.py:423} INFO - Running <TaskInstance: docker_sample.docker_op_tester scheduled__2024-04-07T19:10:00+00:00 [running]> on host 9b01d8238522
[2024-04-07, 19:23:34 UTC] {taskinstance.py:2513} INFO - Exporting env vars: AIRFLOW_CTX_DAG_OWNER='airflow' AIRFLOW_CTX_DAG_ID='docker_sample' AIRFLOW_CTX_TASK_ID='docker_op_tester' AIRFLOW_CTX_EXECUTION_DATE='2024-04-07T19:10:00+00:00' AIRFLOW_CTX_TRY_NUMBER='1' AIRFLOW_CTX_DAG_RUN_ID='scheduled__2024-04-07T19:10:00+00:00'
[2024-04-07, 19:23:34 UTC] {docker.py:486} INFO - Pulling docker image centos:latest
[2024-04-07, 19:23:35 UTC] {docker.py:500} INFO - latest: Pulling from library/centos
[2024-04-07, 19:23:35 UTC] {docker.py:500} INFO - 52f9ef134af7: Pulling fs layer
[2024-04-07, 19:23:36 UTC] {docker.py:500} INFO - 52f9ef134af7: Downloading
[2024-04-07, 19:23:38 UTC] {docker.py:500} INFO - 52f9ef134af7: Verifying Checksum
[2024-04-07, 19:23:38 UTC] {docker.py:500} INFO - 52f9ef134af7: Download complete
[2024-04-07, 19:23:38 UTC] {docker.py:500} INFO - 52f9ef134af7: Extracting
[2024-04-07, 19:23:40 UTC] {docker.py:500} INFO - 52f9ef134af7: Pull complete
[2024-04-07, 19:23:40 UTC] {docker.py:495} INFO - Digest: sha256:a27fd8080b517143cbbbab9dfb7c8571c40d67d534bbdee55bd6c473f432b177
[2024-04-07, 19:23:40 UTC] {docker.py:495} INFO - Status: Downloaded newer image for centos:latest
[2024-04-07, 19:23:40 UTC] {docker.py:359} INFO - Starting docker container from image centos:latest
[2024-04-07, 19:23:41 UTC] {docker.py:367} WARNING - Using remote engine or docker-in-docker and mounting temporary volume from host is not supported. Falling back to `mount_tmp_dir=False` mode. You can set `mount_tmp_dir` parameter to False to disable mounting and remove the warning
[2024-04-07, 19:24:11 UTC] {taskinstance.py:1149} INFO - Marking task as SUCCESS. dag_id=docker_sample, task_id=docker_op_tester, execution_date=20240407T191000, start_date=20240407T192334, end_date=20240407T192411
[2024-04-07, 19:24:11 UTC] {local_task_job_runner.py:234} INFO - Task exited with return code 0
[2024-04-07, 19:24:11 UTC] {taskinstance.py:3312} INFO - 1 downstream tasks scheduled from follow-on schedule check

How can I get this to work using the standard astro dev start command?

sunkickr commented 3 months ago

You can use a custom compose file with astro dev start by running astro dev start --compose-file <custom-file>

cmeans commented 3 months ago

> You can use custom compose file with astro dev start by doing astro dev start --compose-file <custom-file>

Great, thanks @sunkickr. So there is no better solution to this than my custom compose file approach?

sunkickr commented 3 months ago

@cmeans Yes, for now this is the only solution. astro dev start --compose-file <custom-file> is designed to give users the flexibility to do things like this.