Barski-lab / cwl-airflow

Python package to extend Airflow functionality with CWL1.1 support
https://barski-lab.github.io/cwl-airflow
Apache License 2.0

Run workflow failed with dockerPull #84

Closed kokleong9406 closed 2 years ago

kokleong9406 commented 2 years ago

I am trying to pull a Docker image within a Workflow, and it fails with the error: "Error response from daemon: invalid mount config for type "bind": bind source path does not exist: /cwl_airflow/test/39_kokleong_17-docker-wf_manual__2022-05-26T11_23_53.619688_00_00_17sz_laf/run_docker/run_docker_step_cache/da312hvh".

My main workflow file: 17-docker-wf.cwl

#!/usr/bin/env cwl-runner

cwlVersion: v1.0

class: Workflow

hints:
  DockerRequirement:
    # dockerPull: node:slim
    dockerPull: python:3

inputs:
  src:
    type: File

outputs:
  output:
    type: File
    outputSource: run_docker/example_out

steps:
  run_docker:
    run: ./17-docker-clt.cwl
    in: 
      src: src
    out: [example_out]

My CommandLineTool: 17-docker-clt.cwl:

#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: CommandLineTool
baseCommand: ["bash", "-c", "pwd; cd ..; pwd; ls -la; cat /etc/passwd"]
hints:
  DockerRequirement:
    dockerPull: node:slim
inputs:
  src:
    type: File
outputs:
  example_out:
    type: stdout
stdout: output.txt

My 17-docker-job.yaml:

src:
  class: File
  path: /cwl_airflow/cwl_inputs_folder/kokleong/test.txt

and the error I got:

scheduler    | [workflow ] start
scheduler    | [workflow ] starting step run_docker
scheduler    | [step run_docker] start
scheduler    | [job run_docker] /cwl_airflow/cwl_tmp_folder/49_kokleong_17-docker-wf_manual__2022-05-27T01_10_20.160279_00_00_yfo5dws3/run_docker/run_docker_step_cache/iaozsw5n$ docker \
scheduler    |     run \
scheduler    |     -i \
scheduler    |     --mount=type=bind,source=/cwl_airflow/cwl_tmp_folder/49_kokleong_17-docker-wf_manual__2022-05-27T01_10_20.160279_00_00_yfo5dws3/run_docker/run_docker_step_cache/iaozsw5n,target=/mjHVxY \
scheduler    |     --mount=type=bind,source=/cwl_airflow/cwl_tmp_folder/49_kokleong_17-docker-wf_manual__2022-05-27T01_10_20.160279_00_00_yfo5dws3/run_docker/run_docker_step_cache/lpdy5uhj,target=/tmp \
scheduler    |     --mount=type=bind,source=/cwl_airflow/cwl_inputs_folder/kokleong/test.txt,target=/var/lib/cwl/stg3e3f7868-861f-4061-b769-b832c9287bd8/test.txt,readonly \
scheduler    |     --workdir=/mjHVxY \
scheduler    |     --read-only=true \
scheduler    |     --log-driver=none \
scheduler    |     --user=0:0 \
scheduler    |     --rm \
scheduler    |     --cidfile=/cwl_airflow/cwl_tmp_folder/49_kokleong_17-docker-wf_manual__2022-05-27T01_10_20.160279_00_00_yfo5dws3/run_docker/run_docker-20220527011029-898747.cid \
scheduler    |     --env=TMPDIR=/tmp \
scheduler    |     --env=HOME=/mjHVxY \
scheduler    |     node:slim \
scheduler    |     bash \
scheduler    |     -c \
scheduler    |     'pwd; cd ..; pwd; ls -la; cat /etc/passwd' > /cwl_airflow/cwl_tmp_folder/49_kokleong_17-docker-wf_manual__2022-05-27T01_10_20.160279_00_00_yfo5dws3/run_docker/run_docker_step_cache/iaozsw5n/output.txt
scheduler    | docker: Error response from daemon: invalid mount config for type "bind": bind source path does not exist: /cwl_airflow/cwl_tmp_folder/49_kokleong_17-docker-wf_manual__2022-05-27T01_10_20.160279_00_00_yfo5dws3/run_docker/run_docker_step_cache/iaozsw5n.

Under normal circumstances, I would test the CWL scripts with cwltool first. Calling "cwltool 17-docker-wf.cwl 17-docker-job.yaml" results in the same error as above. So I call "cwltool --tmpdir-prefix=$(pwd) 17-docker-wf.cwl 17-docker-job.yaml" instead; the error does not occur and the workflow runs successfully.
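The workaround above amounts to overriding where cwltool places its temporary step directories. A minimal sketch of how the invocation could be assembled programmatically (`build_cwltool_cmd` is a hypothetical helper; the workflow and job filenames are the ones from this issue, and `--tmpdir-prefix` is a real cwltool flag):

```python
import shlex

def build_cwltool_cmd(workflow, job, tmpdir_prefix=None):
    """Assemble a cwltool command line; --tmpdir-prefix forces temporary
    step directories under a path the Docker daemon can bind-mount."""
    cmd = ["cwltool"]
    if tmpdir_prefix is not None:
        cmd.append(f"--tmpdir-prefix={tmpdir_prefix}")
    cmd += [workflow, job]
    return cmd

print(shlex.join(build_cwltool_cmd(
    "17-docker-wf.cwl", "17-docker-job.yaml",
    tmpdir_prefix="/cwl_airflow/cwl_tmp_folder/")))
```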

With the idea of changing "--tmpdir-prefix" as in the cwltool call, I tried a similar thing by modifying the get_default_cwl_args function in https://github.com/Barski-lab/cwl-airflow/blob/49b63ac7170dfaa231aecdd6e86bd7686719199f/cwl_airflow/utilities/cwl.py, adding a "tmpdir_prefix" entry:

    required_cwl_args.update(
        {
            **"tmpdir_prefix": "/cwl_airflow/cwl_tmp_folder",**
            "tmp_folder": get_dir(
                conf_get(
                    "cwl", "tmp_folder",
                    preset_cwl_args.get("tmp_folder", CWL_TMP_FOLDER)
                )
            ),
            "outputs_folder": get_dir(                                             # for CWL-Airflow to store outputs if "outputs_folder" is not overwritten in job
                conf_get(
                    "cwl", "outputs_folder",
                    preset_cwl_args.get("outputs_folder", CWL_OUTPUTS_FOLDER)
                )
            ),
            "inputs_folder": get_dir(                                             # for CWL-Airflow to resolve relative locations for input files if job was loaded from parsed object
                conf_get(
                    "cwl", "inputs_folder",
                    preset_cwl_args.get("inputs_folder", CWL_INPUTS_FOLDER)
                )
            ),
            "pickle_folder": get_dir(                                              # for CWL-Airflow to store pickled workflows
                conf_get(
                    "cwl", "pickle_folder",
                    preset_cwl_args.get("pickle_folder", CWL_PICKLE_FOLDER)
                )
            ),
            "keep_tmp_data": conf_get(                                             # prevents from cleaning dag_run temp data and XCom's in both success or failure scenarios
                "cwl", "keep_tmp_data",
                preset_cwl_args.get("keep_tmp_data", CWL_KEEP_TMP_DATA),
                True                                                               # return Boolean
            ),
            "use_container": conf_get(                                             # execute jobs in docker containers
                "cwl", "use_container",
                preset_cwl_args.get("use_container", CWL_USE_CONTAINER),
                True                                                               # return Boolean
            ),
            "no_match_user": conf_get(                                             # disables passing the current uid to "docker run --user"
                "cwl", "no_match_user",
                preset_cwl_args.get("no_match_user", CWL_NO_MATCH_USER),
                True                                                               # return Boolean
            ),
            "skip_schemas": conf_get(                                              # it looks like this doesn't influence anything in the latest cwltool
                "cwl", "skip_schemas", 
                preset_cwl_args.get("skip_schemas", CWL_SKIP_SCHEMAS),
                True                                                               # return Boolean
            ),
            "strict": conf_get(
                "cwl", "strict", 
                preset_cwl_args.get("strict", CWL_STRICT),
                True                                                               # return Boolean
            ),
            "quiet": conf_get(
                "cwl", "quiet", 
                preset_cwl_args.get("quiet", CWL_QUIET),
                True                                                               # return Boolean
            ),
            "rm_tmpdir": preset_cwl_args.get("rm_tmpdir", CWL_RM_TMPDIR),          # even if we can set it in "preset_cwl_args" it's better not to change
            "move_outputs": preset_cwl_args.get("move_outputs", CWL_MOVE_OUTPUTS), # even if we can set it in "preset_cwl_args" it's better not to change
            "enable_dev": preset_cwl_args.get("enable_dev", CWL_ENABLE_DEV)        # fails to run without it when creating workflow from tool. TODO: Ask Peter?
        }
    )

It seems like tmpdir_prefix is somehow using the value of tmp_folder.
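Instead of hard-coding the prefix, it could be resolved through the same `conf_get` pattern the surrounding keys use. A hypothetical, self-contained sketch (the real `conf_get` reads Airflow's config; here it is stubbed out purely for illustration):

```python
# Hypothetical stand-in for cwl-airflow's conf_get helper, which normally
# reads the [cwl] section of airflow.cfg; stubbed here for illustration.
def conf_get(section, key, default):
    config = {}  # an empty config means every lookup falls back to default
    return config.get((section, key), default)

CWL_TMP_FOLDER = "/cwl_airflow/cwl_tmp_folder"  # assumed default, as in the issue

def resolve_tmpdir_prefix(preset_cwl_args):
    """Mirror the conf_get fallback chain used for the other keys:
    config value -> preset_cwl_args -> module-level default."""
    return conf_get(
        "cwl", "tmpdir_prefix",
        preset_cwl_args.get("tmpdir_prefix", CWL_TMP_FOLDER),
    )

print(resolve_tmpdir_prefix({}))                        # default folder
print(resolve_tmpdir_prefix({"tmpdir_prefix": "/x"}))   # preset wins
```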

Generally, do you know how to make dockerPull work?

michael-kotliar commented 2 years ago

Hi @kokleong9406, could you please check whether /cwl_airflow/test/39_kokleong_17-docker-wf_manual__2022-05-26T11_23_53.619688_00_00_17sz_laf/run_docker/run_docker_step_cache/da312hvh actually exists? It looks like a permissions problem. Make sure your Docker has read/write access to the /cwl_airflow directory. If you are running it on a Mac, you can configure this under Preferences/Resources/File Sharing in the Docker Desktop application.

kokleong9406 commented 2 years ago

Hi @michael-kotliar

[Updated my 1st comment to reflect the latest error I'm getting] Yes, Docker has read/write access to the /cwl_airflow directory. And yes, the destination directory does exist, with an output.txt file in it, but the file is empty.

I think the problem could be that the directory does not yet exist at the moment the bind mount is set up for the Docker container?
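If the missing source directory really is the cause, pre-creating it before the `docker run` invocation would be a minimal guard. A sketch (`ensure_bind_source` is a hypothetical helper; the step-cache path is just an example built under a throwaway temp directory):

```python
import os
import tempfile

def ensure_bind_source(source):
    """Docker's --mount type=bind refuses to start if the source path does
    not exist, so create it (and any missing parents) beforehand."""
    if not os.path.exists(source):
        os.makedirs(source)
    return source

# Example with a throwaway path mimicking a step-cache directory:
base = tempfile.mkdtemp()
step_cache = os.path.join(base, "run_docker_step_cache", "example_step")
ensure_bind_source(step_cache)
print(os.path.isdir(step_cache))  # True
```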

kokleong9406 commented 2 years ago

By the way, I forgot to mention that I actually run cwl-airflow inside a Docker container named "webserver". I am able to run any workflow successfully, except those with a DockerRequirement. Also, when I "docker exec -ti webserver bash" into the container, I hit a similar error running "cwltool --tmpdir-prefix=$(pwd)/tmp/ 17-docker-wf.cwl ../../../cwl_inputs_folder/kokleong/58_kokleong_17-docker-wf/17-docker-job.yaml"

The weird thing is that running the same command with "--tmpdir-prefix" via cwltool on my host machine works, but running it inside my Docker container produces the error.

michael-kotliar commented 2 years ago

Well, that's an important detail :) You can't run Docker inside Docker unless you start the 'external' container with /var/run/docker.sock:/var/run/docker.sock mounted. However, in that case the source paths of everything mounted into the 'internal' container are resolved by the Docker daemon on the host, as if they were being mounted into the 'external' container. I have an example of how you can make it work in this docker-compose file: https://github.com/Barski-lab/cwl-airflow/tree/master/packaging/docker_compose/local_executor
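Because the host daemon resolves bind sources, a path that is valid inside the 'external' container generally has to be translated back to its host location before being passed to the 'internal' `docker run`. A hypothetical remapping helper (the mount pair below is purely illustrative; another option is simply to mount host paths at identical locations inside the container so no translation is needed):

```python
def to_host_path(container_path, mounts):
    """Translate a path as seen inside the 'external' container into the
    host path the Docker daemon will actually resolve for a bind mount.

    mounts: list of (host_dir, container_dir) bind pairs of the 'external'
    container, e.g. taken from its docker-compose volumes section.
    """
    for host_dir, container_dir in mounts:
        prefix = container_dir.rstrip("/")
        if container_path == prefix or container_path.startswith(prefix + "/"):
            return host_dir.rstrip("/") + container_path[len(prefix):]
    return container_path  # not under any known mount: pass through unchanged

# Illustrative mount: host /data/cwl_airflow is bound to /cwl_airflow inside.
mounts = [("/data/cwl_airflow", "/cwl_airflow")]
print(to_host_path("/cwl_airflow/cwl_tmp_folder/stepX", mounts))
# -> /data/cwl_airflow/cwl_tmp_folder/stepX
```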

kokleong9406 commented 2 years ago

Yes, I included "/var/run/docker.sock:/var/run/docker.sock" in the docker-compose file.

Also, I managed to make it run successfully. What I did was modify the code in https://github.com/common-workflow-language/cwltool/blob/main/cwltool/docker.py, specifically:

    # Requires: import csv, os; from io import StringIO; from typing import List
    @staticmethod
    def append_volume(
        runtime: List[str], source: str, target: str, writable: bool = False
    ) -> None:
        """Add binding arguments to the runtime list."""
        options = [
            "type=bind",
            "source=" + source,
            "target=" + target,
        ]
        if not writable:
            options.append("readonly")
        output = StringIO()
        csv.writer(output).writerow(options)
        mount_arg = output.getvalue().strip()
        runtime.append(f"--volume={mount_arg}")  # changed from f"--mount={mount_arg}"
        # Added: unlike "--volume", "--mount" fails if the source path
        # doesn't already exist, so create it here as well.
        if not os.path.exists(source):
            os.makedirs(source)
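The string-building part of that patch can be reproduced standalone, without Docker, to see exactly what each variant emits (`mount_arg` is a hypothetical reduction of cwltool's `append_volume`):

```python
import csv
from io import StringIO

def mount_arg(source, target, writable=False, use_volume=False):
    """Build the bind argument the same way cwltool's append_volume does,
    switching between --mount (strict) and --volume (the workaround above)."""
    options = ["type=bind", "source=" + source, "target=" + target]
    if not writable:
        options.append("readonly")
    out = StringIO()
    csv.writer(out).writerow(options)  # csv handles quoting of odd paths
    opts = out.getvalue().strip()
    flag = "--volume" if use_volume else "--mount"
    return f"{flag}={opts}"

print(mount_arg("/tmp/cache/abc", "/mjHVxY"))
print(mount_arg("/tmp/cache/abc", "/mjHVxY", use_volume=True))
```

Note that Docker documents `--volume` as `source:target[:options]` rather than the key=value form kept here, so the added `os.makedirs` guard that creates the missing source directory may be the part of the patch that actually avoids the error.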

By the way, my reason for even wanting to run workflows in containers stems from the fact that, if a workflow runs in an "internal container" within the cwl-airflow "external container", I am essentially isolating the environment, files, and directories of the "external container" from the "internal container". This way, a user cannot construct a malicious CommandLineTool or Workflow that wreaks havoc on the "external container". Anyone who comes across the need for enhanced security, feel free to comment on my remarks here. I am still testing whether what I did is secure.