INCATools / ontology-development-kit

Bootstrap an OBO Library ontology
http://incatools.github.io/ontology-development-kit/
BSD 3-Clause "New" or "Revised" License
212 stars 53 forks source link

Spaces in path causes `docker run` error in `run.sh` #1078

Open joeflack4 opened 2 days ago

joeflack4 commented 2 days ago

Overview

I had an ODK repository with spaces in the path (https://github.com/monarch-initiative/mondo-ingest/issues/545), and this resulted in an error when using run.sh.

The error

docker: invalid reference format

Log

``` joeflack4@MacBook-Pro ontology % sh run.sh make build-mondo-ingest Running obolibrary/odkfull:dev with '-Xmx20G' as options for ROBOT and other Java-based pipeline steps. docker: invalid reference format. See 'docker run --help'. ```

Replicability

  1. Try on an ODK rep with a path that includes a space in it, e.g. this example with dir mondo builds/, error:

    joeflack4@MacBook-Pro ontology % pwd
    /Users/joeflack4/Desktop/mondo builds/mondo-ingest-build-ordo-subset/src/ontology
    oeflack4@MacBook-Pro ontology % sh run.sh make build-mondo-ingest
    Running obolibrary/odkfull:dev with '-Xmx20G' as options for ROBOT and other Java-based pipeline steps.
    docker: invalid reference format.
    See 'docker run --help'.
  2. Remove the space, e.g. in this example changing the dir name to mondo-builds/, and observe the issue disappear.

    joeflack4@MacBook-Pro ontology % pwd
    /Users/joeflack4/Desktop/mondo-builds/mondo-ingest-build-ordo-subset/src/ontology
    joeflack4@MacBook-Pro ontology % sh run.sh make build-mondo-ingest
    Running obolibrary/odkfull:dev with '-Xmx20G' as options for ROBOT and other Java-based pipeline steps.
    mondo-ingest.Makefile:65: warning: overriding recipe for target 'components/omim.owl'

Additional information

Original context:

gouttegd commented 2 days ago

TL;DR: We can’t support spaces in paths while remaining compatible with a POSIX shell.

The interesting bit in the run.sh script is here:

BIND_OPTIONS="-v $(echo $VOLUME_BIND | sed 's/,/ -v /')"
docker run $ODK_DOCKER_OPTIONS $BIND_OPTIONS [...]

The first line transforms the VOLUME_BIND variable from a comma-separated list of pairs of directories to bind (HOST_DIRECTORY1:CONTAINER_DIRECTORY1,HOST_DIRECTORY2:CONTAINER_DIRECTORY2,...) into a list of -v options as expected by Docker (-v HOST_DIRECTORY1:CONTAINER_DIRECTORY1 -v HOST_DIRECTORY2:CONTAINER_DIRECTORY2 ...). On the second line, the transformed BIND_OPTIONS variable is passed to the docker run command.

This supposes that the arguments to the -v options (all HOST_DIRECTORY:CONTAINER_DIRECTORY pairs) never contain any space characters.

If HOST_DIRECTORY happens to contain a space character (say, HOST DIRECTORY), then we end up with -v HOST DIRECTORY:CONTAINER_DIRECTORY, where only HOST is interpreted as the argument to -v, while DIRECTORY:CONTAINER_DIRECTORY is a completely independent positional argument (that docker run will interpret as the name of the image to run).

Enclosing $BIND_OPTIONS in quotes (to prevent the shell from splitting it on the space characters) is not an option here, because it will prevent the shell from splitting it on the space characters, and we need the shell to do that because the HOST_DIRECTORY:CONTAINER_DIRECTORY pairs in BIND_OPTIONS must be passed to docker run as separate arguments.

One option is to use an array variable:

BIND_OPTIONS="-v $(echo $VOLUME_BIND | sed 's/,/,-v/')"
IFS=, BIND_OPTIONS_ARRAY=(BIND_OPTIONS)
unset IFS
docker run $ODK_DOCKER_OPTIONS "${BIND_OPTIONS_ARRAY[@]}" [...]

The first line transforms HOST_DIRECTORY1:CONTAINER_DIRECTORY1,HOST_DIRECTORY2:CONTAINER_DIRECTORY2,... into -vHOST_DIRECTORY1:CONTAINER_DIRECTORY1,-vHOST_DIRECTORY2:CONTAINER_DIRECTORY2,... (that is, it simply inserts a -v before each pair).

Then we ask the shell to create an array (this is the non-POSIX-compliant part; the POSIX shell specification has no concept of array variables) by splitting the BIND_OPTIONS variable on the , character, by setting the word splitting parameter (IFS) to ,.

Then we need to reset the IFS so that the shell will resume to splitting lines on space characters as normal.

And then, finally, we pass the array to the docker run command. Here we can enclose it within double quotes, because when used around a reference to an array the shell will actually put quotes around each item in the array, rather than around the entire array. That is, we will end up with "-vHOST_DIRECTORY1:CONTAINER_DIRECTORY1" "-vHOST_DIRECTORY2:CONTAINER_DIRECTORY2", which is exactly what we need and will work as expected even if there are space characters somewhere in the directory paths.

joeflack4 commented 2 days ago

It might take me some time to fully read and process this, but that looks like a really thorough investigation and explanation!

If this can't be fixed, then I would suggest an alternative solution: graceful error. If detecting a space in the path, we throw an error saying to correct that; that it is incompatible.

That would help a lot. It took me a while to figure out what the problem was. Can we do this?

gouttegd commented 2 days ago

If this can't be fixed

It can be fixed if we decide to drop the requirement for the run.sh script to be POSIX-compliant – something that, for the record, I am in favour of.

The more I think about it, the less I am convinced that POSIX compatibility makes sense for the ODK, and especially for the Docker wrapper script. That thing is entirely dependent on, well, Docker, which is already a Linux-specific technology in the first place (it works on Windows and macOS only through virtualisation of a Linux machine).

If this can't be fixed, then I would suggest an alternative solution: graceful error. If detecting a space in the path, we throw an error saying to correct that; that it is incompatible.

Until/unless we make a decision to drop POSIX compatibility as a requirement, yes, we can do that and I agree it would be a good thing to do.