Inconsistency in docs: variable names and substitution rules

Description

The documentation is unclear on a few points about how ARG build arguments and ENV environment variables are treated differently in terms of variable substitution (interpolation).

The Dockerfile reference documentation says:

> Environment replacement
> ... Environment variables are supported by the following list of instructions in the Dockerfile: [list of instructions that does not include RUN]

First, it is unclear whether this applies only to ENV environment variables (as it seems to) or also to ARG build arguments.

Second, it also says:

> Using ARG variables
> You can use an ARG or an ENV instruction to specify variables that are available to the RUN instruction.

This seems to directly contradict the absence of RUN from the "Environment replacement" list. Perhaps what is meant is that variable expansion of ENV variables is performed by the shell under which the RUN instruction's command executes, rather than by Docker itself. But it is not guaranteed that the shell performs such expansion.

The documentation also gives an example in which an ARG is used in a RUN instruction, suggesting that RUN's absence from the "Environment replacement" list above is indeed an omission (or otherwise needs explaining).

```dockerfile
FROM busybox
ARG SETTINGS
RUN ./run/setup $SETTINGS
```

Similarly, the "Environment replacement" list says FROM supports expansion of environment variables, but there is a section specifically talking about its behavior with build arguments.

There are some weird edge cases with ARG and ENV, such as how ENV declarations shadow prior ARG declarations and how ARG has to be repeated after a FROM. But before raising those issues, the documentation should be much clearer on how to use them in the first place.

Output of docker version:

```
docker version
Client:
 Cloud integration: 1.0.17
 Version:           20.10.8
 API version:       1.41
 Go version:        go1.16.6
 Git commit:        3967b7d
 Built:             Fri Jul 30 19:55:20 2021
 OS/Arch:           darwin/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.8
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.16.6
  Git commit:       75249d8
  Built:            Fri Jul 30 19:52:10 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.9
  GitCommit:        e25210fe30a0a703442421b0f60afac609f950a3
 runc:
  Version:          1.0.1
  GitCommit:        v1.0.1-0-g4144b63
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
```

Output of docker info:

```
Client:
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Build with BuildKit (Docker Inc., v0.6.1-docker)
  compose: Docker Compose (Docker Inc., v2.0.0-rc.3)
  scan: Docker Scan (Docker Inc., v0.8.0)

Server:
 Containers: 2
  Running: 2
  Paused: 0
  Stopped: 0
 Images: 20
 Server Version: 20.10.8
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: e25210fe30a0a703442421b0f60afac609f950a3
 runc version: v1.0.1-0-g4144b63
 init version: de40ad0
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 5.10.47-linuxkit
 Operating System: Docker Desktop
 OSType: linux
 Architecture: x86_64
 CPUs: 4
 Total Memory: 1.939GiB
 Name: docker-desktop
 ID: XBSX:EJU6:QSRV:CFL4:MYOE:F77V:GBOU:4F63:XLBZ:ZQUR:RS4N:LDLP
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 HTTP Proxy: http.docker.internal:3128
 HTTPS Proxy: http.docker.internal:3128
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
```

Thanks for reporting!

> First, it is unclear whether this applies only to ENV environment variables (as it seems to) or also to ARG build arguments.

Yes, this should be clarified. ENV and ARG share the same namespace, and both are set as environment variables for RUN instructions. The difference is that ENV values are persisted in the image, whereas ARG values are only available during docker build and are discarded afterwards (so they don't persist in the image unless their value is copied to an ENV).
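A minimal sketch of copying an ARG into an ENV so that its value survives in the image. It assumes a POSIX shell; the names `APP_VERSION_ARG`, `APP_VERSION`, `Dockerfile.arg-to-env`, the `arg-to-env` tag and the `RUN_BUILD` switch are hypothetical, chosen only for illustration. The script just writes the Dockerfile unless `RUN_BUILD=1` is set, because building needs a running Docker daemon.

```shell
#!/bin/sh
# Hypothetical example: persist a build-time ARG by copying it into an ENV.
# Different names are used for the ARG and the ENV so neither shadows the other.
cat > Dockerfile.arg-to-env <<'EOF'
FROM busybox
# Only available while the image is being built; not stored in the image.
ARG APP_VERSION_ARG=1.0
# Stored in the image config, so containers started from the image also get it.
ENV APP_VERSION=$APP_VERSION_ARG
RUN echo "during build: ARG=$APP_VERSION_ARG ENV=$APP_VERSION"
EOF

# RUN_BUILD is a hypothetical switch for this sketch; building needs a Docker daemon.
if [ "${RUN_BUILD:-0}" = 1 ]; then
  docker build -t arg-to-env --build-arg APP_VERSION_ARG=2.0 -f Dockerfile.arg-to-env .
  # APP_VERSION should appear in the list; APP_VERSION_ARG should not.
  docker image inspect --format='{{json .Config.Env}}' arg-to-env
fi
```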

> This seems to directly contradict the absence of RUN from the "Environment replacement" list. Perhaps what is meant is that variable expansion of ENV variables is performed by the shell under which the RUN instruction's command executes, rather than by Docker itself. But it is not guaranteed that the shell performs such expansion.

Yes, that is correct: variable expansion in RUN, ENTRYPOINT and CMD instructions is performed by the shell (or by whatever other process executes the command). Expansion happens at the moment those instructions run: for RUN instructions, that is during docker build; for ENTRYPOINT and CMD, it is when a container is started from the image. I wrote a reply on https://github.com/moby/moby/issues/42937#issuecomment-945637536, which is also about this issue.
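To illustrate that expansion happens when the command runs, not when the Dockerfile is parsed, here is a rough shell-only analogy, assuming a POSIX `sh` (Docker itself is not invoked). A shell-form `CMD echo "greeting is: $GREETING"` is stored unexpanded and later run via `/bin/sh -c`, so it sees whatever is in the container's environment at start time, such as an ENV default or a `docker run -e` override. `GREETING` and its values are hypothetical.

```shell
#!/bin/sh
# The command string is kept unexpanded, like a shell-form CMD stored in the image.
cmd='echo "greeting is: $GREETING"'

# Container started with the image's default (e.g. ENV GREETING=hello):
default_run=$(GREETING=hello /bin/sh -c "$cmd")

# Container started with an override (e.g. docker run -e GREETING=hi ...):
override_run=$(GREETING=hi /bin/sh -c "$cmd")

echo "$default_run"
echo "$override_run"
```

The same string yields different output in the two runs because the inner shell expands `$GREETING` only when it executes.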

> The documentation also gives an example in which an ARG is used in a RUN instruction, suggesting that RUN's absence from the "Environment replacement" list above is indeed an omission (or otherwise needs explaining).

In that example, the shell performs the variable expansion. The RUN instruction in the example:

```dockerfile
FROM busybox
ARG SETTINGS
RUN ./run/setup $SETTINGS
```

can be read as:

```
SETTINGS=<value of the SETTINGS ARG> /bin/sh -c './run/setup $SETTINGS'
```
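The same reading explains why the exec (JSON) form behaves differently: `RUN ["./run/setup", "$SETTINGS"]` starts the program directly, without a shell, so `$SETTINGS` is passed through literally. Below is a shell-only sketch of both cases, assuming a POSIX `sh`; `echo` stands in for the hypothetical `./run/setup` script, and `debug` is a made-up value for the ARG.

```shell
#!/bin/sh
# Stand-in for the ARG value that Docker sets in the build step's environment.
export SETTINGS=debug

# Shell form: RUN ./run/setup $SETTINGS  ->  /bin/sh -c '...'
# The inner shell expands $SETTINGS when the step runs.
shell_form=$(/bin/sh -c 'echo setup called with: $SETTINGS')

# Exec form: RUN ["./run/setup", "$SETTINGS"]  ->  no shell involved.
# `env` executes the external echo directly, so the argument stays literal.
exec_form=$(env echo '$SETTINGS')

echo "shell form: $shell_form"
echo "exec form:  $exec_form"
```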

> Similarly, the "Environment replacement" list says FROM supports expansion of environment variables, but there is a section specifically talking about its behavior with build arguments.

Agreed, that looks incorrect. The substitution rules are the same, but FROM only expands variables that are set as ARG; ENV variables are not substituted in FROM.
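A minimal sketch of an ARG being substituted in FROM, where the ARG is declared before the first FROM (the global scope). It assumes a POSIX shell; `BASE_IMAGE`, `Dockerfile.from-arg`, the `from-arg` tag and the `RUN_BUILD` switch are hypothetical names. Only the Dockerfile is written unless `RUN_BUILD=1` is set.

```shell
#!/bin/sh
# Hypothetical example: choose the base image with a global ARG.
cat > Dockerfile.from-arg <<'EOF'
# Declared before the first FROM, so FROM can use it.
ARG BASE_IMAGE=busybox
FROM ${BASE_IMAGE}
RUN echo "stage built"
EOF

# RUN_BUILD is a hypothetical switch for this sketch; building needs a Docker daemon.
if [ "${RUN_BUILD:-0}" = 1 ]; then
  # Override the default base image at build time.
  docker build -t from-arg --build-arg BASE_IMAGE=alpine -f Dockerfile.from-arg .
fi
```

An ENV set in an earlier stage would not be substituted in a later FROM; only ARGs declared in the global scope are.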

> There are some weird edge cases with ARG and ENV, such as how ENV declarations shadow prior ARG declarations

If I'm not mistaken, later values should override earlier values when ENV and ARG overlap (both setting the same variable name):

```dockerfile
FROM busybox
ENV FOO1=env-foo1
ENV FOO2=env-foo2
ENV FOO3=env-foo3
ARG FOO4=arg-foo4-default
ARG FOO5
ENV FOO6=env-foo6
ARG FOO1=arg-foo1-default
ARG FOO2=arg-foo2-default
ARG FOO3
ENV FOO4=env-foo4
ENV FOO5=env-foo5
ARG FOO7=arg-foo7-default
ENV FOO7=copied-from-$FOO7
RUN echo FOO1 is: $FOO1; echo FOO2 is: $FOO2; echo FOO3 is: $FOO3; echo FOO4 is: $FOO4; echo FOO5 is: $FOO5; echo FOO6 is: $FOO6; echo FOO7 is: $FOO7;
```

With BuildKit enabled:

```
DOCKER_BUILDKIT=1 docker build -t foo1 --build-arg FOO1=cli-foo1 --build-arg FOO4=cli-foo4 --no-cache --progress=plain .
...
#5 0.200 FOO1 is: cli-foo1
#5 0.200 FOO2 is: arg-foo2-default
#5 0.200 FOO3 is: env-foo3
#5 0.200 FOO4 is: env-foo4
#5 0.200 FOO5 is: env-foo5
#5 0.200 FOO6 is: env-foo6
#5 0.200 FOO7 is: copied-from-arg-foo7-default
```

However, due to limitations of the classic builder, its behavior during the build is slightly different:

```
DOCKER_BUILDKIT=0 docker build -t foo2 --build-arg FOO1=cli-foo1 --build-arg FOO4=cli-foo4 --no-cache .
...
FOO1 is: env-foo1
FOO2 is: env-foo2
FOO3 is: env-foo3
FOO4 is: env-foo4
FOO5 is: env-foo5
FOO6 is: env-foo6
FOO7 is: copied-from-arg-foo7-default
```

The environment variables in the resulting image should be the same in both cases, though:

```
docker image inspect --format='{{json .Config.Env}}' foo2 | jq .
[
  "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
  "FOO1=env-foo1",
  "FOO2=env-foo2",
  "FOO3=env-foo3",
  "FOO6=env-foo6",
  "FOO4=env-foo4",
  "FOO5=env-foo5",
  "FOO7=copied-from-arg-foo7-default"
]
```

In general, the recommendation should be to avoid using the same name for an ARG and an ENV, to prevent ambiguity; but I agree that the behavior should be documented in more depth.

> and how ARG has to be repeated after a FROM. But before raising those issues, the documentation should be much clearer on how to use them in the first place.

As a general rule of thumb, FROM (and COPY --from) can use an ARG that is defined in the global scope (before the first FROM). Instructions within a build stage only have access to ARGs defined within that stage (in the lines following the FROM that starts the stage). Some discussion about improving the docs on this can be found in https://github.com/moby/moby/issues/40830#issuecomment-622949605. That discussion does not yet cover BuildKit's feature that allows "splitting" a stage; for example, with BuildKit, this is possible:

```dockerfile
FROM busybox AS base
ARG FOO=bar

FROM base AS breakpoint1
RUN echo FOO is: $FOO

FROM breakpoint1 AS breakpoint2
RUN echo FOO is: $FOO
```
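For the global-scope rule described above (independent of BuildKit's stage splitting), here is a sketch following the pattern used in the Dockerfile reference: a global ARG can be used in FROM, but it has to be declared again, without a value, to be visible inside a stage. It assumes a POSIX shell; `Dockerfile.global-arg`, the stage names and the `RUN_BUILD` switch are hypothetical.

```shell
#!/bin/sh
# Hypothetical example: a global ARG used in FROM and re-declared in one stage only.
cat > Dockerfile.global-arg <<'EOF'
ARG VERSION=latest
FROM busybox:${VERSION} AS with-arg
# Re-declaring (without a value) brings the global ARG into this stage.
ARG VERSION
RUN echo "VERSION in stage: $VERSION"

FROM busybox:${VERSION} AS without-arg
# No ARG VERSION in this stage, so $VERSION expands to an empty string.
RUN echo "VERSION in stage: '$VERSION'"
EOF

# RUN_BUILD is a hypothetical switch for this sketch; building needs a Docker daemon.
if [ "${RUN_BUILD:-0}" = 1 ]; then
  for target in with-arg without-arg; do
    DOCKER_BUILDKIT=1 docker build --no-cache --progress=plain \
      --target "$target" -f Dockerfile.global-arg .
  done
fi
```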

docker/cli issue #3323