I removed the code for the original escape_sequence rule because I couldn't tell where its escape sequences came from, and in my own experiments they didn't appear to be special to docker. I used the following setup to check docker's behavior:
$ cat Dockerfile
FROM busybox
ENV FOO="ba\\r\"\'\n\xAA\u1234"
RUN echo "$FOO" > file
CMD ["cat", "file"]
$ docker build -t foo . && docker run -it foo
[+] Building 0.6s (6/6) FINISHED
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 125B 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load metadata for docker.io/library/busybox:latest 0.0s
=> CACHED [1/2] FROM docker.io/library/busybox 0.0s
=> [2/2] RUN echo "ba\r"\'\n\xAA\u1234" > file 0.5s
=> exporting to image 0.0s
=> => exporting layers 0.0s
=> => writing image sha256:6a6e1b28064206cb5da033ce7086c138fd868050577e4 0.0s
=> => naming to docker.io/library/foo 0.0s
ba\r"\'\n\xAA\u1234
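The last line of the output above shows which backslashes were part of an escape sequence and which weren't: only \\ and \" were unescaped, while \', \n, \xAA, and \u1234 came through unchanged.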
Unlike JSON strings, single-quoted strings and non-JSON double-quoted strings are tolerant of stray backslashes, so they parse correctly even if we don't explicitly recognize some valid escape sequences (e.g. "\n" parses correctly regardless of whether it represents LF or backslash-n).
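To illustrate the difference, here is a minimal Python sketch. It is not the grammar's implementation and not docker's actual algorithm. The helper names unescape_double_quoted and is_valid_json_string are hypothetical, and the rule "only \\ and \" are escapes" is an assumption taken solely from the experiment above.

```python
import json


def unescape_double_quoted(body: str) -> str:
    # Assumption (from the experiment above only): inside a non-JSON
    # double-quoted string, backslash-backslash and backslash-quote are
    # escapes; any other backslash is kept literally.
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body) and body[i + 1] in ("\\", '"'):
            out.append(body[i + 1])  # drop the escaping backslash
            i += 2
        else:
            out.append(ch)  # stray backslashes pass through unchanged
            i += 1
    return "".join(out)


def is_valid_json_string(text: str) -> bool:
    # JSON is strict: an unknown escape makes the whole string invalid.
    try:
        return isinstance(json.loads(text), str)
    except json.JSONDecodeError:
        return False


# The ENV value from the Dockerfile above, without its surrounding quotes.
raw = r'''ba\\r\"\'\n\xAA\u1234'''
print(unescape_double_quoted(raw))

# The same kinds of escapes, checked as JSON strings (quotes included).
print(is_valid_json_string(r'"\n"'))
print(is_valid_json_string(r'"\xAA"'))
```

Under these assumptions, the lenient function never fails on an unrecognized escape, whereas the JSON check rejects one outright. That is why the grammar can leave most escapes unrecognized in the non-JSON string forms but not in JSON strings.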
This adds support for single-quoted strings and distinguishes JSON strings from the double-quoted strings that support variable expansion.
Fixes https://github.com/camdencheek/tree-sitter-dockerfile/issues/36 (which is needed by https://github.com/returntocorp/semgrep/issues/7780)