camdencheek / tree-sitter-dockerfile

A tree-sitter grammar for Dockerfile
MIT License

Distinguish 3 kinds of quoted strings (single, double, JSON) #37

Closed mjambon closed 1 year ago

mjambon commented 1 year ago

This adds support for single-quoted strings and distinguishes JSON strings from the double-quoted strings that support variable expansion.

Fixes https://github.com/camdencheek/tree-sitter-dockerfile/issues/36 (which is needed by https://github.com/returntocorp/semgrep/issues/7780)

I removed the original escape_sequence rule because I couldn't tell where its escape sequences came from, and my own experiments suggest they aren't special to Docker. I checked Docker's behavior with the following setup:

$ cat Dockerfile 
FROM busybox
ENV FOO="ba\\r\"\'\n\xAA\u1234"
RUN echo "$FOO" > file
CMD ["cat", "file"]

$ docker build -t foo . && docker run -it foo
[+] Building 0.6s (6/6) FINISHED                                                
 => [internal] load build definition from Dockerfile                       0.0s
 => => transferring dockerfile: 125B                                       0.0s
 => [internal] load .dockerignore                                          0.0s
 => => transferring context: 2B                                            0.0s
 => [internal] load metadata for docker.io/library/busybox:latest          0.0s
 => CACHED [1/2] FROM docker.io/library/busybox                            0.0s
 => [2/2] RUN echo "ba\r"\'\n\xAA\u1234" > file                            0.5s
 => exporting to image                                                     0.0s
 => => exporting layers                                                    0.0s
 => => writing image sha256:6a6e1b28064206cb5da033ce7086c138fd868050577e4  0.0s
 => => naming to docker.io/library/foo                                     0.0s
ba\r"\'\n\xAA\u1234

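The output above shows which backslashes were part of an escape sequence and which weren't: only \\ and \" were unescaped (to \ and "), while \', \n, \xAA and \u1234 came through unchanged.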
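To make the observation concrete, here is a minimal Python sketch that reproduces the result of the experiment above. It is only a model of what this one test showed, not Docker's actual implementation: the function name unescape_double_quoted is hypothetical, and the experiment does not cover other characters (such as $), so they are deliberately left out.

```python
def unescape_double_quoted(body: str) -> str:
    """Model the behavior observed in the experiment (an assumption,
    not Docker's real code): inside a double-quoted ENV value, a
    backslash is dropped only before another backslash or a double
    quote. Every other backslash is kept literally."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body) and body[i + 1] in ("\\", '"'):
            # Recognized escape: keep only the escaped character.
            out.append(body[i + 1])
            i += 2
        else:
            # Stray backslash or ordinary character: keep as-is.
            out.append(ch)
            i += 1
    return "".join(out)


# The text between the double quotes on the ENV line of the Dockerfile.
raw = r'''ba\\r\"\'\n\xAA\u1234'''
print(unescape_double_quoted(raw))
```

Under these assumptions the printed value matches the last line of the docker run output above.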

Unlike JSON strings, single-quoted strings and non-JSON double-quoted strings are tolerant of stray backslashes. They will parse correctly even if we don't explicitly recognize some valid escape sequences (e.g. "\n" will parse correctly regardless of whether it represents LF or backslash-n).
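For contrast, the strictness of JSON strings can be checked with any standard JSON parser. The sketch below uses Python's json module as a stand-in (it is not the parser Docker or this grammar uses), and the helper name parses_as_json_string is hypothetical.

```python
import json


def parses_as_json_string(literal: str) -> bool:
    """Return True if `literal` is a valid JSON string literal.
    Python's json module stands in for a strict JSON parser here."""
    try:
        return isinstance(json.loads(literal), str)
    except json.JSONDecodeError:
        return False


# A valid JSON escape is accepted.
print(parses_as_json_string(r'"\n"'))
# Stray backslashes that the quoted strings above tolerate are rejected.
print(parses_as_json_string(r'"\x"'))
print(parses_as_json_string(r'"\'"'))
```

This is why a grammar needs to enumerate valid escapes for JSON strings, while the other two string kinds can treat an unrecognized backslash as ordinary content.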

mjambon commented 1 year ago

I'm done with my changes.

mjambon commented 1 year ago

Can you please merge the PR? I don't have the permission despite the approval.

camdencheek commented 1 year ago

Thanks for the updates, and sorry for the delay!