compose-spec / compose-go

Reference library for parsing and loading Compose YAML files
https://compose-spec.io
Apache License 2.0
345 stars 109 forks source link

dotenv octal escape unexpected behavior #650

Open panzi opened 1 month ago

panzi commented 1 month ago

The regular expression for octal escape sequences matches too much (0\d{0,3} instead of 0[0-7]{0,3}, although that is still too much), replaces a \0 prefix with just \, and then if the unquoting of the escape sequence fails it inserts the manipulated match.

Meaning this value: "\079" Gives this string: "\\79" While it should give: "\x079" (bytes: [ 0x07, 0x39 ])

I.e. this are two bugs. Using the manipulated match when unquoting fails and matching too much and thus failing valid octal escape sequences.

Edit: Also through trail and error I found out that strconv.UnquoteChar() wants octal escape sequences to be exactly 3 octal numbers long, meaning the regular expression should actually be 0[0-7]{3}, or the match needs to be 0-padded to 3 characters long.

Further the regular expression also matches \c, which I can't find in the Go spec.

https://github.com/compose-spec/compose-go/blob/9d0d133e13d0955e27520c6317d08822b7c5de5f/dotenv/parser.go#L19 https://github.com/compose-spec/compose-go/blob/9d0d133e13d0955e27520c6317d08822b7c5de5f/dotenv/parser.go#L225

ndeloof commented 1 month ago

I'm not confident with regex and the way those are used in this context, so hardly can tell what would be the adequate expression Would you like to help us fix this by writing a pull-request with a test case covering those various cases?

panzi commented 1 month ago

I've never used Go before looking at this. I just was bored and looked at the differences of various dotenv implementations and tried to replicate them in Rust, just for fun. I noticed these odd behaviors when my solution didn't produce the same results in some cases (I now emulate the behavior 1:1). I wrote down everything I noticed here: https://github.com/panzi/punktum#composego-dialect I also mention bugs in the variable substitution there, but didn't describe it well enough and now I can't remember the details. I think there's a problem in multi-line strings when the default value part spans multiple lines.

Also I just noticed yet another bug. If the inheritance syntax is used in the last line of a .env file and the file isn't newline terminated, then the parser will think the parsed key is the empty string and the value is the variable name. I.e. this .env file (given that there is no newline at the end):

FOO

Is equivalent to this JSON:

{ "": "FOO" }

Anyway, I wouldn't use regular expressions at all in implementing this, and neither would I use a library function for the escape sequences. Also I wouldn't do anything like escape sequence handling or variable substitution in separate passes. I find it simpler to think of when it is done in one pass. Because then the output of one pass doesn't need to prepare the string so that is correctly handled in the next (see $ handling). And about variable substitution: Yet another thing I wouldn't do with regular expressions, but in fact with a proper recursive parser, since it is a recursive syntax.

So because I never used Go before, am not a fan of what I've seen of Go, and am very opinionated on how such a parser should be written, I'm probably not the right person to contribute code here. XD