caddyserver / caddy

Fast and extensible multi-platform HTTP/1-2-3 web server with automatic HTTPS
https://caddyserver.com
Apache License 2.0
57.1k stars 3.99k forks source link

Using multiple path_regexp matchers does not error #5028

Closed capnspacehook closed 2 years ago

capnspacehook commented 2 years ago

Caddy version: v2.6.0-beta.3 GOOS: linux GOARCH: amd64

I found some interesting behavior where specifying multiple path_regexp matchers in a named matcher will not error when the config file is parsed. Instead, the last path_regexp matcher will be used and the previous matchers will silently be discarded.

Example Caddyfile:

http://localhost:9000 {
    log {
        output stdout
        level INFO
    }

    @dockerAPI {
        method GET
        path_regexp ^/v\d\.\d+/*
        path_regexp ^/asdf/*
    }

    handle @dockerAPI {
        reverse_proxy unix//var/run/docker.sock
    }
    handle {
        respond 427
    }
}

In this example, requests that begin with /v1.41/ do not match @dockerAPI, but requests that begin with /asdf/ do. I think an error should be returned when parsing the config file.

As a side note, I discovered this while trying to use Caddy to limit access to a Docker API socket, I only want certain API endpoints to be allowed and passed through by Caddy. It's difficult to do such a thing without the ability to use multiple path_regexp matchers, is there a workaround for this or should I be using a 3rd party module/a separate tool entirely?

francislavoie commented 2 years ago

You're right that in the Caddyfile, only the last path_regexp matcher will apply. I agree we should make that an explicit error to be clear that it's bad config.

To use multiple regexps, you can use the expression matcher, which has path_regexp as a function. But I'm not sure I understand what exactly you're trying to match.

@dockerAPI expression `method('GET') && path_regexp('^/v\d\.\d+/*') && path_regexp('^/asdf/*')`

Keep in mind you could also just use a regular path matcher for a prefix match, like path /asdf/* which won't conflict with the other path_regexp since it's a matcher of a different type. But it will be an AND between each in a named matcher. If you need an OR then using expression is easier.

Also in v2.6, you'll be able to omit the expression token since we implemented a shortcut:

@dockerAPI `method('GET') && path_regexp('^/v\d\.\d+/*') && path_regexp('^/asdf/*')`
capnspacehook commented 2 years ago

You're right that in the Caddyfile, only the last path_regexp matcher will apply. I agree we should make that an explicit error to be clear that it's bad config.

Good to hear!

To use multiple regexps, you can use the expression matcher, which has path_regexp as a function. But I'm not sure I understand what exactly you're trying to match.

Thanks for the help! Note for future readers backslashes have to be escaped to work correctly when using path_regexp in an expression, ie

@dockerAPI expression `method('GET') && path_regexp('^/v\\d\\.\\d+/*')`

will work.

Sorry, I tried to make my example as simple as possible to reproduce which made it not make much sense as an actual config. I'm trying to match a few Docker API endpoints, namely:

Is using a long expression the only way to match on multiple path_regexps at once?

francislavoie commented 2 years ago

Is using a long expression the only way to match on multiple path_regexps at once?

Yes, which is a limitation of the JSON config's design. Run caddy adapt --pretty on your config, and you'll see that path_regexp is an object key in a matcher set, which means there can only matcher of each type, and unfortunately the value of the matcher is an object and not an array so there can only be one value. Other matchers like path take an array of paths so it can OR all of the input paths.

mholt commented 2 years ago

The reason multiple regexes aren't supported is because you can write multiple regexes as one regex. Matchers in a set are ANDed together, which logic can be expressed in regex anyway. For example in your case:

@dockerAPI path_regexp /v\d+.\d+/(events.*|containers/json.*|containers/.*/json)

i.e. you don't need multiple regexes.

capnspacehook commented 2 years ago

@mholt I understand that, I was just looking to use multiple path_regexps to make the Caddyfile more readable/maintainable. I only want to allow 3 API endpoints now, but in the future if I want to allow 10, or 20 and block the rest... the regex could get very ugly.

No worries if using one long regex is the only way to accomplish this, I understand that only allowing one path regex is a sane rule. I guess I'm trying to use Caddy like a request filterer rather than as a webserver/reverse proxy, which AFAIK isn't a generally supported use-case. If it is though, as I mentioned before it would be nice to be able to specify multiple path regexes in a named matcher to help maintainability and readability of complex URL filtering logic.

kadefor commented 2 years ago

Not sure if it's possible to concatenate multiple regex internally?

francislavoie commented 2 years ago

@kadefor rather not go down that rabbithole 😅 there's things like flags that must go in the front, etc. I don't want to write a regexp parser for regexp 🤣

kadefor commented 2 years ago

@francislavoie Haha, no need to write a regex parser, just like this:

    path_regexp aaa
    path_regexp bbb
    path_regexp ccc

concat to:

    path_regexp (abc)|(bbb)|(ccc)

The performance may not be very good, but it is an idea after all, an immature idea. 😅

However, the "expression" is very powerful! 😄

francislavoie commented 2 years ago

That wouldn't be possible for a regexp like (?i)^(/api.*) because the case insensitive flag must be at the start of the regexp, it can't appear in the middle. And if you're using un-named capture groups, appending them together will change the output's number (I think). This is just a whole can of worms that I don't thing is worth opening.

capnspacehook commented 2 years ago

Could 'path_regexp' be treated like 'path' matches, where if multiple regexes were specified each regex would be compiled and used separately? When the 'path_regxp' s would be matched each regex would be matched against the input in a loop, how 'path' is implemented. That would negate problems with combining multiple regexes I'd think.

If you think that'd be feasible I'd be willing to draft a PR for it.

mholt commented 2 years ago

I don't know how that would work when you have multiple capture groups, it's like, you wouldn't know which regex matched, so you wouldn't know which placeholders to use to access the values you wanted.

francislavoie commented 2 years ago

And like I said earlier, the JSON config doesn't allow for multiple patterns. Changing that would be a breaking change and I don't think we want to do that right now.

I think using CEL expressions for this is the best option for sure. It's much easier to write now with all the recent changes we've made. It sidesteps all the complexity of what you're suggesting, and it keeps the path_regexp matcher simple. And that has a lot of value.