Closed capnspacehook closed 2 years ago
You're right that in the Caddyfile, only the last path_regexp
matcher will apply. I agree we should make that an explicit error to be clear that it's bad config.
To use multiple regexps, you can use the expression
matcher, which has path_regexp
as a function. But I'm not sure I understand what exactly you're trying to match.
@dockerAPI expression `method('GET') && path_regexp('^/v\d\.\d+/*') && path_regexp('^/asdf/*')`
Keep in mind you could also just use a regular path
matcher for a prefix match, like path /asdf/*
which won't conflict with the other path_regexp
since it's a matcher of a different type. But it will be an AND
between each in a named matcher. If you need an OR
then using expression
is easier.
Also in v2.6, you'll be able to omit the expression
token since we implemented a shortcut:
@dockerAPI `method('GET') && path_regexp('^/v\d\.\d+/*') && path_regexp('^/asdf/*')`
You're right that in the Caddyfile, only the last path_regexp matcher will apply. I agree we should make that an explicit error to be clear that it's bad config.
Good to hear!
To use multiple regexps, you can use the expression matcher, which has path_regexp as a function. But I'm not sure I understand what exactly you're trying to match.
Thanks for the help! Note for future readers backslashes have to be escaped to work correctly when using path_regexp
in an expression
, ie
@dockerAPI expression `method('GET') && path_regexp('^/v\\d\\.\\d+/*')`
will work.
Sorry, I tried to make my example as simple as possible to reproduce which made it not make much sense as an actual config. I'm trying to match a few Docker API endpoints, namely:
/v1.41/events*
/v1.41/containers/json*
/v1.41/containers/<some_hash>/json
Where the /v1.41
prefix can change.Is using a long expression
the only way to match on multiple path_regexp
s at once?
Is using a long expression the only way to match on multiple path_regexps at once?
Yes, which is a limitation of the JSON config's design. Run caddy adapt --pretty
on your config, and you'll see that path_regexp
is an object key in a matcher set, which means there can only matcher of each type, and unfortunately the value of the matcher is an object and not an array so there can only be one value. Other matchers like path
take an array of paths so it can OR all of the input paths.
The reason multiple regexes aren't supported is because you can write multiple regexes as one regex. Matchers in a set are ANDed together, which logic can be expressed in regex anyway. For example in your case:
@dockerAPI path_regexp /v\d+.\d+/(events.*|containers/json.*|containers/.*/json)
i.e. you don't need multiple regexes.
@mholt I understand that, I was just looking to use multiple path_regexp
s to make the Caddyfile more readable/maintainable. I only want to allow 3 API endpoints now, but in the future if I want to allow 10, or 20 and block the rest... the regex could get very ugly.
No worries if using one long regex is the only way to accomplish this, I understand that only allowing one path regex is a sane rule. I guess I'm trying to use Caddy like a request filterer rather than as a webserver/reverse proxy, which AFAIK isn't a generally supported use-case. If it is though, as I mentioned before it would be nice to be able to specify multiple path regexes in a named matcher to help maintainability and readability of complex URL filtering logic.
Not sure if it's possible to concatenate multiple regex internally?
@kadefor rather not go down that rabbithole 😅 there's things like flags that must go in the front, etc. I don't want to write a regexp parser for regexp 🤣
@francislavoie Haha, no need to write a regex parser, just like this:
path_regexp aaa
path_regexp bbb
path_regexp ccc
concat to:
path_regexp (abc)|(bbb)|(ccc)
The performance may not be very good, but it is an idea after all, an immature idea. 😅
However, the "expression" is very powerful! 😄
That wouldn't be possible for a regexp like (?i)^(/api.*)
because the case insensitive flag must be at the start of the regexp, it can't appear in the middle. And if you're using un-named capture groups, appending them together will change the output's number (I think). This is just a whole can of worms that I don't thing is worth opening.
Could 'path_regexp' be treated like 'path' matches, where if multiple regexes were specified each regex would be compiled and used separately? When the 'path_regxp' s would be matched each regex would be matched against the input in a loop, how 'path' is implemented. That would negate problems with combining multiple regexes I'd think.
If you think that'd be feasible I'd be willing to draft a PR for it.
I don't know how that would work when you have multiple capture groups, it's like, you wouldn't know which regex matched, so you wouldn't know which placeholders to use to access the values you wanted.
And like I said earlier, the JSON config doesn't allow for multiple patterns. Changing that would be a breaking change and I don't think we want to do that right now.
I think using CEL expressions for this is the best option for sure. It's much easier to write now with all the recent changes we've made. It sidesteps all the complexity of what you're suggesting, and it keeps the path_regexp matcher simple. And that has a lot of value.
Caddy version: v2.6.0-beta.3 GOOS: linux GOARCH: amd64
I found some interesting behavior where specifying multiple
path_regexp
matchers in a named matcher will not error when the config file is parsed. Instead, the lastpath_regexp
matcher will be used and the previous matchers will silently be discarded.Example Caddyfile:
In this example, requests that begin with
/v1.41/
do not match@dockerAPI
, but requests that begin with/asdf/
do. I think an error should be returned when parsing the config file.As a side note, I discovered this while trying to use Caddy to limit access to a Docker API socket, I only want certain API endpoints to be allowed and passed through by Caddy. It's difficult to do such a thing without the ability to use multiple
path_regexp
matchers, is there a workaround for this or should I be using a 3rd party module/a separate tool entirely?