caddyserver / caddy

Fast and extensible multi-platform HTTP/1-2-3 web server with automatic HTTPS
https://caddyserver.com
Apache License 2.0
58.42k stars 4.04k forks source link

Feature Request: CEL method for `env` placeholder #6584

Closed polarathene closed 1 month ago

polarathene commented 1 month ago

Short version: The ability to use placeholders with env to lookup different ENV based on placeholder value at the time, or a CEL function for matchers to accomplish the same.


From what I understand (at least with Caddyfile support), an explicit ENV var can be used and is parsed with the Caddyfile, or env placeholder can be used, similar to file that does a lookup at runtime via method instead.

Snippets can be used with args to compose the value for env or file, but the inputs must be known at the time, you could not use vars or path placeholders for example.

My use case was for matching against content sourced from an ENV, but with:

This could be achieved either via a CEL equivalent method where the value can be provided as a string using CEL like the above path example, or if the existing placeholder support could handle {env.{vars.example}}.

I haven't tried with map directive yet, but expect that would be a similar issue. Presumably the only option with Caddy for this right now is snippets and creating an import with args for each permutation upfront. Alternatively writing a custom caddy plugin or a separate service to perform the check (to allow or deny a route).

The intent was to leverage caddy's existing features in a way that a docker image could be deployed and users only needed to manage some permissions via ENV (_such as for path_regexp matchers to refer to_).

francislavoie commented 1 month ago

The vars matcher is able to read placeholders. For example:

:80 {
    @var vars {env.USER} francis
    respond @var "hey francis"
    respond "nope"
}

So I think really the request is moreso "provide the vars matcher in CEL".

I couldn't remember why we didn't implement it, so I went looking. For ref https://github.com/caddyserver/caddy/pull/4715#issue-1207649128 and https://github.com/caddyserver/caddy/pull/4715#issuecomment-1163585563.

TL;DR the reason is that since we do a fancy regexp replacement for placeholders in CEL already to turn them into a magic function call, having a matcher that takes a placeholder as input is hard from a parsing standpoint.

Will need to do some thinking to look again whether we can add vars to CEL, might need a bit of reworking of the regexp replacement for placeholders to allow an escape \{ at the start for use as a regular string in matcher inputs.

francislavoie commented 1 month ago

I probably have an approach that could work now, see #6594, we'll be able to escape placeholders, so vars will be able to take placeholder strings as input as long as they're escaped. This'll be weird to document though cause it's so quirky.

francislavoie commented 1 month ago

Alright vars is implemented in https://github.com/caddyserver/caddy/pull/6594 so that should do what you need. You can do like:

:80 {
    vars example USER
    @var `vars({"{env." + {vars.example} + "}": "francis"})`
    respond @var "hey francis"
    respond "nope"
}

So similarly to the previous example, this pulls the USER env doing string concat to assemble a placeholder. Kinda weird syntax but it works.

I gotta say tho ⚠ you gotta be extra careful if you use a pattern like this, make sure to not allow any user-controlled input in your vars cause it could cause bad things to happen (e.g. invoking {file.*} which could read system files that shouldn't be read).

polarathene commented 1 month ago

I built your branch and tried to leverage it but was rather unsuccessful :(

# Not supported?:
# GET `/caddy` => `{env.USER_CADDY}` => `francis` (via ENV `USER_CADDY=francis`)
@var `vars({"{env.USER_" + {path.0}.upperAscii() + "}": "francis"})`
# Unclear how to escape, or if this is even supported?:
@var expression `{env.\{vars.example}} == 'francis'`

# Works with existing Caddy:
@var `path("/" + {env.USER})`
# How to escape outside of vars matcher?
@var `path("/" + {env.{vars.example}})`
# Not successful: `curl http://localhost/francis`
@var `path("/" + "{env." + {vars.example} + "}")`

When I was experimenting on this functionality, I found some Caddy CEL matcher methods like method() and path() that would take an explicit string or sequence, but I could not provide a delimited list string nor use CEL to split into an array. I worked around it with plain CEL logic:

# `method({env.METHODS})` workaround:
# This was one way to have restrictions configured via ENV `METHODS=POST,GET`
@is-method-allowed expression `{method} in {env.METHODS}.split(',')`

# `path({env.ALLOW_FRUIT)` workaround:
# `ALLOW_FRUIT='apple,banana,cherry' 
@ allow-base-path expression `{env.ALLOW_FRUIT}.split(',').exists(base, base == {path.0})`

# Likewise matching you cannot match path slices: `*/{path[1:]}`

I gotta say tho ⚠ you gotta be extra careful if you use a pattern like this, make sure to not allow any user-controlled input in your vars cause it could cause bad things to happen (e.g. invoking {file.*} which could read system files that shouldn't be read).

Fair, my interest was using ENV for users to configure access control to some API endpoints. I wanted to compose the ENV keys as there was many permutations and I also wanted to use {path.0}.upperAscii() to get a portion of that.

For my use-case I think that would be safe as it's configured by the sysadmin deploying the service, with the only third-party input from the incoming URL, but perhaps that could be used as an exploit πŸ€” (_I only need a-z, which I can retrieve via path_regexp if needed_)


More realistic example

Here is the /secrets endpoints from the Docker Engine OpenAPI config:

GROUP ENDPOINT METHOD ID
Secret /secrets GET SecretList
Secret /secrets/* DELETE SecretDelete
Secret /secrets/* GET SecretInspect
Secret /secrets/*/update POST SecretUpdate
Secret /secrets/create POST SecretCreate

The approach with ENV would have looked something like this:

I had considered using path_regexp with the ENV being an a regex value:

ALLOW='containers(/[^/]+)?(/(start|stop|restart|kill|json))?'
@allow `path_regexp('^/(' + {env.ALLOW} + ')$')`

But that did not seem very user-friendly to manage πŸ€”


Caddy use-case

I wanted to see how much better Caddy could replace docker-socket-proxy (_HAProxy based, there's also an nginx variant_).

I think the logic is simple enough once it can composite the ALLOW_ / DENY_ prefix + _<Method> suffix with dynamic ENV key from the URI ({path.0}). Importing a snippet could be used to support additional sockets with import args for the socket name and related ENV prefix.

The multiple sockets feature is the only part that seems like it'd require users to add Caddyfile config, but that could be fairly minimal file that import brings in. This is only because while I know how to match by unix socket the request came from, I could not bind to more than one socket dynamically (same issue with CEL functions that take multi-values shown above with method() / path()). As I intend for the Docker image to exclude a shell, that would not be able to generate such config via ENTRYPOINT, I could publish a variant image that does πŸ€·β€β™‚

Otherwise something like this would be neat if it'd work:

# NOTE: Traffic to the site-block must be HTTP, port is not relevant when binding only unix sockets
# ENV `SOCKETS=hello,world` => `['unix//tmp/hello.sock', 'unix//tmp/world.sock']`
http:// {
  # `env` placeholder does not support fallback syntax, detect undefined/empty manually:
  @not-empty `{env.SOCKETS} != ""`
  # CEL expressions outside of matchers is not supported:
  var sockets `{env.SOCKETS}.split(',').map(name, 'unix//tmp/' + name + '.sock', name.size() > 0)`
  # Default fallback (if no valid values from SOCKETS):
  var sockets unix//tmp/example.sock

  # Caddy presently doesn't support multiple values from a placeholder, even if that were a plain string:
  bind {var.sockets}
}
# Example of querying the caddy binded socket:
curl --unix-socket /tmp/example.sock http://localhost/secrets

I'll provide a follow-up tomorrow with what I had and how the feature request could improve it, although I think I may have run into a bug with my current approach.

francislavoie commented 1 month ago

Remember that normally in CEL expressions, placeholders get transformed info a function call, using a regexp. So {path} becomes caddyPlaceholder(request, "http.request.uri.path"). That means you have to use it like it is itself an expression which returns a string. You can concatenate it with something else etc.

@var `vars({"{env.USER_" + {path.0}.upperAscii() + "}": "francis"})`

This doesn't work because .upperAscii() is a dynamic function which doesn't return a constant. This does work though, as long as you don't need to do a transformation:

@var `vars({"{env." + {path.0} + "}": "francis"})`
@var expression `{env.\{vars.example}} == 'francis'`

This doesn't make sense because this would effectively do caddyPlaceholder(request, "env.{vars.example}") which doesn't do anything useful.

@var `path("/" + {env.{vars.example}})`

Same reason

@var `path("/" + "{env." + {vars.example} + "}")`

This can't work because placeholder evaluation doesn't happen if you concatenate it. That's why the vars matcher is needed, because it can do placeholder lookups.

When I was experimenting on this functionality, I found some Caddy CEL matcher methods like method() and path() that would take an explicit string or sequence, but I could not provide a delimited list string nor use CEL to split into an array.

That's because they're transformed into matcher modules at CEL compile time, so you can't have any dynamic arguments to the matchers, they must be constants that are evaluated when the CEL is compiled.

More realistic example; Here is the /secrets endpoints from the Docker Engine OpenAPI config

πŸ˜‚ okay at that point you should just write a Caddy plugin! That's honestly way too much logic to try and stuff into Caddy config. Square shape, round hole.

bind {var.sockets}

That would never work because bind is not an HTTP handler directive, you can't use anything like matchers or HTTP handlers (including vars which is an HTTP handler). It's a config directive which modifies the apps.http.servers.N.listen config property (but with site block abstraction magic in the Caddyfile layer). You can forget about that idea altogether.

polarathene commented 1 month ago

Feel free to skip this in favor of the follow-up comment that better illustrates what I was trying to accomplish and the use-case for env() as a function in CEL.


This doesn't work because .upperAscii() is a dynamic function which doesn't return a constant.

I only know that it works in CEL elsewhere. But Caddy support with CEL is only for matchers with boolean return values. If I could have set it via vars directive then it would appear as a constant input at runtime via the vars placeholder right?

At least from what I've understood, Caddy offers it's own CEL functions, and these are taking in a static/constant input, but the input itself can be dynamic at runtime. There is just some limitation that prevents computing the input with CEL as input (in JS and Rust, this would be a closure as the function arg).

UPDATE: Having implemented the CEL functionality via the Rust ecosystem and replacing the placeholders before compiling the CEL program I have a better understanding of this.

Technically it should be possible if parsing the placeholder similar to how many languages handle field vs key access in objects obj.field == obj["field"], thus {env[ some-cel-here ]} == caddyPlaceholder(env, some-cel-here) (I can't quite recall the exact transformation, but I recall caddyPlaceholder() πŸ˜… )

In my case though, just a CEL function like env() that takes in a string arg would work.


This does work though, as long as you don't need to do a transformation

URIs are conventionally lowercase, while environment variables are conventionally uppercase. I need that transformation.

CEL seems like it could be used with regular vars directive instead of only boolean based matching logic, but AFAIK that's not supported with Caddy (not sure if intentional or just no valid motive to justify support).


This can't work because placeholder evaluation doesn't happen if you concatenate it. That's why the vars matcher is needed, because it can do placeholder lookups.

From the example you provided, I am not sure if I understand how to use it properly.

It looked like you gave vars() function an object/map with the env + vars placeholders used to derive the value of a key, that then had francis assigned as the value? Is this then internally doing a key == value comparison?

I don't see how the vars matcher function in CEL can help with the use-case of this issue πŸ€” (at least regarding deriving an ENV key dynamically, as per the title)


That's because they're transformed into matcher modules at CEL compile time, so you can't have any dynamic arguments to the matchers, they must be constants that are evaluated when the CEL is compiled.

I guess I am probably expecting to much from CEL here (and the Caddyfile DSL), which became a bit more evident once fleshing out the idea further :(

UPDATE: I did manage to make it work (minus some support in Caddy), but it's definitely awkward and verbose to express in CEL.

πŸ˜‚ okay at that point you should just write a Caddy plugin! That's honestly way too much logic to try and stuff into Caddy config. Square shape, round hole.

In my head it was meant to be a lot more simpler/elegant than the linked haproxy / nginx configs, but once I started trying to support the extra flexibility that users were interested in (which neither of those offered), that became more difficult to express πŸ˜“

I am not a Go developer, and the brief examples of Caddy plugins I saw either had a lot of boilerplate noise or felt like I would need to learn a fair bit vs write the equivalent in a language I know well. I might consider tackling a plugin again.

UPDATE: Technically I could use my CEL solution as-is if a plugin can also implement CEL functions? Is that more complicated? It was very easy for me with Rust to add new CEL functions.

That would never work because bind is not an HTTP handler directive, you can't use anything like matchers or HTTP handlers (including vars which is an HTTP handler).

Oh ok, probably something that would be better suited to JSON based config?

At this point it seems like Caddyfile + CEL has been more of an XY problem where I was probably better off generating the JSON config since the user was only intended to configure via ENV. I just wanted to avoid requiring a separate process, but that's likely only going to be viable if I write a Caddy plugin to handle this πŸ€·β€β™‚

UPDATE: While I can't derive multiple binds from an env placeholder, it does seem to work with the non-placeholder ENV support during config adaption, so long as the binds are space delimited πŸŽ‰ (I had assumed this would have treated the ENV as a single input, like the file placeholder did)


Since this comment is a bit outdated and long, I'll post a follow-up with my finalized Caddyfile attempts.

polarathene commented 1 month ago

EDIT: Oh wow, Github recognizes Caddyfile syntax now, that's great! 😁

TL;DR:


Perhaps this would be a good candidate for the Caddy community forum wiki on leveraging CEL 😝

# Example instance:
$ docker run --rm -itd --name caddy \
  -v ./Caddyfile:/etc/caddy/Caddyfile \
  -v /tmp/sockets:/tmp/sockets \
  -v /var/run/docker.sock:/var/run/docker.sock \
  --env ALLOW='info, version, _ping' \
  --env ALLOW_CONTAINERS='json' \
  --env ALLOW_IMAGES_GET='json' \
  --env DENY='containers' \
  --env CUSTOM_DENY='version' \
  --env CUSTOM_ALLOW_GET='networks' \
  --env CADDY_SOCKET_BINDS='unix//tmp/sockets/docker.sock unix//tmp/sockets/custom.sock' \
  caddy

# From Docker host or any container that has access to the socket files Caddy created via bind:
# Permitted:
$ curl -sSf --unix-socket /tmp/sockets/docker.sock http://localhost/version | jq -r .Version
27.2.1

# Denied:
$ curl -sSf --unix-socket /tmp/sockets/custom.sock http://localhost/version
curl: (22) The requested URL returned error: 403

# Or via the `docker` CLI by setting the socket to the `DOCKER_HOST` env:
$ DOCKER_HOST=unix:///tmp/sockets/docker.sock docker container ls
CONTAINER ID   IMAGE     COMMAND                  CREATED         STATUS         PORTS                                NAMES
20b378e35826   caddy     "caddy run --config …"   2 seconds ago   Up 2 seconds   80/tcp, 443/tcp, 2019/tcp, 443/udp   caddy

$ DOCKER_HOST=unix:///tmp/sockets/docker.sock docker network ls
Error response from daemon: Forbidden

I'm not sure how this compares to writing/maintaining a Caddy plugin or HAProxy/Nginx equivalent in Lua.

Presumably Lua can express the same logic as easily (it at least appears to be capable of the dynamic ENV queries), so it'd be neat if Caddy were capable of such without requiring users to get a custom Caddy build :)

Working (without multi-socket support)

Original attempt, this is rather verbose and hard-codes each base path and method variants. Not as easy to grok and requires various workarounds adding complexity.

At the time I did not see an easy way to also add support for overrides by socket used. Perhaps I've been looking at it too long, but it'd likely make it more complicated to tack on.

{
  default_bind unix//tmp/sockets/docker.sock
  auto_https off
  admin off
}

(matcher) {
  @{args[0]} expression <<CEL
    {env.{args[1]}}.{args[2]}
    || ( {method} == 'GET'    && {env.{args[1]}_GET}.{args[2]}    )
    || ( {method} == 'HEAD'   && {env.{args[1]}_HEAD}.{args[2]}   )
    || ( {method} == 'DELETE' && {env.{args[1]}_DELETE}.{args[2]} )
    || ( {method} == 'POST'   && {env.{args[1]}_POST}.{args[2]}   )
    || ( {method} == 'PUT'    && {env.{args[1]}_PUT}.{args[2]}    )
  CEL
}

# These snippets are to DRY the CEL logic to minimize noise above:
# {path.0}.replace('_', '')
(matcher-base) {
  import matcher {args[0]}-base {args[1]} `split(',').filter(v, size(v.trim()) > 0).exists(base, base == {path.0})`
}

# Existence of the suffix ENV implies it's associated base path was used.
# NOTE: The final `{path.0}` condition provides a workaround akin to ensuring
# `/secrets` => `ALLOW_SECRETS` (_ENV name suffix matches associated base path_)
# Required workaround as suffix ENV is not dependent upon base ENV permission (eg: `ALLOW=secrets`)
# and the `env` placeholder does not support input via CEL: `{path.0}.upperAscii()`.
(matcher-suffix) {
  import matcher {args[0]}-suffix {args[1]} <<CEL
    split(',')
    .filter(v, size(v.trim()) > 0)
    .exists(suffix, {path}.endsWith('/' + suffix))
    && {path.0} == '{args[2]}'
  CEL
}

# Matcher logic for permitted base path:
(matched-base) {
  import matcher-base allow ALLOW
  import matcher-base deny DENY

  # Use a `route` to ensure predictable logic order (avoids bugs from dependent matchers/vars):
  route {
    # Initialize default state:
    vars base-allowed false
    vars base-denied false
    vars permit-base false

    vars @allow-base base-allowed true
    vars @deny-base base-denied true

    # NOTE: Without the explicit boolean default set above for the `base-allowed` var,
    # this expression will result in `curl: (52) Empty reply from server`
    # But on Caddy v2.8.4 (current stable release) it instead returns a 500 error status:
    # `curl: (22) The requested URL returned error: 500`
    @matched-base `{vars.base-allowed} && !{vars.base-denied}`
    vars @matched-base permit-base true
  }
}

# Matcher logic granular subpath permissions:
# `import` produces a CEL expression matcher (including the HTTP method ENV variants) like this:
#@allow-secrets-suffix `{env.ALLOW_SECRETS}.split(',').exists(suffix, {path}.endsWith('/' + suffix))  && {path.0} == 'secrets'`
(matched-suffix) {
  import matcher-suffix allow-{args[0]} ALLOW_{args[1]} {args[0]}
  import matcher-suffix deny-{args[0]} DENY_{args[1]} {args[0]}

  # Update common suffix state vars:
  # Not using a route block here would match `{path.0}` before `uri strip_prefix`, failing to match versioned requests:
  route {
    vars @allow-{args[0]}-suffix suffix-allowed true
    vars @deny-{args[0]}-suffix suffix-denied true
  }
}

(match-logic) {
  import matched-base

  vars suffix-allowed false
  vars suffix-denied false

  # A list I'd prefer to avoid maintaining:
  import matched-suffix auth AUTH
  import matched-suffix build BUILD
  import matched-suffix commit COMMIT
  import matched-suffix configs CONFIGS
  import matched-suffix containers CONTAINERS
  import matched-suffix distribution DISTRIBUTION
  import matched-suffix events EVENTS
  import matched-suffix exec EXEC
  import matched-suffix images IMAGES
  import matched-suffix info INFO
  import matched-suffix networks NETWORKS
  import matched-suffix nodes NODES
  import matched-suffix _ping PING
  import matched-suffix plugins PLUGINS
  import matched-suffix secrets SECRETS
  import matched-suffix services SERVICES
  import matched-suffix session SESSION
  import matched-suffix swarm SWARM
  import matched-suffix system SYSTEM
  import matched-suffix tasks TASKS
  import matched-suffix version VERSION
  import matched-suffix volumes VOLUMES

  # Base or Suffix can be allowed, but suffix has higher precedence over base:
  @permit-endpoint <<CEL
    ({vars.permit-base} || {vars.suffix-allowed})
    && !{vars.suffix-denied}
  CEL
}

http:// {
  # Helpful for troubleshooting failing requests to identify which endpoints are required:
  log

  # This must modify the path before the `match-logic` imports, thus cannot be in the final `route` block.
  @version path_regexp ^/(v[\d\.]+)/
  uri @version strip_prefix {re.version.1}

  # Cannot use this in the `route` scope as it will cause a config error that prevents Caddy starting:
  # (Something about the named matchers not being in scope for `vars` directives with the chained `imports`?)
  import match-logic

  # Use a `route` to support the fallback `respond`. Due to earlier `route` usage 
  # `handle` was not used as it is ordered before `route` (_needed for workaround, or adjusting global directive order_)
  route {
    reverse_proxy @permit-endpoint unix//var/run/docker.sock {
      rewrite {http.request.orig_uri}
    }
    respond "Forbidden" 403
  }
}

NOTE: The comments about curl: (52) Empty reply from server vs 500 status returned are related to matcher error encountered with CEL at runtime. The log lines emitted are reduced with the empty server response but cover the same information so I assume the empty reply instead of 500 status was intentional.


Almost working (with multi-socket support)

This leverages a proposed env() CEL function to get that functionality I was interested in, allowing to express all the rules in a single match CEL expression (which is still not the most pleasant to grok, so I've added commentary):

# Missing support for:
# - The list `.slice()` CEL extension.
# - `env()` CEL function not implemented in Caddy.

# Breakdown of each CEL section:
# Initializes with two inputs variables from Caddy:
# - Just the socket filename without extension for ENV prefix lookups
# - Normalizing the base path (required for `_ping` => `ping`) for ENV suffix lookups
#
# A 2D array is generated to compose ENV prefix + suffix with the allow/deny rules.
# Order is important - Precedence is: `socket-suffix > suffix > socket-base > base`
# Next the `env_parts` are converted to strings for ENV lookup to resolve as bools:
# 1. A variant with the current request method is added.
# 2. The original `env_name` and variant perform an ENV lookup each and preprocess any returned value into a list.
# 3. The `env_name` is used to apply either the suffix or base check on each list item.
# 4. This results in a final 2D array of type: `[[bool;4],[bool;4]]`.
#
# The final step (reduce `[allowd[], denied[]]` to a single bool result):
# 1. Maps the two array items into an object for better readability.
# 2. An array of indices is iterated over due to CEL lacking a better approach,
#    this will short-circuit upon the first `true` returned.
# 3. Starting at the highest precedence (sockets-suffix), find an allowed entry
#    that does not have a denied entry of the same precedence or higher.
(docker-api-proxy-matcher) {
  @permit-endpoint <<CEL
    [{
      "socket": [{http.request.local}].map(s,
         s.substring(s.lastIndexOf('/') + 1, s.lastIndexOf('.sock'))
      )[0],
      "suffix": {path.0}.replace('_', '')
    }]

    .exists(inputs,
      [['ALLOW', 'DENY'].map(rule,
        [
          [inputs.socket, rule, inputs.suffix],
          [rule, inputs.suffix],
          [inputs.socket, rule],
          [rule],
        ]
        .map(env_parts, env_parts.join('_'))
        .map(env_name, [env_name, env_name + '_' + {method}].exists(key,
          env(key.upperAscii())
            .split(',')
            .filter(v, size(v.trim()) > 0)
            .exists(value,
              env_name.endsWith(inputs.suffix)
                ? {path}.endsWith('/' + value)
                : value == {path.0}
            )
        ))
      )]

      .map(arr, { "allowed": arr[0], "denied": arr[1] }).exists(results,
        [0, 1, 2, 3].exists(i,
          results.allowed[i] && !(true in results.denied.slice(0, i + 1))
        )
      )
    )
  CEL
}

# As a snippet this allows providing multiple sockets to listen on via import args.
# - The first site-block to bind a socket will be prioritized without site-address specificity.
# - Ports are not relevant in a site-address when traffic arrives from socket.
http:// {
  # Space delimited args in this ENV will bind multiple sockets:
  bind {$CADDY_SOCKET_BINDS:unix//tmp/sockets/docker.sock}

  # If the version prefix exists, strip it before matching `@permit-endpoint`:
  @version path_regexp ^/(v[\d\.]+)/
  uri @version strip_prefix {re.version.1}

  import docker-api-proxy-matcher

  handle @permit-endpoint {
    # Due to `uri strip_prefix`, restore the original request URI when forwarding:
    reverse_proxy unix//var/run/docker.sock {
      rewrite {http.request.orig_uri}
    }
  }

  # Permission was denied:
  handle {
    respond "Forbidden" 403
  }
}

As you can see I've made heavy use of the extended CEL functions, which Caddy seems to have support for most, but not slice() for some reason.

That said, at least with Rust there's useful collection and iteration methods that would simplify expressing that final step better, but beyond the env() function request it's pure standard CEL (with official extensions).

francislavoie commented 1 month ago

Oh actually, this works with no changes:

:8881 {
    vars example USER
    @var `caddyPlaceholder(request, "env." + {vars.example}) == "francis"`
    respond @var "hey francis"
    respond "nope"
}

Basically this directly uses the function that we have defined for placeholder shortcuts. Not sure why I didn't think of that earlier. We should probably document that. It gets compiled to:

caddyPlaceholder(request, "env." + caddyPlaceholder(request, "vars.example")) == "francis"

I do want to change it to be more ergonomic though. I don't like how long it is to write out caddyPlaceholder(request. @mholt how do you feel if we change it to ph(req instead? This would make it more ergonomic to use. Is there a better function name for caddyPlaceholder that's both short and self explanatory?

francislavoie commented 1 month ago

which Caddy seems to have support for most, but not slice() for some reason.

Ah, I see it. We need to enable explicit support for these extensions. We currently only have the strings extension added:

https://github.com/caddyserver/caddy/blob/4b1a9b6cc1aa521e21289afa276d29952a97d8f3/modules/caddyhttp/celmatcher.go#L169

I can just add ext.Lists() here for slice() to work (but weirdly, .sort() doesn't work with that extension πŸ€” edit: oh .sort() was literally just added last week and isn't released yet πŸ˜‚ okay nvm)

@TristonianJones how much of a performance hit would it be to just enable all the extensions? Is it pretty cheap or does it have more cost?

polarathene commented 1 month ago

Basically this directly uses the function that we have defined for placeholder shortcuts. Not sure why I didn't think of that earlier. We should probably document that.

Yeah that'd be awesome thanks. It didn't occur to me until after I wrapped up the 2nd Caddyfile example πŸ˜…

I'd still need the extension function .slice() for lists to be implemented, then it'd work with the bulky single matcher.

We currently only have the strings extension added

Oh that makes sense :)

I initially ran into cel.bind() not working, hence my workarounds for handling the socket input field within an array.


I do want to change it to be more ergonomic though.

I prefer the longer caddyPlaceholder name, ph is a bit too vague and probably not fun to lookup as a search term? I agree it does look awkward to nest/chain though.

How viable is the parser suggestion for {env[{ vars.example} ]} / {env[ ('allow' + '_' + {method}).upperAscii() ]}?


Is there a better function name for caddyPlaceholder that's both short and self explanatory?

In Rust fn is used for function declarations, it's a nice shorthand, while it doesn't imply placeholder functionality specifically, caddyFn would seem ok for a short name that calls into Caddy functionality.

You could also technically replace caddyFn( with caddyFn(request, as a postprocess step before compilation, if there's no reason a user would benefit from explicitly providing the initial arg?

francislavoie commented 1 month ago

How viable is the parser suggestion for {env[{ vars.example} ]} / {env[ ('allow' + '_' + {method}).upperAscii() ]}?

I'm not interested in entertaining that tbh. CEL parser extensions are a pain in the ass to get right. Our existing code for CEL stuff is already super arcane. I wish it was easier to work with, but I understand why it was designed this way. It's deep in the weeds of language compilation.

In Rust fn is used for function declarations, it's a nice shorthand, while it doesn't imply placeholder functionality specifically, caddyFn would seem ok for a short name that calls into Caddy functionality.

It's not a function at all so I don't agree with that.

What I want to get rid of the most is the caddy prefix for it, moreso than shortening placeholder. It feels poorly integrated if we have caddy written throughout the CEL expression, it doesn't feel like it lives in Caddy if we have to tell CEL "hey we're Caddy btw", it's awkward. I might just go with placeholder for now in my draft PR.

TristonianJones commented 1 month ago

@TristonianJones how much of a performance hit would it be to just enable all the extensions? Is it pretty cheap or does it have more cost?

@francislavoie It's pretty cheap to enable all of the extensions, at least at compile time. The main concern is your overall execution cost; however, you're probably fine if you can tolerate macros.

@polarathene very impressive set of CEL expressions to try and address this use case. Composition is a big challenge for large parts of the CEL community which is why we've started to introduce a canonical policy format: https://github.com/google/cel-go/pull/1025. Honestly, you just need to use the common internal representation and you can provide a set of expressions which CEL policy compiler will compose into a single CEL expression. No change to the runtime setup needed!

I feel as though I may have missed some context here, but if you'd ever like to provide feedback about CEL and want to schedule a meeting, I'd be happy to talk with you all.

TristonianJones commented 1 month ago

@francislavoie I'll cut a new release once two-variable comprehensions are finished. Likely within a week or two.

polarathene commented 1 month ago

I'm not interested in entertaining that tbh. CEL parser extensions are a pain in the ass to get right. Our existing code for CEL stuff is already super arcane.

Oh, fair enough. I thought you were using a regex as a preprocess step to replace the caddy placeholder syntax before compiling the CEL input.


It's not a function at all so I don't agree with that.

Oh ok, my bad. Looked like a CEL function to me from how you were demonstrating it πŸ€”

In Rust I was able to easily create an env() function to lookup ENV by just doing this:

context.add_function("env", |This(value): This<Arc<String>>| {
  std::env::var(value.as_str()).unwrap_or_default()
});

I'm not sure what the Go (or caddy integration) equivalent to implement such is like, so if that's not as simple, no worries πŸ‘


It feels poorly integrated if we have caddy written throughout the CEL expression, it doesn't feel like it lives in Caddy if we have to tell CEL "hey we're Caddy btw", it's awkward. I might just go with placeholder for now in my draft PR.

I assume most of the time this would be not visible to users (at least currently given it's not documented?). I don't see much issue with the prefix as a namespace, but don't mind either way.

I already had to learn that there are two documents for referencing CEL syntax and extensions, so once you add your own additions those need to be easy to discover or distinguish πŸ˜…

In the past I've had a similar user experience but a bit more frustrating when diving into using HCL for configs, where projects using HCL didn't provide much helpful reference material on what you could actually do πŸ˜•


I suppose a short generic method name would be get() => get('env.' + get('vars.example'))?

francislavoie commented 1 month ago

I thought you were using a regex as a preprocess step to replace the caddy placeholder syntax before compiling the CEL input.

Yes we are, but what you're suggesting sounds a lot more complicated than that.

Oh ok, my bad. Looked like a CEL function to me from how you were demonstrating it πŸ€”

What I'm saying is placeholders aren't functions, they're value replacement. So calling the function that fetches the value of a placeholder "fn" doesn't make sense.

In Rust I was able to easily create an env() function to lookup ENV by just doing this:

I get that, but there's no need for that when placeholders already support env. It's redundant when placeholders can be used to do the same thing already.

I assume most of the time this would be not visible to users (at least currently given it's not documented?).

I'm talking about making it visible, documented. Which is why I want a "presentable" name for it.

I suppose a short generic method name would be get() => get('env.' + get('vars.example'))?

Maybe. But it would have to be get(req, "http.request.method") (the req is required to get access to the request context to grab the replacer with the ability to grab parts of the request). I worry that's too generic a name. Maybe it works though. Will think on it.

polarathene commented 1 month ago

Yes we are, but what you're suggesting sounds a lot more complicated than that.

A little bit, mostly just {placeholder.optional} vs {placeholder["optional"]}.

Either the placeholder has . or [ delimiter for the right-hand value. If it's [ then anything between that and ] is CEL that could be input as an arg to a function, whereas with . the RHS is a plain string value.

With a direct function call to lookup the placeholder value, that suggestion loses relevance though, so nevermind it πŸ‘


I worry that's too generic a name.

Kinda hard to avoid if you want a short name 😝

A synonym that is less likely to clash or be misinterpreted may be obtain? Or take / pick (both more common in programming, closer to generic), or grab (bit informal).

You could also suffix it like get_ph(), get_token(), etc.

ph has numerous meanings, none that would make sense in the context of CEL usage but still feels weird. placeholder is fine

francislavoie commented 1 month ago

I'll close this now, https://github.com/caddyserver/caddy/pull/6594 is merged which I think addresses everything required here.

I decided to go with ph(req, '<placeholder>') as the final syntax for fetching a placeholder (including env) in CEL.