Feature req: using regex captures in target expression

candlerb commented 8 years ago

It would be useful to be able to use regex capture groups in targets. For example:

table https {
    ^(.+)\.example\.com$   $1.backend.net:443
}

That is, I would like to derive the target host from part of the SNI hostname, without listing the hosts one by one.

dlundquist commented 8 years ago

It looks like this could be done by replacing pcre_exec() with pcre2_substitute()[1]. I'm not sure how much value there is in this, though: when proxying TLS, the backend server still needs to be configured with a certificate for the frontend hostname, otherwise the client will not accept the certificate. A similar result could be achieved using the resolver search domain, except the backend would connect to \1.example.com.backend.net. Finally, I'm still concerned about the performance impact of the linear search of table backends with large tables. Hence, I've been considering moving away from regular expression matching to strict prefix wildcard matching (the same as TLS certificates). I'm not saying no to this feature; I just need to be convinced of its usefulness given the potential performance impact.

[1] http://www.pcre.org/current/doc/html/pcre2api.html#SEC28

candlerb commented 8 years ago

Yes, I understand about the certificate mismatch issue and that it would require people to accept bad certificates.

This is intended as an access frig for a lab setup. There are a bunch of containers sitting inside a single VM with a single outside public IP address. You'd put a real wildcard DNS entry, *.realdomain.com, in the public DNS, pointing at the outside public IP; then forward *.realdomain.com to the corresponding *.internal.local name, which internal DNS resolves to a private IP address. There might be multiple instances of the VM running on different IPs, so a different public DNS name would point to each instance.

Thinking about this a bit more, maybe a better solution is to get all the clients to use a PAC file to forward *.internal.local via a SOCKS gateway running on the outside IP of the VM. It's a bit more client configuration work, but it does mean the certificate names will match, and it can work for all protocols, not just TLS-based ones.

If you are planning to change the tables to do domain suffix matching, I'm fine with that. I guess you mean that for foo.bar.baz you'd first look up foo.bar.baz in the table, then *.bar.baz, then *.baz, then *. If this is an in-RAM or CDB file lookup, that would be very fast.

In fact, for my original requirement, a single match on * would be fine anyway. It's just that I wanted to split the domain name into head and tail parts, and the regex match seemed to be a convenient way to achieve that.

Happy to drop this request :-)

stampycode commented 6 years ago

@dlundquist I would really like this request to be re-opened... I am running a container cluster in a similar way to @candlerb, but the containers in my cluster will serve correct certificates. Example:

table https {
    ^([^\.]+)   $1:443
}

So

server1.somerandom.example.com => server1
server2.other.random.example.com => server2

Each container is configured with a certificate for the public-facing domain name, so there are no mismatches. Each server may also have alternative SANs in its certificate, meaning it can also respond legitimately to variations of its domain name.

TBH I'd be happy if this could be done with subdomain matching alone, without the overhead of regex. I think this could be a powerful feature for this kind of containerised setup, which is a growing design pattern.

dlundquist commented 6 years ago

I'm not opposed to this feature. I think the present use of regular expressions can be confusing to some users who assume it is just wildcard globbing. I'm also considering the complexity and backwards compatibility of configuration syntax. This could be a backwards incompatible v1.x change, but if so I would like to move the configuration parser to ragel or bison. Maybe the next step is to mock up several configuration syntax options.

Another option would be to introduce this now and declare it an experimental (i.e. syntax may change) feature, but once features have users, breaking changes always upset someone.

stampycode commented 6 years ago

Well, globbing would work fine for me, and it would still provide more flexibility than either haproxy or nginx currently offers. The problem is that existing solutions don't provide a simple dynamic destination endpoint; they all depend on hardcoding the destination host names/IPs.

Something like:

table https {
    (*).something.random.example.com        (*).internal:443
    (*.internal).random.example.com        (*):443
}

would still solve my issue perfectly without regex. Regex is probably overkill; I'd assume 99% of people who want this feature would just be using the ends-with anchor.

Alternative syntax:

table https {
    %.something.random.example.com        %.internal:443
    %.internal.random.example.com        %:443
}

So:

foo.something.random.example.com => foo.internal:443
foo.internal.random.example.com => foo:443
foo.bar.something.random.example.com => foo.bar.internal:443
foo.internal.something.random.example.com => foo.internal.internal:443

I think that if you avoid regex matching, the syntax would be less likely to need a backwards-incompatible change in the future.

dlundquist / sniproxy #208