akyriako / cert-manager-webhook-opentelekomcloud

ACME DNS01 solver webhook for Open Telekom Cloud DNS
Apache License 2.0
2 stars 1 forks source link

Fuzzy search causes delegated DNS zones to block higher level zones #1

Closed canaykin closed 4 months ago

canaykin commented 5 months ago

Greetings!

Firstly, thank you for the great webhook for the cert manager, having wildcard capability with OTC DNS is great!

Background

To give a background on the topic, we generally deploy multiple independent staging platforms for our customers and developers. As such, we generally use a delegated DNS zone where for example zone dev.domain.com belongs to development stage of the system whereas qa.domain.com belongs to the QA stage. The exception to this pattern is the production stage, where the parent domain zone domain.com is used.

As mentioned, the wildcard certificates were the primary reason we wanted to use ACME dns01. And for testing and general use case I used letsencrypt as the issuer with DNS SANs of domain.com and *.domain.com

Problem Description

When testing the webhook, I noticed a specific scenario where the webhook can fail with the following message:

error presenting challenge: present failed: present failed: found 2 while expecting 1 for zone

Upon inspection I noticed that this happens when there is a sub DNS zone below a privileged zone and the certificate is being created for the privileged zone. With some digging, I found that the OTC API for Querying Public Zones behind the gophertelekomcloud is by design making a fuzzy search on the name parameter in the form of .*domain\.com.

As a result, when the zone for domain.com is search, the result has domain.com but also dev.domain.com and all other delegated subzones, breaking the webhook for higher privilege DNS zones.

Fix Proposal

Since the cert-manager by default, chooses the least privileged zone based on a recursive name lookup, I patched this by checking for the shortest zone name in the list of zones returned by the API in solver_utils.go:

    if len(allZones) < 1 {
        return nil, fmt.Errorf("%s failed: found %v while expecting 1 for zone %s", action, len(allZones), ch.ResolvedZone)
    }

    minLen := 256
    r := 0

    for idx, zone := range allZones {
        if len(zone.Name) < minLen {
            minLen = len(zone.Name)
            r = idx
        }
    }
    return &allZones[r], nil

The idea was that if cert-manager is already providing us the zone name with least privilege, we can use name length to find the highest privileged zone and eliminate lesser zones which wouldn't work anyway. This solution worked for our use case and we were able to get wildcard certs signed via ACME dns01 for all of the zones in the project.

Caveats

It is possible to specify more complicated DNS delegation structure using CNAME records and cnameStrategy: Follow according to cert-manager docs. I have not tested this use scenario and simply checking for highest privilege zone may not work for all use cases.

Thank you very much in advance, Can.

akyriako commented 5 months ago

Hi Can,

Thanks for taking an interest on the webhook and mostly finding out the corner-case that had to be improved. Your solution seems to be solid and I will incorporate it in mine as well.

Regarding the last part you mention:

more complicated DNS delegation structure using CNAME records

I will definitely have a look at it, I've already opened a dedicated issue that you could track https://github.com/akyriako/cert-manager-webhook-opentelekomcloud/issues/2

Cheers!