Venafi / terraform-provider-venafi

HashiCorp Terraform provider that uses Venafi to streamline machine identity (certificate and key) acquisition.
https://www.terraform.io/docs/providers/venafi/
Mozilla Public License 2.0

Deal with built-in Venafi approval steps #100

Open jkacou opened 1 year ago

jkacou commented 1 year ago

BUSINESS PROBLEM The issue concerns the creation of a certificate that requires an approval before it is actually issued. For now, Terraform times out while waiting for the approval. The failing process is described below:

  1. Terraform requests a new certificate
  2. The creation is stopped by the approval step
  3. Since the approval is not granted within 3 minutes, Terraform times out
  4. No entry is saved in the state file (so Terraform will never know about the issue)
  5. On the next apply (let's say there is a Jenkins job behind it), another certificate is requested (even though the previous one has been approved)
  6. The request fails (assuming there is a template policy to prevent duplication)
  7. We are forced to manually delete the latest failed creation request and import the resource created during the validation step

PROPOSED SOLUTION As a solution, Venafi should return an HTTP response indicating that a validation step is ongoing (I don't know whether this is already the case). The provider could then handle this specific case and update the state file accordingly (as a temporary state that will be updated on the next apply).

CURRENT ALTERNATIVES Using a Python script to check the delta between the state file and the created certificates (with the required filters), then importing the delta.
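A minimal sketch of that delta check, for illustration only. It assumes the state file uses the version 4 JSON layout (`resources` -> `instances` -> `attributes`), that the certificates are `venafi_certificate` resources identified by their `common_name` attribute, and that the list of certificates issued on the Venafi side has already been fetched by some other means. The function names and sample data are hypothetical.

```python
import json


def certs_in_state(state: dict) -> set[str]:
    """Collect the common names of venafi_certificate resources in a state document."""
    names = set()
    for resource in state.get("resources", []):
        if resource.get("type") != "venafi_certificate":
            continue
        for instance in resource.get("instances", []):
            common_name = instance.get("attributes", {}).get("common_name")
            if common_name:
                names.add(common_name)
    return names


def import_candidates(state: dict, issued_common_names: list[str]) -> list[str]:
    """Return certificates that exist in Venafi but are missing from the state."""
    return sorted(set(issued_common_names) - certs_in_state(state))


# Hypothetical sample data standing in for terraform.tfstate and the Venafi listing.
SAMPLE_STATE = json.loads("""
{
  "version": 4,
  "resources": [
    {
      "type": "venafi_certificate",
      "name": "web",
      "instances": [{"attributes": {"common_name": "web.example.com"}}]
    },
    {
      "type": "null_resource",
      "name": "other",
      "instances": [{"attributes": {"id": "123"}}]
    }
  ]
}
""")
SAMPLE_ISSUED = ["web.example.com", "api.example.com"]

if __name__ == "__main__":
    for common_name in import_candidates(SAMPLE_STATE, SAMPLE_ISSUED):
        print(f"missing from state, candidate for import: {common_name}")
```

Each name it reports is a certificate that was approved after Terraform timed out and still has to be imported by hand.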

VENAFI EXPERIENCE I have been using Venafi for 1 year. Today it takes about 10% of my time, as I am working on our module to leverage its features.

jkacou commented 10 months ago

Hello, it has been a year since I opened this issue. Is there anything planned for it?

jswartzy commented 3 months ago

Ditto for our team. Is this going to get any love?

BeardedPrincess commented 3 months ago

There are some fundamental technical (and security-related) issues that arise from introducing human approvals into an automated flow like Terraform (this also applies to vCert playbooks). The primary issue is: what is the expected behavior when applying a Terraform plan that could take days, or even weeks, to complete while waiting for manual approvals? I don't believe that it's possible to wait indefinitely, but if it were, would that be desirable? In general, you'd expect a TF plan to apply consistently on every run/apply. To make this work, however, we'd need to somehow keep state of what has already been requested and whether it's done, and expect the plan to be continuously "tried" until the certificate is approved. Can anyone provide examples of other providers that allow human intervention that can cause potentially indefinite wait periods like this? I would be curious to see what the best practice is for handling that.

My recommendation instead: you should carefully evaluate what criteria the "human approver" is applying when making a decision about whether to approve or deny a request, and look to implement that with the policy enforcement available in the Venafi platform, or using an adaptable workflow to address more complex enforcements.

One way to start this evaluation is to look at which requests were rejected by humans over the last 90-180 days. If the humans are not rejecting any, that is a good clue that they may be "rubber-stamping" all requests. If they have rejected some, look at the reasons for rejection, and determine whether those things are already being enforced by the Venafi Platform anyway, or whether they can be implemented in an adaptable workflow.

abrahamoshel commented 3 months ago

At least for our team, I think it is really a second pair of eyes to make sure there is no spelling error in the cert or domain name, since we are looking to provision fairly expensive certs.

BeardedPrincess commented 3 months ago

> At least for our team, I think it is really a second pair of eyes to make sure there is no spelling error in the cert or domain name, since we are looking to provision fairly expensive certs.

That's understandable, but a misspelled hostname in a Terraform plan (which I expect is doing much more than just creating a certificate) would have other significant impacts: wrong DNS registered, incorrect host / SNI settings on load balancers / webservers, etc. Is it common for your IaC or CI/CD processes to get to a production deployment stage with such errors?

Additionally, it is possible to have Venafi automatically enforce a policy setting to only allow specific domains, or even enforce specific patterns (RegEx) on every request. With Trust Protection Platform specifically, very complex Adaptable Workflows can even be used to do automated verification of such things, even consulting CMDB or other data sources to correlate what is being requested.
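To illustrate the kind of pattern check described above, here is a small sketch in Python. It is not Venafi policy syntax. The domain `example.com`, the pattern, and the function name are all hypothetical, and a real policy would encode your own allowed domains.

```python
import re

# Hypothetical allow-list: one or more valid lowercase DNS labels under example.com.
ALLOWED_HOSTS = re.compile(
    r"(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+example\.com"
)


def is_allowed(common_name: str) -> bool:
    """Return True only if the whole requested name matches the allowed pattern."""
    return ALLOWED_HOSTS.fullmatch(common_name) is not None


if __name__ == "__main__":
    for name in ("api.example.com", "api.exmaple.com", "api.example.com.evil.org"):
        print(name, "allowed" if is_allowed(name) else "denied")
```

A pattern like this rejects a misspelled domain automatically. It cannot catch a misspelled but well-formed host label, which is where a CMDB lookup or similar correlation would come in.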

Not only are these approaches more accurate and reliable than a human, they allow for true and full automation. I don't see where manual human approvals in the middle of an automated process can work. Do you have other steps in your terraform plans that do not complete for days or weeks? What is the behavior and how are you handling those today?

jkacou commented 3 months ago

On our side, the main concern is cost, since each public certificate implies some expense. We already rely on PR review for configuration checks/validation, à la GitOps. So we need this approval flow for all public certificate requests to keep some "control" over the certificate creation cost. Waiting for the approval can work in some limited cases but, for sure, it is not realistic if this can take days. That is why I was wondering whether a transient state was possible. The plan could be to wait for a certain time (let's say 30 minutes) and then mark the certificate creation as incomplete until the next run. That way, we do not have the timeout, we know what is happening with the certificate to be created, the state file reflects the real state, and we don't have a process waiting forever.

jkacou commented 3 months ago

So the state could be updated with one of three possible states: created (the state is updated, it is a success), incomplete (no change, it is neither a success nor a failure), or rejected (it is a failure, we can raise an error). The last one is essentially the same as a failed certificate creation.
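A rough sketch of that bounded wait, in Python for illustration (the provider itself is not written this way, and nothing here is an existing provider or Venafi API). The `check` callback and the status strings `"issued"`, `"pending"`, and `"rejected"` are assumptions standing in for whatever the platform actually reports.

```python
import time
from enum import Enum
from typing import Callable


class RequestState(Enum):
    CREATED = "created"        # certificate issued, state can be updated
    INCOMPLETE = "incomplete"  # still waiting for approval, retry on next run
    REJECTED = "rejected"      # approval denied, raise an error


def wait_for_certificate(
    check: Callable[[], str],
    max_wait_s: float = 1800,  # the 30 minutes suggested above
    poll_s: float = 30,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> RequestState:
    """Poll a (hypothetical) status check until a final answer or the deadline."""
    deadline = clock() + max_wait_s
    while True:
        status = check()
        if status == "issued":
            return RequestState.CREATED
        if status == "rejected":
            return RequestState.REJECTED
        if clock() >= deadline:
            # Not a failure: record the pending request and stop waiting.
            return RequestState.INCOMPLETE
        sleep(poll_s)


if __name__ == "__main__":
    # Simulated request that stays pending for a short demo window.
    result = wait_for_certificate(lambda: "pending", max_wait_s=0.2, poll_s=0.1)
    print(result.value)
```

On an incomplete result, the next apply would run the same check against the saved request instead of submitting a new one, which avoids the duplicate request from the original report.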