fortinetdev / terraform-provider-fortios

Terraform Fortios provider
https://www.terraform.io/docs/providers/fortios/
Mozilla Public License 2.0

ERROR: produced an unexpected new value: Root resource was present, but now absent. #333

Open natemellendorf opened 1 month ago

natemellendorf commented 1 month ago

Good evening, everyone.

We're in the process of deploying multiple Fortinet NGFW instances in AWS. Things have been going smoothly, but we've hit a snag and we're not sure what to make of it.

I'll provide debugs and details below, but let me know if more information is needed.

Setup

We have multiple fortios provider configurations, each with a unique alias. Our firewall configuration lives in a single Terraform module, which we instantiate once per firewall, passing in the matching provider.

module "use1-az6-a" {
  source = "../../modules/tfm_fortinet_config"
  providers = {
    fortios = fortios.use1-az6-a
  }
  policy_config = local.fortinet_policies
}

module "use1-az6-b" {
  source = "../../modules/tfm_fortinet_config"
  providers = {
    fortios = fortios.use1-az6-b
  }
  policy_config = local.fortinet_policies
}
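
For readers reproducing this setup, here is a minimal sketch of what the provider side might look like, assuming token-based authentication. The hostname and the variable name are hypothetical placeholders, not values from the real configuration; the vdom value is taken from the debug log further down, and the provider source comes from the error message.

```hcl
# Root module: one aliased fortios provider configuration per firewall.
# The hostname and token variable are hypothetical placeholders.
variable "fortios_token_a" {
  type      = string
  sensitive = true
}

provider "fortios" {
  alias    = "use1-az6-a"
  hostname = "fw-use1-az6-a.example.internal" # placeholder
  token    = var.fortios_token_a
  vdom     = "FG-traffic" # vdom name as seen in the debug log
}

# Inside modules/tfm_fortinet_config: declare the provider requirement
# so the "providers = { fortios = ... }" mapping resolves to
# fortinetdev/fortios rather than a default hashicorp/ namespace.
terraform {
  required_providers {
    fortios = {
      source = "fortinetdev/fortios"
    }
  }
}
```

A second aliased block (for example use1-az6-b) would follow the same pattern with its own hostname and token.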

Example logic in the module, which creates all the addresses that are passed in:

resource "fortios_firewall_address" "this" {
  for_each      = var.policy_config["addresses"]
  name          = each.value.name
  color         = try(each.value.color, 0)
  fqdn          = try(each.value.fqdn, null)
  subnet        = try(each.value.subnet, null)
  country       = try(each.value.country, null)
  wildcard_fqdn = try(each.value.wildcard_fqdn, null)
  obj_type      = try(each.value.obj_type, null)
  sub_type      = try(each.value.sub_type, null)
  type          = try(each.value.type, null)
}
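
To make the resource above easier to follow, here is a hypothetical sketch of the shape local.fortinet_policies could take. The entries are invented for illustration and are not the real data; the only things taken from the thread are the "addresses" key and the fact that the for_each key matches the address name (as in ["CVIAWS020"]). Attributes left out of an entry fall back to the defaults in the try() calls.

```hcl
# Hypothetical example data; names and values are made up.
locals {
  fortinet_policies = {
    addresses = {
      "EXAMPLE-SUBNET" = {
        name   = "EXAMPLE-SUBNET"
        type   = "ipmask"
        subnet = "10.0.0.0 255.255.255.0"
      }
      "EXAMPLE-FQDN" = {
        name  = "EXAMPLE-FQDN"
        type  = "fqdn"
        fqdn  = "example.com"
        color = 3
      }
    }
  }
}
```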

Issue

When running terraform plan, everything looks fine: each firewall shows its policies waiting to be applied. However, terraform apply gives inconsistent results. Sometimes everything applies smoothly; other times the apply fails on one of our firewalls. When it fails, this is the error message we get:

│ Error: Provider produced inconsistent result after apply
│ 
│ When applying changes to module.use1-az6-b.fortios_firewall_address.this["CVIAWS020"], provider "provider[\"registry.terraform.io/fortinetdev/fortios\"].use1-az6-b" produced an unexpected new value: Root resource was present, but now absent.
│ 
│ This is a bug in the provider, which should be reported in the provider's own issue tracker.
╵

Debugs

debug.json

Observation

I noticed that when these failures occur, I see this in the trace: \"matched_count\":2,\n

I see a similar log for other resources that create successfully, but they log: \"matched_count\":1,\n

As these firewalls have little to no firewall configuration on them when we apply terraform, I find this behavior puzzling.

When I log in to the firewall that produced the error message, I find that the resource that threw the error was in fact created.

Running terraform apply again then produces the typical 500 server error, presumably because the resource already exists on the firewall.

This error occurs randomly: it can hit any one of the multiple firewalls we're configuring, and it affects random resources, not just CVIAWS020. However, in my testing it seems to happen only to firewall addresses (objects).

{
    "@level": "info",
    "@message": "2024/08/02 23:28:33 FOS-fortios reading response: {\n  \"http_method\":\"GET\",\n  \"size\":40,\n  \"limit_reached\":false,\n  \"matched_count\":2,\n  \"next_idx\":16,\n  \"revision\":\"0b97f06be46d75f9885025723106e92a\",\n  \"cli_error\":[\n  ],\n  \"status\":\"error\",\n  \"http_status\":404,\n  \"vdom\":\"FG-traffic\",\n  \"path\":\"firewall\",\n  \"name\":\"address\",\n  \"mkey\":\"CVIAWS020\",\n  \"serial\":\"FGTAWSHTYM1GEG90\",\n  \"version\":\"v7.4.4\",\n  \"build\":2662\n}",
    "@module": "provider.terraform-provider-fortios_v1.20.0",
    "@timestamp": "2024-08-02T23:28:33.331674Z",
    "timestamp": "2024-08-02T23:28:33.331Z"
}
{
    "@level": "info",
    "@message": "2024/08/02 23:28:33 [WARN] resource (CVIAWS020) not found, removing from state",
    "@module": "provider.terraform-provider-fortios_v1.20.0",
    "@timestamp": "2024-08-02T23:28:33.331719Z",
    "timestamp": "2024-08-02T23:28:33.331Z"
}
natemellendorf commented 1 month ago

For what it's worth, after some debugging I noticed that after a successful POST to the firewall address endpoint, the provider immediately performs a GET against the returned resource ID. Since the POST already returns a status for the request (success or failure), the extra GET after a success felt redundant. That said, I may be misunderstanding the purpose of that GET and why it follows every firewall address action except DELETE.

I've pulled the provider locally, removed that extra GET, and verified that my firewall addresses, address groups, services, service groups, and policies are building correctly.

This is just a data point I've collected. I'm still not sure why the firewall would return a 404 or a matched_count of 2 for newly created firewall addresses. It almost feels like some kind of eventual-consistency issue, though that's just a guess.

MaxxLiu22 commented 1 month ago

Hi @natemellendorf ,

Thank you for raising this issue and providing this valuable information. It is quite strange: the FOS response reports two matching objects at the current URL path, but the content is missing, resulting in a 404 error. As you mentioned, if this issue occurs randomly across different resources, it may be related to how the backend handles requests.

Could you share your var.policy_config file with sensitive information redacted? I can't reproduce this issue on my end when creating 20 firewall addresses. Alternatively, we could enable the debug function on FOS to see what is happening on the backend:

diagnose debug application httpsd -1
diagnose debug enable

Thanks, Maxx

natemellendorf commented 1 month ago

@MaxxLiu22

Thanks for taking a look and responding to my issue.

Your observation and subsequent concern are where I landed too. I enabled those debug commands on one of the six Fortinet NGFWs two days ago, and they didn't reveal much to me.

I'll start over, enable them again, and run the apply until the firewall produces the error. I'll also supply a full working example of my Terraform configuration with sensitive info redacted.

I should have this for you tomorrow.

Thanks again,