PaloAltoNetworks / terraform-provider-panos

Terraform Panos provider
https://www.terraform.io/docs/providers/panos/
Mozilla Public License 2.0

Session timed out error message on plan or apply #255

Open vabagaria opened 3 years ago

vabagaria commented 3 years ago

Describe the bug

Frequently receive 'Error: Session timed out' when trying to run plan or apply.

Expected behavior

Should finish plan and display changes or apply planned changes, every time.

Current behavior

Terraform quits with error 'Session timed out' frequently but not always.

Possible solution

The problem goes away when Terraform is forced to perform one operation at a time (-parallelism=1 disables parallel requests), but that is not a practical fix. Other possible solutions could include:

Steps to reproduce

The following is from a redacted example plan: the config matches production, but values have been replaced with dummy data. The problem can be reproduced in a lab with a separate firewall and Terraform host. It is not reproducible when Ansible with the PanOS Collection is used with the same firewall and host.

Terraform code:


############
tftest2/main.tf
############
module "firewall_rules_app001" {
  source            = "./module/panos-worker"

  app_name          = "APP001"
  pat_port          = "10001"
  internal_endpoint = "app001.internal"
  internal_port     = 443
  internal_protocol = "tcp"
  app_type          = ["web-browsing", "ssl"]
  rule_state        = false # do not disable rule
}

module "firewall_rules_app002" {
  source            = "./module/panos-worker"

  app_name          = "APP002"
  pat_port          = "10002"
  internal_endpoint = "app002.internal"
  internal_port     = 443
  internal_protocol = "tcp"
  app_type          = ["web-browsing", "ssl"]
  rule_state        = false # do not disable rule
}

module "firewall_rules_app003" {
  source            = "./module/panos-worker"

  app_name          = "APP003"
  pat_port          = "10003"
  internal_endpoint = "app003.internal"
  internal_port     = 443
  internal_protocol = "tcp"
  app_type          = ["web-browsing", "ssl"]
  rule_state        = false # do not disable rule
}

module "firewall_rules_app004" {
  source            = "./module/panos-worker"

  app_name          = "APP004"
  pat_port          = "10004"
  internal_endpoint = "app001.internal"
  internal_port     = 443
  internal_protocol = "tcp"
  app_type          = ["web-browsing", "ssl"]
  rule_state        = false # do not disable rule
}

module "firewall_rules_app005" {
  source            = "./module/panos-worker"

  app_name          = "APP005"
  pat_port          = "10005"
  internal_endpoint = "app001.internal"
  internal_port     = 443
  internal_protocol = "tcp"
  app_type          = ["web-browsing", "ssl"]
  rule_state        = false # do not disable rule
}

module "firewall_rules_app006" {
  source            = "./module/panos-worker"

  app_name          = "APP006"
  pat_port          = "10006"
  internal_endpoint = "app001.internal"
  internal_port     = 443
  internal_protocol = "tcp"
  app_type          = ["web-browsing", "ssl"]
  rule_state        = false # do not disable rule
}

module "firewall_rules_app007" {
  source            = "./module/panos-worker"

  app_name          = "APP007"
  pat_port          = "10007"
  internal_endpoint = "app001.internal"
  internal_port     = 443
  internal_protocol = "tcp"
  app_type          = ["web-browsing", "ssl"]
  rule_state        = false # do not disable rule
}

module "firewall_rules_app008" {
  source            = "./module/panos-worker"

  app_name          = "APP008"
  pat_port          = "10008"
  internal_endpoint = "app001.internal"
  internal_port     = 443
  internal_protocol = "tcp"
  app_type          = ["web-browsing", "ssl"]
  rule_state        = false # do not disable rule
}

############################
tftest2/module/panos-worker/main.tf
############################
resource "panos_address_object" "this" {
  name        = "int_${var.app_name}_fqdn"
  vsys        = "vsys1"
  type        = "fqdn"
  value       = var.internal_endpoint
  description = "${var.app_name}"
  tags        = ["dmz"]
}

resource "panos_service_object" "this" {
  name             = "${var.internal_protocol}-${var.pat_port}"
  vsys             = "vsys1"
  protocol         = var.internal_protocol
  description      = "${var.app_name}"
  destination_port = var.pat_port
  tags             = ["dmz"]
}

resource "panos_security_rule_group" "this" {
  depends_on = [
    panos_service_object.this
  ]

  vsys                    = "vsys1"
  rule {
    name                  = var.app_name
    description           = "Rule for ${var.app_name}"
    type                  = "universal"
    tags                  = ["dmz"]
    source_zones          = ["untrust"]
    source_addresses      = ["any"]
    negate_source         = false
    source_users          = ["any"]
    hip_profiles          = ["any"]
    destination_zones     = ["trust"]
    destination_addresses = ["any"]
    negate_destination    = false
    applications          = var.app_type
    services              = [panos_service_object.this.name]
    categories            = ["any"]
    action                = "allow"
    log_setting           = "logtoexternal"
    log_start             = false
    log_end               = true
    disabled              = var.rule_state
    schedule              = ""
    virus                 = "default"
    spyware               = "default"
    vulnerability         = "default"
    url_filtering         = ""
    wildfire_analysis     = "default"
    data_filtering        = ""
  }
}

################################
tftest2/module/panos-worker/_variables.tf
################################
variable "app_name" {
  type = string
}

variable "pat_port" {
  type = number
}

variable "internal_endpoint" {
  type = string
}

variable "internal_port" {
  type = number
}

variable "internal_protocol" {
  type = string
}

variable "app_type" {
  type = list(string)
  description = "List of applications - e.g. web-browsing, ssl"
}

variable "rule_state" {
  type = bool
  description = "Set to true if you would like to disable the rules on the firewall"
} 

###############################
tftest2/module/panos-worker/_provider.tf
###############################
terraform {
  required_providers {
    panos = {
      source  = "paloaltonetworks/panos"
      version = "~> 1.6"
    }
  }
}

provider "panos" {
  hostname = "xx.xx.xx.xx"
  username = "xxxxxxxxx"
  password = "xxxxxxxxx"
  timeout  = 60
}

Screenshots

USER@HOST:~/gitlab-repos/tftest2 $ terraform apply
module.firewall_rules_app003.panos_address_object.this: Refreshing state... [id=vsys1:int_APP003_fqdn]
module.firewall_rules_app003.panos_service_object.this: Refreshing state... [id=vsys1:tcp-10003]
module.firewall_rules_app003.panos_security_rule_group.this: Refreshing state... [id=vsys1:0::QVBQMDAz]
module.firewall_rules_app004.panos_address_object.this: Refreshing state... [id=vsys1:int_APP004_fqdn]
module.firewall_rules_app004.panos_service_object.this: Refreshing state... [id=vsys1:tcp-10004]
module.firewall_rules_app001.panos_address_object.this: Refreshing state... [id=vsys1:int_APP001_fqdn]
module.firewall_rules_app001.panos_service_object.this: Refreshing state... [id=vsys1:tcp-10001]
module.firewall_rules_app004.panos_security_rule_group.this: Refreshing state... [id=vsys1:0::QVBQMDA0]
module.firewall_rules_app007.panos_service_object.this: Refreshing state... [id=vsys1:tcp-10007]
module.firewall_rules_app007.panos_address_object.this: Refreshing state... [id=vsys1:int_APP007_fqdn]
module.firewall_rules_app001.panos_security_rule_group.this: Refreshing state... [id=vsys1:0::QVBQMDAx]
module.firewall_rules_app006.panos_address_object.this: Refreshing state... [id=vsys1:int_APP006_fqdn]
module.firewall_rules_app006.panos_service_object.this: Refreshing state... [id=vsys1:tcp-10006]
module.firewall_rules_app005.panos_service_object.this: Refreshing state... [id=vsys1:tcp-10005]
module.firewall_rules_app005.panos_address_object.this: Refreshing state... [id=vsys1:int_APP005_fqdn]
module.firewall_rules_app002.panos_address_object.this: Refreshing state... [id=vsys1:int_APP002_fqdn]
module.firewall_rules_app002.panos_service_object.this: Refreshing state... [id=vsys1:tcp-10002]
module.firewall_rules_app008.panos_address_object.this: Refreshing state... [id=vsys1:int_APP008_fqdn]
module.firewall_rules_app008.panos_service_object.this: Refreshing state... [id=vsys1:tcp-10008]
module.firewall_rules_app007.panos_security_rule_group.this: Refreshing state... [id=vsys1:0::QVBQMDA3]
module.firewall_rules_app005.panos_security_rule_group.this: Refreshing state... [id=vsys1:0::QVBQMDA1]
module.firewall_rules_app006.panos_security_rule_group.this: Refreshing state... [id=vsys1:0::QVBQMDA2]
module.firewall_rules_app002.panos_security_rule_group.this: Refreshing state... [id=vsys1:0::QVBQMDAy]
module.firewall_rules_app008.panos_security_rule_group.this: Refreshing state... [id=vsys1:0::QVBQMDA4]

Warning: Interpolation-only expressions are deprecated

  on module/panos-worker/main.tf line 6, in resource "panos_address_object" "this":
   6:   description = "${var.app_name}"

Terraform 0.11 and earlier required all non-constant expressions to be
provided via interpolation syntax, but this pattern is now deprecated. To
silence this warning, remove the "${ sequence from the start and the }"
sequence from the end of this expression, leaving just the inner expression.

Template interpolation syntax is still used to construct strings from
expressions when the template includes multiple interpolation sequences or a
mixture of literal strings and interpolations. This deprecation applies only
to templates that consist entirely of a single interpolation sequence.

(and 15 more similar warnings elsewhere)

Error: Session timed out

USER@HOST:~/gitlab-repos/tftest2 $ 
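
The deprecation warning in the output above appears unrelated to the timeout. As the message says, an expression that consists of a single interpolation can be written without the "${ ... }" wrapper. A minimal sketch against the address object from the module above, with nothing else changed:

```hcl
resource "panos_address_object" "this" {
  # Still a template: literal text mixed with an interpolation, so no warning.
  name  = "int_${var.app_name}_fqdn"
  vsys  = "vsys1"
  type  = "fqdn"
  value = var.internal_endpoint

  # Was "${var.app_name}"; the bare reference is equivalent and avoids the warning.
  description = var.app_name
  tags        = ["dmz"]
}
```

The same change to the description in panos_service_object would presumably clear the remaining warnings: two descriptions per module instance across eight instances matches the 16 warnings reported.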

Context

Trying to configure a number of firewalls with security rules and address objects for multiple apps and running plan/apply multiple times a day. The pipeline is failing frequently and needs to be re-run manually.

Your Environment

Version used:

shinmog commented 3 years ago

Your provider timeout is set to 60 right now - if you increase that (say 120 or whatever) does it finish ok?

vabagaria commented 3 years ago

> Your provider timeout is set to 60 right now - if you increase that (say 120 or whatever) does it finish ok?

Setting it to any higher than 60 results in an error as it is the maximum timeout value.

shinmog commented 3 years ago

I'll increase the max from 60 to 600 in the next provider release.

harsh-vm commented 3 years ago

> > Your provider timeout is set to 60 right now - if you increase that (say 120 or whatever) does it finish ok?
>
> Setting it to any higher than 60 results in an error as it is the maximum timeout value.

I've seen this issue as well.

shinmog commented 3 years ago

Ok, v1.7.0 is out now. I set the timeout to 600 sec now, IIRC, so increase that and let me know the results.
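
A minimal sketch of that change, assuming provider v1.7.0 or later, where per this thread values above 60 are accepted up to 600 seconds. The value 120 is only the figure suggested earlier in the thread; hostname and credentials are the same placeholders as in the original report.

```hcl
terraform {
  required_providers {
    panos = {
      source  = "paloaltonetworks/panos"
      version = "~> 1.7" # assumption: 1.7.0+ lifts the old ceiling of 60
    }
  }
}

provider "panos" {
  hostname = "xx.xx.xx.xx"
  username = "xxxxxxxxx"
  password = "xxxxxxxxx"
  timeout  = 120 # seconds; earlier releases rejected anything above 60
}
```

The original constraint "~> 1.6" already permits 1.7.x, so running terraform init -upgrade may be enough to pick up the newer release without editing the constraint.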

vabagaria commented 3 years ago

For those still having this issue with 1.7, instead of using username and password try generating an API key using the Go code provided in the TF provider documentation and pass that to TF. This seems to have solved the issue for me.
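
A sketch of passing an API key to the provider instead of a username and password. It relies on the provider's api_key argument; the variable name panos_api_key is an illustrative choice, not something taken from this thread.

```hcl
variable "panos_api_key" {
  type        = string
  description = "API key generated for the firewall (hypothetical variable name)"
}

provider "panos" {
  hostname = "xx.xx.xx.xx"
  api_key  = var.panos_api_key # used in place of username/password
  timeout  = 60
}
```

The value can be supplied without committing it to the repository, for example through the TF_VAR_panos_api_key environment variable.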

shinmog commented 3 years ago

I slightly misspoke above: I didn't increase the default timeout to 600 sec, I increased the max timeout to 600 sec. I'm uncertain if having a max timeout is a useful thing or not. Will reassess after this issue is resolved.

With regards to what @vabagaria said above - I actually removed that chunk of documentation because I wasn't sure if it was a useful thing to have or not. So instead of asking people to install pango and compile code to get a simple answer, I've just released another provider version, 1.8.0, which has panos_api_key that you can use to find out the API key for the provider auth you've supplied.
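
A sketch of reading the key that way, assuming provider v1.8.0 or later, that panos_api_key is a data source, and that it exposes the key as an api_key attribute (check the provider documentation for the exact name). The label "current" and the output name are illustrative.

```hcl
# Looks up the API key for the credentials the provider is already configured with.
data "panos_api_key" "current" {}

output "panos_api_key" {
  value     = data.panos_api_key.current.api_key # assumed attribute name
  sensitive = true # keeps the key out of the normal plan/apply summary
}
```

Once the key is known, it can replace the username and password in the provider configuration.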

shinmog commented 3 years ago

@vabagaria Seems like this issue is resolved..?

vabagaria commented 3 years ago

@shinmog I haven't been able to reproduce the issue on tf-plan with 1.8 but I have gotten session time out errors on tf-apply.

shinmog commented 3 years ago

So you have the timeout set to 600 seconds and it times out..? Does it actually sit around and take those 10 minutes or..?

jubrad commented 2 years ago

@shinmog I'm also seeing this issue or at least something very similar.

> So you have the timeout set to 600 seconds and it times out..? Does it actually sit around and take those 10 minutes or..?

I've been setting the following in my env to work around this: TF_CLI_ARGS_destroy="-parallelism=2"

Since I also see this issue with a few local-exec provisioners that run Python PA libraries (they configure resources this Terraform provider does not supply), I can only assume this is an issue with the PA server or a general issue with both client modules.

I think parallel requests are causing the Session timed out error, not any HTTP or TCP/network issues. Oddly enough, the Session timed out error arrives with a server HTTP response of 200 OK.

To figure out if this was really parallel requests, I set up a test environment which creates many static entries in the PA DNS proxy. It uses Python, but again I think the issues are related. When run in series I've completed over 4k requests with no timeout issues. Running the test in parallel with ~10-20 threads, I occasionally get a session timeout; with ~60 threads concurrently creating 10 records, I can reproduce the timeout consistently.

I have created a PR for pan-python which immediately reads and closes the HTTP connection in a context manager. With that in place I don't see the Python issue anymore, but I saw it far less frequently in Python than in pango/TF to begin with. I believe closing the connections just lowers the likelihood of hitting the problem and I'm seeing it go because there are more resources defined with terraform panos resources.

My guess now is that there's something up with the server: it either struggles with responding to high volumes of requests, or there's some random collision that can occur.

If this is a PA server error would something like the following be possible? Perhaps with exponential backoffs? https://github.com/PaloAltoNetworks/pango/compare/master...jubrad:retry-on-timeout?expand=1

shinmog commented 2 years ago

Ok, if this is a parallelism issue, then I've come across this already in the provider:

https://github.com/PaloAltoNetworks/terraform-provider-panos/issues/264#issuecomment-788118806