bluecmd / fortigate_exporter

Prometheus exporter for Fortigate firewalls
GNU General Public License v3.0

Scrape timeouts for 15 minutes after Fortigate failover #220

Open p-v-a opened 1 year ago

p-v-a commented 1 year ago

I experienced scrape timeouts lasting around 15 minutes after a Fortigate node failover. The exporter logs show the following errors for the whole duration:

2023/05/12 00:02:27 Error: API connectivity test failed, Get "https://forti.net:8443/api/v2/monitor/system/status": context canceled
2023/05/12 00:02:27 Probe of "https://forti.net:8443" failed, took 29.901 seconds

It's probably related to how Fortigate handles session pickup. However, I found that disabling HTTP/2 for the exporter resolves the issue. As a workaround, one can set the environment variable GODEBUG=http2client=0, but it would be good to have support in the exporter for this scenario.

p-v-a commented 1 year ago

Just to add more details about this issue. It seems to be related to how Fortigate handles HTTPS session pickup. By the look of it, the failure mode is as follows:

  • The secondary unit picks up the TCP session, but not the HTTPS session. Our boxes have different TLS certs, so the secondary box doesn't have the certificate of the primary and vice versa. I haven't experimented much with certificates, so this might not be the root cause, but it feels like something TLS-related.
  • This causes Fortigate to ignore all incoming packets from the exporter.
  • Meanwhile, the exporter uses a persistent HTTP/2 connection, so it tries to reuse the existing connection whenever one is available.
  • Because Fortigate never replies with a TCP RST and just ignores the packets, the exporter keeps getting timeouts until the HTTP/2 timeout expires.
  • The exporter then initiates a new HTTP/2 session, which is now established using the correct TLS cert, and everything begins to work as expected.

So by switching off HTTP/2 via the GODEBUG env variable we force the exporter to establish a new HTTP session for every scrape, which works around the issue.

A probable solution would be to add a config-file option to disable HTTP/2 when scraping an HA endpoint, so the user can control it. Combined with #208, you could still use HTTP/2 for scraping metrics from individual nodes and disable it only for HA endpoints.

lazyb0nes commented 1 year ago

Did you set it in the systemd unit? Mine looks like this: Environment="GODEBUG=http2client=0". I've tried this and I still get a timeout of about 6-8 minutes. This is running 7.0.12.

p-v-a commented 11 months ago

Sorry for the late answer. In my case I'm running it inside Kubernetes, so no systemd, just a pod with the environment variable set:

          env:
            # Workaround issue https://github.com/bluecmd/fortigate_exporter/issues/220
            - name: "GODEBUG"
              value: "http2client=0"
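Whether the variable comes from a systemd unit or a pod spec, it can help to confirm that it actually reaches the process. The sketch below is a small standalone check, assuming it is run with the same environment as the exporter. The function name http2ClientDisabled is hypothetical, and the parsing is a simplified reading of the comma-separated key=value format that GODEBUG uses.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// http2ClientDisabled reports whether a GODEBUG value sets http2client=0.
// Hypothetical helper for diagnostics; if the key appears more than once,
// the last occurrence is the one used here.
func http2ClientDisabled(godebug string) bool {
	disabled := false
	for _, kv := range strings.Split(godebug, ",") {
		key, val, ok := strings.Cut(kv, "=")
		if ok && key == "http2client" {
			disabled = val == "0"
		}
	}
	return disabled
}

func main() {
	// Reads the environment of the current process only.
	fmt.Println("http2client=0 set:", http2ClientDisabled(os.Getenv("GODEBUG")))
}
```

If this prints false under the service's environment, the workaround was never in effect, and any remaining timeout has to be explained some other way.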