TechnitiumSoftware / DnsServer

Technitium DNS Server
https://technitium.com/dns/
GNU General Public License v3.0

[bug] failover app not working since update #866

Closed Tivin-i closed 3 months ago

Tivin-i commented 4 months ago

Expected behaviour: TXT status showing "healthy"

What is happening: TXT status showing "Unknown" on all monitors, across all servers.

This is tested with the following check types: ping, tcp3306, tcp6379 ("redis"), https.

Below is what I configured:

{
  "primary": [
    "1.1.1.1"
  ],
  "secondary": [
    "1.0.0.1"
  ],
  "healthCheck": "ping",
  "healthCheckUrl": "https://www.example.com/",
  "allowTxtStatus": true
}

DNS query for TXT record:

  "Answer": [
    {
      "Name": "cf.int.xxx.xxx",
      "Type": "TXT",
      "Class": "IN",
      "TTL": "30 (30 sec)",
      "RDLENGTH": "92 bytes",
      "RDATA": {
        "Text": "app=failover; addressType=Primary; address=1.1.1.1; healthCheck=ping; healthStatus=Unknown;"
      },
      "DnssecStatus": "Disabled"
    },
    {
      "Name": "cf.int.xxx.xxx",
      "Type": "TXT",
      "Class": "IN",
      "TTL": "30 (30 sec)",
      "RDLENGTH": "94 bytes",
      "RDATA": {
        "Text": "app=failover; addressType=Secondary; address=1.0.0.1; healthCheck=ping; healthStatus=Unknown;"
      },
      "DnssecStatus": "Disabled"
    }
  ],

Failover app config:

    {
      "name": "redis",
      "type": "tcp",
      "interval": 60,
      "retries": 3,
      "timeout": 10,
      "port": 6379,
      "emailAlert": "default",
      "webHook": "default"
    },
    {
      "name": "tcp3306",
      "type": "tcp",
      "interval": 60,
      "retries": 3,
      "timeout": 10,
      "port": 3306,
      "emailAlert": "default",
      "webHook": "default"
    },
    {
      "name": "ping",
      "type": "ping",
      "interval": 60,
      "retries": 3,
      "timeout": 10,
      "emailAlert": "default",
      "webHook": "default"
    },

cf.int.xxx.xxx does not return an error, but a few other records did return errors such as:

[2024-02-16 15:38:06 UTC] DNS App [Failover]: ALERT! Domain [crdb-01.xxx.xxx] type [SOA] status is FAILED based on 'ping' health check. The failure reason is: Not supported.

[2024-02-16 15:33:20 UTC] DNS App [Failover]: ALERT! Domain [redis-1.int.xxx.xxx] type [SOA] status is FAILED based on 'redis' health check. The failure reason is: Not supported.

[2024-02-16 15:33:20 UTC] DNS App [Failover]: System.Net.Http.HttpRequestException: Response status code does not indicate success: 404 (Not Found).

The last line is probably because it can't find my test webhook.

ShreyasZare commented 4 months ago

Thanks for the feedback. The failover app starts a health check only when it receives the first A/AAAA request for that domain name. I would suggest that you first make an A record request for the domain and then check the health status with a TXT request.

[2024-02-16 15:38:06 UTC] DNS App [Failover]: ALERT! Domain [crdb-01.xxx.xxx] type [SOA] status is FAILED based on 'ping' health check. The failure reason is: Not supported.

[2024-02-16 15:33:20 UTC] DNS App [Failover]: ALERT! Domain [redis-1.int.xxx.xxx] type [SOA] status is FAILED based on 'redis' health check. The failure reason is: Not supported.

This error log was generated because the app received a request for an SOA record, which is not supported since the app only returns A/AAAA records. This log entry is certainly confusing, so I will get the app updated to avoid it.

[2024-02-16 15:33:20 UTC] DNS App [Failover]: System.Net.Http.HttpRequestException: Response status code does not indicate success: 404 (Not Found).

This seems to be a webhook-related error log. I will add more details to the log entry to make it clearer.

Tivin-i commented 4 months ago

Heya! Redis-1 has A and AAAA records configured.

ShreyasZare commented 4 months ago

Heya! Redis-1 has A and AAAA records configured.

You need to query the failover APP record domain, i.e. cf.int.xxx.xxx in your case. I am not sure about the redis A/AAAA records, and they won't be used by the failover app.

Tivin-i commented 4 months ago

Ah, that was just an example because it was too much to redact, sorry about that. Here is a redis failover configuration that returns unhealthy.

{
  "primary": "redis-2.int.xxx.xxx",
  "secondary": [
    "redis-1.int.xxx.xxx",
    "redis-0.int.xxx.xxx",
    "redis-4.int.xxx.xxx",
    "redis-5.int.xxx.xxx"
  ],
  "serverDown": "status.xxx.xxx",
  "healthCheck": "redis",
  "healthCheckUrl": null,
  "allowTxtStatus": true
}

{
  "Metadata": {
    "NameServer": "sg.xxx.xxx (127.0.0.1)",
    "Protocol": "Tcp",
    "DatagramSize": "1460 bytes",
    "RoundTripTime": "6.3 ms"
  },
  "EDNS": {
    "UdpPayloadSize": 1232,
    "ExtendedRCODE": "NoError",
    "Version": 0,
    "Flags": "None",
    "Options": []
  },
  "Identifier": 0,
  "IsResponse": true,
  "OPCODE": "StandardQuery",
  "AuthoritativeAnswer": true,
  "Truncation": false,
  "RecursionDesired": true,
  "RecursionAvailable": true,
  "Z": 0,
  "AuthenticData": false,
  "CheckingDisabled": false,
  "RCODE": "NoError",
  "QDCOUNT": 1,
  "ANCOUNT": 10,
  "NSCOUNT": 0,
  "ARCOUNT": 1,
  "Question": [
    {
      "Name": "redis.int.xxx.xxx",
      "Type": "TXT",
      "Class": "IN"
    }
  ],
  "Answer": [
    {
      "Name": "redis.int.xxx.xxx",
      "Type": "TXT",
      "Class": "IN",
      "TTL": "30 (30 sec)",
      "RDLENGTH": "125 bytes",
      "RDATA": {
        "Text": "app=failover; cnameType=Primary; domain=redis-2.int.xxx.xxx; qType: A; healthCheck=redis; healthStatus=Unknown;"
      },
      "DnssecStatus": "Disabled"
    },
    {
      "Name": "redis.int.xxx.xxx",
      "Type": "TXT",
      "Class": "IN",
      "TTL": "30 (30 sec)",
      "RDLENGTH": "128 bytes",
      "RDATA": {
        "Text": "app=failover; cnameType=Primary; domain=redis-2.int.xxx.xxx; qType: AAAA; healthCheck=redis; healthStatus=Unknown;"
      },
      "DnssecStatus": "Disabled"
    },
    {
      "Name": "redis.int.xxx.xxx",
      "Type": "TXT",
      "Class": "IN",
      "TTL": "30 (30 sec)",
      "RDLENGTH": "127 bytes",
      "RDATA": {
        "Text": "app=failover; cnameType=Secondary; domain=redis-1.int.xxx.xxx; qType: A; healthCheck=redis; healthStatus=Unknown;"
      },
      "DnssecStatus": "Disabled"
    },
    {
      "Name": "redis.int.xxx.xxx",
      "Type": "TXT",
      "Class": "IN",
      "TTL": "30 (30 sec)",
      "RDLENGTH": "130 bytes",
      "RDATA": {
        "Text": "app=failover; cnameType=Secondary; domain=redis-1.int.xxx.xxx; qType: AAAA; healthCheck=redis; healthStatus=Unknown;"
      },
      "DnssecStatus": "Disabled"
    },
    {
      "Name": "redis.int.xxx.xxx",
      "Type": "TXT",
      "Class": "IN",
      "TTL": "30 (30 sec)",
      "RDLENGTH": "127 bytes",
      "RDATA": {
        "Text": "app=failover; cnameType=Secondary; domain=redis-0.int.xxx.xxx; qType: A; healthCheck=redis; healthStatus=Unknown;"
      },
      "DnssecStatus": "Disabled"
    },
    {
      "Name": "redis.int.xxx.xxx",
      "Type": "TXT",
      "Class": "IN",
      "TTL": "30 (30 sec)",
      "RDLENGTH": "130 bytes",
      "RDATA": {
        "Text": "app=failover; cnameType=Secondary; domain=redis-0.int.xxx.xxx; qType: AAAA; healthCheck=redis; healthStatus=Unknown;"
      },
      "DnssecStatus": "Disabled"
    },
    {
      "Name": "redis.int.xxx.xxx",
      "Type": "TXT",
      "Class": "IN",
      "TTL": "30 (30 sec)",
      "RDLENGTH": "127 bytes",
      "RDATA": {
        "Text": "app=failover; cnameType=Secondary; domain=redis-4.int.xxx.xxx; qType: A; healthCheck=redis; healthStatus=Unknown;"
      },
      "DnssecStatus": "Disabled"
    },
    {
      "Name": "redis.int.xxx.xxx",
      "Type": "TXT",
      "Class": "IN",
      "TTL": "30 (30 sec)",
      "RDLENGTH": "130 bytes",
      "RDATA": {
        "Text": "app=failover; cnameType=Secondary; domain=redis-4.int.xxx.xxx; qType: AAAA; healthCheck=redis; healthStatus=Unknown;"
      },
      "DnssecStatus": "Disabled"
    },
    {
      "Name": "redis.int.xxx.xxx",
      "Type": "TXT",
      "Class": "IN",
      "TTL": "30 (30 sec)",
      "RDLENGTH": "127 bytes",
      "RDATA": {
        "Text": "app=failover; cnameType=Secondary; domain=redis-5.int.xxx.xxx; qType: A; healthCheck=redis; healthStatus=Unknown;"
      },
      "DnssecStatus": "Disabled"
    },
    {
      "Name": "redis.int.xxx.xxx",
      "Type": "TXT",
      "Class": "IN",
      "TTL": "30 (30 sec)",
      "RDLENGTH": "130 bytes",
      "RDATA": {
        "Text": "app=failover; cnameType=Secondary; domain=redis-5.int.xxx.xxx; qType: AAAA; healthCheck=redis; healthStatus=Unknown;"
      },
      "DnssecStatus": "Disabled"
    }
  ],
  "Authority": [],
  "Additional": [
    {
      "Name": "",
      "Type": "OPT",
      "Class": "1232",
      "TTL": "0 (0 sec)",
      "RDLENGTH": "0 bytes",
      "RDATA": {
        "Options": []
      },
      "DnssecStatus": "Disabled"
    }
  ]
}

ShreyasZare commented 4 months ago

Ah, that was just an example because it was too much to redact, sorry about that. Here is a redis failover configuration that returns unhealthy.

OK. Now first use the DNS Client to query redis.int.xxx.xxx for an A record. After that, change the type to TXT and query again. This time it should show a status for the primary entry.
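
The first step can also be scripted. Below is a minimal Python sketch, not part of Technitium, that hand-encodes a plain DNS query with the standard library and sends A and AAAA queries so that the app receives the requests that start monitoring. The helper names (`build_query`, `send_query`, `rcode`, `truncated`, `warm_up`) are hypothetical, the server is assumed to be reachable over IPv4 on UDP port 53, and no EDNS or TCP fallback is implemented.

```python
import socket
import struct

QTYPE_A = 1      # RFC 1035 type codes
QTYPE_TXT = 16
QTYPE_AAAA = 28


def build_query(name: str, qtype: int, query_id: int = 0x1234) -> bytes:
    """Encode a DNS query with one question, class IN, recursion desired."""
    # Header: ID, flags (only RD set), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.split(".")
        if label
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def send_query(server: str, name: str, qtype: int, timeout: float = 3.0) -> bytes:
    """Send the query over UDP and return the raw response bytes."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name, qtype), (server, 53))
        response, _ = sock.recvfrom(4096)
        return response


def rcode(response: bytes) -> int:
    """Return the RCODE (0 means NoError) from a raw DNS response."""
    return response[3] & 0x0F


def truncated(response: bytes) -> bool:
    """Return True if the TC flag is set, i.e. the answer needs a TCP retry."""
    return bool(response[2] & 0x02)


def warm_up(server: str, name: str) -> bool:
    """Send A and AAAA queries for the APP record; True if both got NoError."""
    return all(
        rcode(send_query(server, name, qtype)) == 0
        for qtype in (QTYPE_A, QTYPE_AAAA)
    )
```

A call such as `warm_up("127.0.0.1", "redis.int.xxx.xxx")` would stand in for the manual A query. The TXT status is still best read with the DNS Client afterwards, because a TXT answer with many entries is likely to be truncated over plain UDP.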

Tivin-i commented 4 months ago

Yea, it shows healthy now. So it only updates the TXT record when there is a query done directly at the record?

{
  "Metadata": {
    "NameServer": "sg.xxx.xxx (127.0.0.1)",
    "Protocol": "Udp",
    "DatagramSize": "97 bytes",
    "RoundTripTime": "1.33 ms"
  },
  "EDNS": {
    "UdpPayloadSize": 1232,
    "ExtendedRCODE": "NoError",
    "Version": 0,
    "Flags": "None",
    "Options": []
  },
  "Identifier": 0,
  "IsResponse": true,
  "OPCODE": "StandardQuery",
  "AuthoritativeAnswer": true,
  "Truncation": false,
  "RecursionDesired": true,
  "RecursionAvailable": true,
  "Z": 0,
  "AuthenticData": false,
  "CheckingDisabled": false,
  "RCODE": "NoError",
  "QDCOUNT": 1,
  "ANCOUNT": 2,
  "NSCOUNT": 0,
  "ARCOUNT": 1,
  "Question": [
    {
      "Name": "redis.int.xxx.xxx",
      "Type": "A",
      "Class": "IN"
    }
  ],
  "Answer": [
    {
      "Name": "redis.int.xxx.xxx",
      "Type": "CNAME",
      "Class": "IN",
      "TTL": "10 (10 sec)",
      "RDLENGTH": "10 bytes",
      "RDATA": {
        "Domain": "redis-2.int.xxx.xxx"
      },
      "DnssecStatus": "Disabled"
    },
    {
      "Name": "redis-2.int.xxx.xxx",
      "Type": "A",
      "Class": "IN",
      "TTL": "3600 (1 hour)",
      "RDLENGTH": "4 bytes",
      "RDATA": {
        "IPAddress": "100.335.333.332"
      },
      "DnssecStatus": "Disabled"
    }
  ],
  "Authority": [],
  "Additional": [
    {
      "Name": "",
      "Type": "OPT",
      "Class": "1232",
      "TTL": "0 (0 sec)",
      "RDLENGTH": "0 bytes",
      "RDATA": {
        "Options": []
      },
      "DnssecStatus": "Disabled"
    }
  ]
}

{
  "Metadata": {
    "NameServer": "sg.xxx.xxx (127.0.0.1)",
    "Protocol": "Tcp",
    "DatagramSize": "1460 bytes",
    "RoundTripTime": "1.27 ms"
  },
  "EDNS": {
    "UdpPayloadSize": 1232,
    "ExtendedRCODE": "NoError",
    "Version": 0,
    "Flags": "None",
    "Options": []
  },
  "Identifier": 0,
  "IsResponse": true,
  "OPCODE": "StandardQuery",
  "AuthoritativeAnswer": true,
  "Truncation": false,
  "RecursionDesired": true,
  "RecursionAvailable": true,
  "Z": 0,
  "AuthenticData": false,
  "CheckingDisabled": false,
  "RCODE": "NoError",
  "QDCOUNT": 1,
  "ANCOUNT": 10,
  "NSCOUNT": 0,
  "ARCOUNT": 1,
  "Question": [
    {
      "Name": "redis.int.xxx.xxx",
      "Type": "TXT",
      "Class": "IN"
    }
  ],
  "Answer": [
    {
      "Name": "redis.int.xxx.xxx",
      "Type": "TXT",
      "Class": "IN",
      "TTL": "30 (30 sec)",
      "RDLENGTH": "125 bytes",
      "RDATA": {
        "Text": "app=failover; cnameType=Primary; domain=redis-2.int.xxx.xxx; qType: A; healthCheck=redis; healthStatus=Healthy;"
      },
      "DnssecStatus": "Disabled"
    },
    {
      "Name": "redis.int.xxx.xxx",
      "Type": "TXT",
      "Class": "IN",
      "TTL": "30 (30 sec)",
      "RDLENGTH": "128 bytes",
      "RDATA": {
        "Text": "app=failover; cnameType=Primary; domain=redis-2.int.xxx.xxx; qType: AAAA; healthCheck=redis; healthStatus=Unknown;"
      },
      "DnssecStatus": "Disabled"
    },
    {
      "Name": "redis.int.xxx.xxx",
      "Type": "TXT",
      "Class": "IN",
      "TTL": "30 (30 sec)",
      "RDLENGTH": "127 bytes",
      "RDATA": {
        "Text": "app=failover; cnameType=Secondary; domain=redis-1.int.xxx.xxx; qType: A; healthCheck=redis; healthStatus=Unknown;"
      },
      "DnssecStatus": "Disabled"
    },
    {
      "Name": "redis.int.xxx.xxx",
      "Type": "TXT",
      "Class": "IN",
      "TTL": "30 (30 sec)",
      "RDLENGTH": "130 bytes",
      "RDATA": {
        "Text": "app=failover; cnameType=Secondary; domain=redis-1.int.xxx.xxx; qType: AAAA; healthCheck=redis; healthStatus=Unknown;"
      },
      "DnssecStatus": "Disabled"
    },
    {
      "Name": "redis.int.xxx.xxx",
      "Type": "TXT",
      "Class": "IN",
      "TTL": "30 (30 sec)",
      "RDLENGTH": "127 bytes",
      "RDATA": {
        "Text": "app=failover; cnameType=Secondary; domain=redis-0.int.xxx.xxx; qType: A; healthCheck=redis; healthStatus=Unknown;"
      },
      "DnssecStatus": "Disabled"
    },
    {
      "Name": "redis.int.xxx.xxx",
      "Type": "TXT",
      "Class": "IN",
      "TTL": "30 (30 sec)",
      "RDLENGTH": "130 bytes",
      "RDATA": {
        "Text": "app=failover; cnameType=Secondary; domain=redis-0.int.xxx.xxx; qType: AAAA; healthCheck=redis; healthStatus=Unknown;"
      },
      "DnssecStatus": "Disabled"
    },
    {
      "Name": "redis.int.xxx.xxx",
      "Type": "TXT",
      "Class": "IN",
      "TTL": "30 (30 sec)",
      "RDLENGTH": "127 bytes",
      "RDATA": {
        "Text": "app=failover; cnameType=Secondary; domain=redis-4.int.xxx.xxx; qType: A; healthCheck=redis; healthStatus=Unknown;"
      },
      "DnssecStatus": "Disabled"
    },
    {
      "Name": "redis.int.xxx.xxx",
      "Type": "TXT",
      "Class": "IN",
      "TTL": "30 (30 sec)",
      "RDLENGTH": "130 bytes",
      "RDATA": {
        "Text": "app=failover; cnameType=Secondary; domain=redis-4.int.xxx.xxx; qType: AAAA; healthCheck=redis; healthStatus=Unknown;"
      },
      "DnssecStatus": "Disabled"
    },
    {
      "Name": "redis.int.xxx.xxx",
      "Type": "TXT",
      "Class": "IN",
      "TTL": "30 (30 sec)",
      "RDLENGTH": "127 bytes",
      "RDATA": {
        "Text": "app=failover; cnameType=Secondary; domain=redis-5.int.xxx.xxx; qType: A; healthCheck=redis; healthStatus=Unknown;"
      },
      "DnssecStatus": "Disabled"
    },
    {
      "Name": "redis.int.xxx.xxx",
      "Type": "TXT",
      "Class": "IN",
      "TTL": "30 (30 sec)",
      "RDLENGTH": "130 bytes",
      "RDATA": {
        "Text": "app=failover; cnameType=Secondary; domain=redis-5.int.xxx.xxx; qType: AAAA; healthCheck=redis; healthStatus=Unknown;"
      },
      "DnssecStatus": "Disabled"
    }
  ],
  "Authority": [],
  "Additional": [
    {
      "Name": "",
      "Type": "OPT",
      "Class": "1232",
      "TTL": "0 (0 sec)",
      "RDLENGTH": "0 bytes",
      "RDATA": {
        "Options": []
      },
      "DnssecStatus": "Disabled"
    }
  ]
}

ShreyasZare commented 4 months ago

Yea, it shows healthy now. So it only updates the TXT record when there is a query done directly at the record?

Health check monitoring is done only for "active" records, to save the server's resources. The server can have several zones with multiple APP records for failover. If it monitored all of these records all the time, it would use significant resources.

This is why the secondary entries still show an unknown status. The health check for a secondary server starts only after the primary server fails and the app then receives an A/AAAA request, which triggers the health check for the secondary server.
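
To see at a glance which entries are being monitored, the TXT status strings can be split into fields. This is a minimal Python sketch based only on the strings pasted in this thread; `parse_failover_txt` is a hypothetical helper, and the format is assumed to be `key=value` pairs separated by `;`, with `qType: A` using a colon instead of an equals sign.

```python
import re


def parse_failover_txt(text: str) -> dict[str, str]:
    """Split a failover TXT status string into a dict of its fields.

    Accepts both 'key=value' and 'key: value' parts, because the strings
    in this thread write the query type as 'qType: A'.
    """
    fields: dict[str, str] = {}
    for part in text.split(";"):
        part = part.strip()
        if not part:
            continue  # skip the empty piece after the trailing ';'
        # The key ends at the first '=' or ':'; the value may contain more.
        match = re.match(r"([^=:]+)[=:](.*)", part)
        if match:
            fields[match.group(1).strip()] = match.group(2).strip()
    return fields


sample = (
    "app=failover; cnameType=Primary; domain=redis-2.int.xxx.xxx; "
    "qType: A; healthCheck=redis; healthStatus=Healthy;"
)
status = parse_failover_txt(sample)
print(status["domain"], status["qType"], status["healthStatus"])
```

Applied to the second TXT response above, this would report `Healthy` only for the primary entry with `qType` A; the primary AAAA entry and all secondary entries remain `Unknown` because only an A query had been made.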

If the APP record does not get an A/AAAA request for over an hour then the health check monitoring for it stops and the status will become unknown again.
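
The lifecycle described above can be modelled in a few lines. This is only an illustrative Python sketch of the described behaviour, not the app's actual implementation: the class `ActiveMonitors`, its method names, the injected clock, and the `"Failed"` status string are all assumptions, and the one-hour limit is taken from the explanation above.

```python
import time
from typing import Callable

IDLE_LIMIT_SECONDS = 3600  # "over an hour" without an A/AAAA request


class ActiveMonitors:
    """Toy model: a target is monitored only while A/AAAA requests keep arriving."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._last_query: dict[str, float] = {}

    def on_address_query(self, target: str) -> None:
        """Record an A/AAAA request; this starts or refreshes monitoring."""
        self._last_query[target] = self._clock()

    def _expire_idle(self) -> None:
        """Stop monitoring targets that were idle for longer than the limit."""
        now = self._clock()
        idle = [
            target
            for target, seen in self._last_query.items()
            if now - seen > IDLE_LIMIT_SECONDS
        ]
        for target in idle:
            del self._last_query[target]

    def status(self, target: str, probe_ok: bool) -> str:
        """Return the status a TXT query would show in this model."""
        self._expire_idle()
        if target not in self._last_query:
            return "Unknown"  # never queried, or monitoring has stopped
        return "Healthy" if probe_ok else "Failed"
```

Passing the clock in as a parameter keeps the model easy to exercise without waiting an hour: a fake clock can be advanced past the limit to watch the status fall back to `Unknown`.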

Tivin-i commented 4 months ago

Thank you for the clarification!

ShreyasZare commented 4 months ago

Thank you for the clarification!

You're welcome!