F5Networks / f5-appsvcs-extension

F5 BIG-IP Application Services 3 Extension
Apache License 2.0
163 stars 52 forks source link

Declarations failing after 3.49 upgrade - issues updating pool members #807

Open m-kratochvil opened 4 months ago

m-kratochvil commented 4 months ago

Note - this is potentially a duplicate to #805 but the symptoms are not exactly the same, so I'm opening this separately. Feel free to close as duplicate if indeed it is.

Environment

Summary

Few days after upgrade from 3.36 to 3.49 our automated tests started to fail. There are two types of failures we see, all related to pool members:

Mon, 26 Feb 2024 15:06:56 GMT - severe: [appsvcs] {"message":"Declaration failed: 01020036:3: The requested Pool Member (/net_13687b13_52c0_46a9_be99_35227ae63363/lb_01c18422-d418-4ecc-9cf5-56bc7c8ff5e2/pool_3b80c988-4c7e-4915-a152-1f4f2ce8e63f /net_13687b13_52c0_46a9_be99_35227ae63363/192.0.2.1%3134 80) was not found.","level":"error"}

Mon, 26 Feb 2024 15:01:05 GMT - severe: [appsvcs] {"message":"Declaration failed: 01020066:3: The requested Pool Member (/net_b4cc9b06_44ec_46a4_ba43_09dfb358be3e/lb_79e4f007-6d37-4013-bbc1-c43455f3e8fe/pool_392b9381-edef-412d-a047-a155dad585fb /net_b4cc9b06_44ec_46a4_ba43_09dfb358be3e/192.0.2.3%3038 81) already exists in partition net_b4cc9b06_44ec_46a4_ba43_09dfb358be3e.","level":"error"}

The second error BTW matches the one reported in the linked issue #805. I have so far only investigated into the first error (pool member not found), and what I see is that the pool member does exist in the Big-IP TMOS config prior to the declaration being submitted (the declaration changes the pool member status to "disabled"), so it should normally be updated and I'm out of ideas why the declaration fails. For now I'm focusing on that first error only, as I believe the second one is most likely related to the same underlying issue. I can reproduce this "manually", outside of our testing platform.

Steps To Reproduce

Steps to reproduce the behavior:

  1. Have a pool member in the TMOS configuration:

    ltm pool /net_13687b13_52c0_46a9_be99_35227ae63363/lb_01c18422-d418-4ecc-9cf5-56bc7c8ff5e2/pool_3b80c988-4c7e-4915-a152-1f4f2ce8e63f {
    description lb_member_pool1_member-1371282949
    load-balancing-mode ratio-member
    members {
        /net_13687b13_52c0_46a9_be99_35227ae63363/192.0.2.1%3134:http {
            address 192.0.2.1%3134
            description 5c5d14d8-a92b-4dbb-866a-a5f53d996e75
            ratio 20
        }
    }
    min-active-members 1
    partition net_13687b13_52c0_46a9_be99_35227ae63363
    }
  2. Submit the following declaration:

    {
    "action": "deploy",
    "class": "AS3",
    "declaration": {
        "class": "ADC",
        "id": "urn:uuid:faef24e7-2e4d-4087-95d8-4c35c0b0b274",
        "label": "F5 BigIP Provider",
        "net_13687b13_52c0_46a9_be99_35227ae63363": {
            "class": "Tenant",
            "defaultRouteDomain": 3134,
            "label": "project_32e0d9ed2f97404d95fc0c77dbeabc63",
            "lb_01c18422-d418-4ecc-9cf5-56bc7c8ff5e2": {
                "class": "Application",
                "label": "01c18422-d418-4ecc-9cf5-56bc7c8ff5e2",
                "listener_7e6a9298-c7ce-4f42-b4dc-8f8e3b7d5382": {
                    "class": "Service_L4",
                    "iRules": [],
                    "label": "lb_member_listener1_member-1389488391",
                    "maxConnections": 200,
                    "persistenceMethods": [],
                    "policyEndpoint": [],
                    "pool": "pool_3b80c988-4c7e-4915-a152-1f4f2ce8e63f",
                    "profileL4": {
                        "bigip": "/Common/fastL4_profile"
                    },
                    "snat": "self",
                    "virtualAddresses": [
                        "10.1.1.180"
                    ],
                    "virtualPort": 8000
                },
                "pool_3b80c988-4c7e-4915-a152-1f4f2ce8e63f": {
                    "class": "Pool",
                    "label": "lb_member_pool1_member-1371282949",
                    "loadBalancingMode": "ratio-member",
                    "members": [
                        {
                            "adminState": "disable",
                            "enable": true,
                            "priorityGroup": 1,
                            "ratio": 20,
                            "remark": "5c5d14d8-a92b-4dbb-866a-a5f53d996e75",
                            "serverAddresses": [
                                "192.0.2.1"
                            ],
                            "servicePort": 80
                        }
                    ],
                    "remark": "lb_member_pool1_member-1371282949"
                },
                "template": "generic"
            }
        },
        "schemaVersion": "3.19.0",
        "updateMode": "selective"
    },
    "historyLimit": 2,
    "logLevel": "warning",
    "persist": false,
    "trace": false
    }
  3. Observe the following error response:

    {
        "code": 422,
        "declarationId": "urn:uuid:faef24e7-2e4d-4087-95d8-4c35c0b0b274",
        "host": "localhost",
        "message": "declaration failed",
        "response": "01020036:3: The requested Pool Member (/net_13687b13_52c0_46a9_be99_35227ae63363/lb_01c18422-d418-4ecc-9cf5-56bc7c8ff5e2/pool_3b80c988-4c7e-4915-a152-1f4f2ce8e63f /net_13687b13_52c0_46a9_be99_35227ae63363/192.0.2.1%3134 80) was not found.",
        "runTime": 1372,
        "tenant": "net_13687b13_52c0_46a9_be99_35227ae63363"
    }

Expected Behavior

The pool member should be updated, as it exists on the Big-IP.

Actual Behavior

The declaration fails with error "Pool member not found".

m-kratochvil commented 4 months ago

Further investigation on our side shows that the declaration fails when the pool member is set to "adminState": "disable" and "priorityGroup": 1, is added to the declaration. It could either be only one of these two things, or both that cause the failure, we don't know that yet.

Here is the last successful declaration and the first one that fails, so you can do a diff.

{
    "action": "deploy",
    "class": "AS3",
    "declaration": {
        "class": "ADC",
        "id": "urn:uuid:5e443f46-1dc0-43ef-afc4-c3cd752e46ad",
        "label": "F5 BigIP Provider",
        "net_f4db9dbe_1d85_4aa4_84dd_3aea727aaa24": {
            "class": "Tenant",
            "defaultRouteDomain": 3063,
            "label": "project_7243f3f80a99434292d6766069d3961a",
            "lb_9605e3e2-1be0-44d4-beee-9362e53c54f7": {
                "class": "Application",
                "label": "9605e3e2-1be0-44d4-beee-9362e53c54f7",
                "listener_7ea1d149-039a-4c54-a126-9ada3b60964a": {
                    "class": "Service_L4",
                    "iRules": [],
                    "label": "lb_member_listener1_member-436580696",
                    "maxConnections": 200,
                    "persistenceMethods": [],
                    "policyEndpoint": [],
                    "pool": "pool_040d6597-720a-4676-8954-fddfb56df4b5",
                    "profileL4": {
                        "bigip": "/Common/fastL4_profile"
                    },
                    "snat": "self",
                    "virtualAddresses": [
                        "10.1.1.59"
                    ],
                    "virtualPort": 8000
                },
                "pool_040d6597-720a-4676-8954-fddfb56df4b5": {
                    "class": "Pool",
                    "label": "lb_member_pool1_member-698855247",
                    "loadBalancingMode": "ratio-member",
                    "members": [
                        {
                            "adminState": "enable",
                            "enable": true,
                            "ratio": 20,
                            "remark": "39e9668c-2483-41ca-bfd6-73fea4609136",
                            "serverAddresses": [
                                "192.0.2.1"
                            ],
                            "servicePort": 80
                        }
                    ],
                    "remark": "lb_member_pool1_member-698855247"
                },
                "template": "generic"
            }
        },
        "schemaVersion": "3.19.0",
        "updateMode": "selective"
    },
    "historyLimit": 2,
    "logLevel": "warning",
    "persist": false,
    "trace": false
}[
    {
        "code": 200,
        "declarationId": "urn:uuid:5e443f46-1dc0-43ef-afc4-c3cd752e46ad",
        "host": "localhost",
        "lineCount": 22,
        "message": "success",
        "runTime": 1520,
        "tenant": "net_f4db9dbe_1d85_4aa4_84dd_3aea727aaa24"
    }
]
{
    "action": "deploy",
    "class": "AS3",
    "declaration": {
        "class": "ADC",
        "id": "urn:uuid:e806dee3-7f35-4d8f-8a68-4e8d5f34a960",
        "label": "F5 BigIP Provider",
        "net_f4db9dbe_1d85_4aa4_84dd_3aea727aaa24": {
            "class": "Tenant",
            "defaultRouteDomain": 3063,
            "label": "project_7243f3f80a99434292d6766069d3961a",
            "lb_9605e3e2-1be0-44d4-beee-9362e53c54f7": {
                "class": "Application",
                "label": "9605e3e2-1be0-44d4-beee-9362e53c54f7",
                "listener_7ea1d149-039a-4c54-a126-9ada3b60964a": {
                    "class": "Service_L4",
                    "iRules": [],
                    "label": "lb_member_listener1_member-436580696",
                    "maxConnections": 200,
                    "persistenceMethods": [],
                    "policyEndpoint": [],
                    "pool": "pool_040d6597-720a-4676-8954-fddfb56df4b5",
                    "profileL4": {
                        "bigip": "/Common/fastL4_profile"
                    },
                    "snat": "self",
                    "virtualAddresses": [
                        "10.1.1.59"
                    ],
                    "virtualPort": 8000
                },
                "pool_040d6597-720a-4676-8954-fddfb56df4b5": {
                    "class": "Pool",
                    "label": "lb_member_pool1_member-698855247",
                    "loadBalancingMode": "ratio-member",
                    "members": [
                        {
                            "adminState": "disable",
                            "enable": true,
                            "priorityGroup": 1,
                            "ratio": 20,
                            "remark": "39e9668c-2483-41ca-bfd6-73fea4609136",
                            "serverAddresses": [
                                "192.0.2.1"
                            ],
                            "servicePort": 80
                        }
                    ],
                    "remark": "lb_member_pool1_member-698855247"
                },
                "template": "generic"
            }
        },
        "schemaVersion": "3.19.0",
        "updateMode": "selective"
    },
    "historyLimit": 2,
    "logLevel": "warning",
    "persist": false,
    "trace": false
}[
    {
        "code": 422,
        "declarationId": "urn:uuid:e806dee3-7f35-4d8f-8a68-4e8d5f34a960",
        "host": "localhost",
        "message": "declaration failed",
        "response": "01020036:3: The requested Pool Member (/net_f4db9dbe_1d85_4aa4_84dd_3aea727aaa24/lb_9605e3e2-1be0-44d4-beee-9362e53c54f7/pool_040d6597-720a-4676-8954-fddfb56df4b5 /net_f4db9dbe_1d85_4aa4_84dd_3aea727aaa24/192.0.2.1%3063 80) was not found.",
        "runTime": 1925,
        "tenant": "net_f4db9dbe_1d85_4aa4_84dd_3aea727aaa24"
    }
]
m-kratochvil commented 3 months ago

UPDATE

After installing AS3 pre-release 3.50 the first issue from the above Summary is resolved (declaration fails with pool member not found

The second issue still remains - declaration fails with error pool member already exists in partition.

Below I post two declaration

If you do a diff between those two declarations, you’ll see that the change we do, that results in the failed declaration is that we add a second pool member to a pool, plus we set priority group 2 to the new pool member but also set priority group 1 to the already existing pool member.

Last successful declaration:

{
    "action": "deploy",
    "class": "AS3",
    "declaration": {
        "class": "ADC",
        "id": "urn:uuid:973a48f1-f1c1-4d41-b1e7-8e03b49c63c3",
        "label": "F5 BigIP Provider",
        "net_13ba6482_a3b1_4a8e_9845_9748bc6df450": {
            "class": "Tenant",
            "defaultRouteDomain": 3035,
            "label": "project_32e0d9ed2f97404d95fc0c77dbeabc63",
            "lb_5599f0fe-3ad1-42ee-a761-7f86674a34d5": {
                "class": "Application",
                "label": "5599f0fe-3ad1-42ee-a761-7f86674a34d5",
                "pool_14d602f4-b61a-45cb-a13a-61db76b89ae0": {
                    "class": "Pool",
                    "label": "lb_member_pool3_member-batch-1380482994",
                    "loadBalancingMode": "ratio-member",
                    "members": [
                        {
                            "adminState": "enable",
                            "enable": true,
                            "ratio": 20,
                            "remark": "eec1e386-c349-4195-b1cd-6c508f736fb5",
                            "serverAddresses": [
                                "192.0.2.1"
                            ],
                            "servicePort": 80
                        }
                    ],
                    "remark": "lb_member_pool3_member-batch-1380482994"
                },
                "template": "generic"
            }
        },
        "schemaVersion": "3.19.0",
        "updateMode": "selective"
    },
    "historyLimit": 2,
    "logLevel": "warning",
    "persist": false,
    "trace": false
}[
    {
        "code": 200,
        "declarationId": "urn:uuid:973a48f1-f1c1-4d41-b1e7-8e03b49c63c3",
        "host": "localhost",
        "lineCount": 20,
        "message": "success",
        "runTime": 1449,
        "tenant": "net_13ba6482_a3b1_4a8e_9845_9748bc6df450"
    }
]

Failed declaration afterwards:

{
    "action": "deploy",
    "class": "AS3",
    "declaration": {
        "class": "ADC",
        "id": "urn:uuid:28ac4625-6160-45b5-926b-848fc8f3fb3e",
        "label": "F5 BigIP Provider",
        "net_13ba6482_a3b1_4a8e_9845_9748bc6df450": {
            "class": "Tenant",
            "defaultRouteDomain": 3035,
            "label": "project_32e0d9ed2f97404d95fc0c77dbeabc63",
            "lb_5599f0fe-3ad1-42ee-a761-7f86674a34d5": {
                "class": "Application",
                "label": "5599f0fe-3ad1-42ee-a761-7f86674a34d5",
                "pool_14d602f4-b61a-45cb-a13a-61db76b89ae0": {
                    "class": "Pool",
                    "label": "lb_member_pool3_member-batch-1380482994",
                    "loadBalancingMode": "ratio-member",
                    "members": [
                        {
                            "adminState": "enable",
                            "enable": true,
                            "priorityGroup": 2,
                            "ratio": 20,
                            "remark": "eec1e386-c349-4195-b1cd-6c508f736fb5",
                            "serverAddresses": [
                                "192.0.2.1"
                            ],
                            "servicePort": 80
                        },
                        {
                            "adminState": "enable",
                            "enable": true,
                            "priorityGroup": 1,
                            "ratio": 20,
                            "remark": "5926268d-eec1-4047-8d86-66e6aa0d2951",
                            "serverAddresses": [
                                "192.0.2.3"
                            ],
                            "servicePort": 81
                        }
                    ],
                    "remark": "lb_member_pool3_member-batch-1380482994"
                },
                "template": "generic"
            }
        },
        "schemaVersion": "3.19.0",
        "updateMode": "selective"
    },
    "historyLimit": 2,
    "logLevel": "warning",
    "persist": false,
    "trace": false
}[
    {
        "code": 422,
        "declarationId": "urn:uuid:28ac4625-6160-45b5-926b-848fc8f3fb3e",
        "host": "localhost",
        "message": "declaration failed",
        "response": "01020066:3: The requested Pool Member (/net_13ba6482_a3b1_4a8e_9845_9748bc6df450/lb_5599f0fe-3ad1-42ee-a761-7f86674a34d5/pool_14d602f4-b61a-45cb-a13a-61db76b89ae0 /net_13ba6482_a3b1_4a8e_9845_9748bc6df450/192.0.2.3%3035 81) already exists in partition net_13ba6482_a3b1_4a8e_9845_9748bc6df450.",
        "runTime": 1436,
        "tenant": "net_13ba6482_a3b1_4a8e_9845_9748bc6df450"
    }
]

I did check on the Big-IP and the pool member in the error message does not exist in the config, so it should be created without issues.

m-kratochvil commented 3 months ago

UPDATE We're now running the official release 3.50

I looked at the actual cli script of the failed declaration, and there are clearly two consecutive actions trying to create the same pool member under a pool. This results in the observed error.

In high-level, it looks like this:

tmsh create ltm pool <pool> <some pool options> members replace-all-with
<partition>/192.0.2.1%3109:80 { <some member options> }
<partition>/192.0.2.3%3109:81 { <some member options> }

tmsh modify ltm pool <pool> members add 
<partition>/192.0.2.3%3109:81 { <exactly the same member options }

Full transaction:

cli script __appsvcs_update {
proc script::run {} {
if {[catch {
tmsh::modify ltm data-group internal __appsvcs_update records none
} err]} {
tmsh::create ltm data-group internal __appsvcs_update type string records none
}
if { [catch {
tmsh::modify ltm pool /net_ef6f031e_9a7c_408c_8ca8_5fd5e7b1535e/lb_98778260-f5d6-498d-9228-4612113c3c2d/pool_e90766ee-c84d-4d19-82ed-8265f083f2c5 members delete \{ "/net_ef6f031e_9a7c_408c_8ca8_5fd5e7b1535e/1
92.0.2.1%3109:80" \}
tmsh::begin_transaction
tmsh::modify ltm node /net_ef6f031e_9a7c_408c_8ca8_5fd5e7b1535e/192.0.2.1%3109 metadata none
tmsh::modify auth partition net_ef6f031e_9a7c_408c_8ca8_5fd5e7b1535e description \"Updated by AS3 at [clock format [clock seconds] -gmt true -format {%a, %d %b %Y %T %Z}]\"
tmsh::delete ltm pool /net_ef6f031e_9a7c_408c_8ca8_5fd5e7b1535e/lb_98778260-f5d6-498d-9228-4612113c3c2d/pool_e90766ee-c84d-4d19-82ed-8265f083f2c5
tmsh::create ltm pool /net_ef6f031e_9a7c_408c_8ca8_5fd5e7b1535e/lb_98778260-f5d6-498d-9228-4612113c3c2d/pool_e90766ee-c84d-4d19-82ed-8265f083f2c5 description \"tempest-lb_member_pool3_member-batch-1474553793\
" load-balancing-mode ratio-member members replace-all-with \{ /net_ef6f031e_9a7c_408c_8ca8_5fd5e7b1535e/192.0.2.1%3109:80 \{ connection-limit 0 description \"886e829c-5fb6-40bc-9a1b-8a4850d5d91b\" dynamic-ra
tio 1 fqdn \{ autopopulate disabled \} priority-group 2 rate-limit disabled ratio 20 state user-up session user-enabled metadata none \} /net_ef6f031e_9a7c_408c_8ca8_5fd5e7b1535e/192.0.2.3%3109:81 \{ connecti
on-limit 0 description \"d0ac1f4c-0e75-4852-b9b8-6d2dd2d211d1\" dynamic-ratio 1 fqdn \{ autopopulate disabled \} priority-group 1 rate-limit disabled ratio 20 state user-up session user-enabled metadata none
\} \} min-active-members 1 reselect-tries 0 service-down-action none slow-ramp-time 10 allow-nat yes allow-snat yes metadata none
tmsh::modify ltm pool /net_ef6f031e_9a7c_408c_8ca8_5fd5e7b1535e/lb_98778260-f5d6-498d-9228-4612113c3c2d/pool_e90766ee-c84d-4d19-82ed-8265f083f2c5 members add \{ /net_ef6f031e_9a7c_408c_8ca8_5fd5e7b1535e/192.0
.2.3%3109:81 \{ connection-limit 0 description \"d0ac1f4c-0e75-4852-b9b8-6d2dd2d211d1\" dynamic-ratio 1 fqdn \{ autopopulate disabled \} priority-group 1 rate-limit disabled ratio 20 state user-up session use
r-enabled metadata none \} \}
tmsh::create ltm node /net_ef6f031e_9a7c_408c_8ca8_5fd5e7b1535e/192.0.2.3%3109 address 192.0.2.3%3109 metadata none
tmsh::commit_transaction
} err] } {
catch { tmsh::cancel_transaction } e
regsub -all {"} $err {\"} err
tmsh::modify ltm data-group internal __appsvcs_update records add \{ error \{ data \"$err\" \} \}
tmsh::modify ltm pool /net_ef6f031e_9a7c_408c_8ca8_5fd5e7b1535e/lb_98778260-f5d6-498d-9228-4612113c3c2d/pool_e90766ee-c84d-4d19-82ed-8265f083f2c5 members add \{ /net_ef6f031e_9a7c_408c_8ca8_5fd5e7b1535e/192.0
.2.1%3109:80 \{ connection-limit 0 description \"886e829c-5fb6-40bc-9a1b-8a4850d5d91b\" dynamic-ratio 1 fqdn \{ autopopulate disabled \} priority-group 0 rate-limit disabled ratio 20 state user-up session use
r-enabled metadata none \} \}
}}
    total-signing-status not-all-signed
}
mbreton1 commented 1 week ago

I have the same issue on 3.50.2. Any update please? Is there a workaround in the mean time?