chaostoolkit-incubator / chaostoolkit-azure

Chaos Toolkit Extension for Azure
https://chaostoolkit.org/
Apache License 2.0

Does chaosazure.vmss.actions/deallocate-vmss with environment variables work? #69

Closed torumakabe closed 4 years ago

torumakabe commented 5 years ago

chaosazure.vmss.actions/deallocate-vmss works when the secrets are injected explicitly, but not when they are passed through environment variables.

The following experiment (secrets injected explicitly) works fine.

{
<<<<snip>>>>
    "configuration": {
        "azure": {
            "subscription_id": "xxxx"
        },
        "service_url": {
            "type": "env",
            "key": "APPLICATION_ENTRYPOINT_URL"
        }
    },
    "secrets": {
        "azure": {
            "client_id": "xxxx",
            "client_secret": "xxxx",
            "tenant_id": "xxxx"
        }
    },
<<<<snip>>>>
    "method": [
        {
            "type": "action",
            "name": "deallocate-vmss",
            "provider": {
                "module": "chaosazure.vmss.actions",
                "type": "python",
                "func": "deallocate_vmss",
                "secrets": [
                    "azure"
                ],
                "config": [
                    "azure"
                ],
                "arguments": {
                    "filter": ""
                }
            },
            "pauses": {
                "after": 2
            }
        }
    ]
}
[2019-07-26 23:40:45 INFO] Validating the experiment's syntax
[2019-07-26 23:40:45 INFO] Experiment looks valid
[2019-07-26 23:40:45 INFO] Running experiment: My application is resilient to pod death
[2019-07-26 23:40:45 INFO] Steady state hypothesis: Application is normal
[2019-07-26 23:40:45 INFO] Probe: application-must-respond-normally
[2019-07-26 23:40:45 INFO] Steady state hypothesis is met!
[2019-07-26 23:40:45 INFO] Action: deallocate-vmss
additional_properties is not a known attribute of class <class 'azure.mgmt.resourcegraph.models._models_py3.QueryRequest'> and will be ignored
[2019-07-26 23:40:48 INFO] Pausing after activity for 2s...
[2019-07-26 23:40:50 INFO] Steady state hypothesis: Application is normal
[2019-07-26 23:40:50 INFO] Probe: application-must-respond-normally
[2019-07-26 23:40:50 INFO] Steady state hypothesis is met!
[2019-07-26 23:40:50 INFO] Let's rollback...
[2019-07-26 23:40:50 INFO] No declared rollbacks, let's move on.
[2019-07-26 23:40:50 INFO] Experiment ended with status: completed

But the following one (environment variables) fails.

{
<<<<snip>>>>
    "configuration": {
        "azure": {
            "subscription_id": {
                "type": "env",
                "key": "AZURE_SUBSCRIPTION_ID"
            }
        },
        "service_url": {
            "type": "env",
            "key": "APPLICATION_ENTRYPOINT_URL"
        }
    },
    "secrets": {
        "azure": {
            "client_id": {
                "type": "env",
                "key": "AZURE_CLIENT_ID"
            },
            "client_secret": {
                "type": "env",
                "key": "AZURE_CLIENT_SECRET"
            },
            "tenant_id": {
                "type": "env",
                "key": "AZURE_TENANT_ID"
            }
        }
    },
<<<<snip>>>>
    "method": [
        {
            "type": "action",
            "name": "deallocate-vmss",
            "provider": {
                "module": "chaosazure.vmss.actions",
                "type": "python",
                "func": "deallocate_vmss",
                "secrets": [
                    "azure"
                ],
                "config": [
                    "azure"
                ],
                "arguments": {
                    "filter": ""
                }
            },
            "pauses": {
                "after": 2
            }
        }
    ]
}
[2019-07-26 23:57:27 INFO] Validating the experiment's syntax
[2019-07-26 23:57:27 INFO] Experiment looks valid
[2019-07-26 23:57:27 INFO] Running experiment: My application is resilient to node death
[2019-07-26 23:57:27 INFO] Steady state hypothesis: Application is normal
[2019-07-26 23:57:27 INFO] Probe: application-must-respond-normally
[2019-07-26 23:57:27 INFO] Steady state hypothesis is met!
[2019-07-26 23:57:27 INFO] Action: deallocate-vmss
additional_properties is not a known attribute of class <class 'azure.mgmt.resourcegraph.models._models_py3.QueryRequest'> and will be ignored
[2019-07-26 23:57:28 ERROR]   => failed: azure.mgmt.resourcegraph.models._models_py3.ErrorResponseException: (BadRequest) Please provide below info when asking for support: timestamp = 2019-07-26T14:57:28.0913916Z, correlationId = 974dabf8-0163-4637-bac5-f63d65f71318.
[2019-07-26 23:57:28 INFO] Pausing after activity for 2s...
[2019-07-26 23:57:30 INFO] Steady state hypothesis: Application is normal
[2019-07-26 23:57:30 INFO] Probe: application-must-respond-normally
[2019-07-26 23:57:30 INFO] Steady state hypothesis is met!
[2019-07-26 23:57:30 INFO] Let's rollback...
[2019-07-26 23:57:30 INFO] No declared rollbacks, let's move on.
[2019-07-26 23:57:30 INFO] Experiment ended with status: completed

I would appreciate it if you would give me some advice.

Lawouach commented 5 years ago

I'm surprised either works to be fair.

This is a common pitfall (we are considering changing it in the toolkit core): configuration is flat while secrets are scoped. The rationale was that, because secrets are sensitive, the probe/action must be explicit about which ones it wants to access. Config keys are not sensitive, so they are always passed in full:


{
<<<<snip>>>>
    "configuration": {
        "subscription_id": {
                "type": "env",
                "key": "AZURE_SUBSCRIPTION_ID"
            },
        "service_url": {
            "type": "env",
            "key": "APPLICATION_ENTRYPOINT_URL"
        }
    },
    "secrets": {
        "azure": {
            "client_id": {
                "type": "env",
                "key": "AZURE_CLIENT_ID"
            },
            "client_secret": {
                "type": "env",
                "key": "AZURE_CLIENT_SECRET"
            },
            "tenant_id": {
                "type": "env",
                "key": "AZURE_TENANT_ID"
            }
        }
    },
<<<<snip>>>>
    "method": [
        {
            "type": "action",
            "name": "deallocate-vmss",
            "provider": {
                "module": "chaosazure.vmss.actions",
                "type": "python",
                "func": "deallocate_vmss",
                "secrets": [
                    "azure"
                ],
                "arguments": {
                    "filter": ""
                }
            },
            "pauses": {
                "after": 2
            }
        }
    ]
}

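The difference between a flat and a nested configuration entry can be illustrated with a small sketch. This is not chaoslib's actual loader: `resolve_flat_config` is a hypothetical helper, and it assumes that only `{"type": "env"}` declarations found at the top level of `configuration` are expanded, while nested dictionaries are passed through untouched.

```python
import os


def resolve_flat_config(config, environ=None):
    """Hypothetical helper: expand env declarations at the top level only."""
    environ = os.environ if environ is None else environ
    resolved = {}
    for name, value in config.items():
        if isinstance(value, dict) and value.get("type") == "env":
            # Top-level declaration: replace it with the variable's value.
            resolved[name] = environ.get(value["key"])
        else:
            # Anything else, including nested dicts, is passed through as-is.
            resolved[name] = value
    return resolved


# A stand-in environment so the example does not depend on real variables.
fake_env = {"AZURE_SUBSCRIPTION_ID": "my-id"}

flat = resolve_flat_config(
    {"subscription_id": {"type": "env", "key": "AZURE_SUBSCRIPTION_ID"}},
    fake_env,
)
nested = resolve_flat_config(
    {"azure": {"subscription_id": {"type": "env", "key": "AZURE_SUBSCRIPTION_ID"}}},
    fake_env,
)
```

Under that assumption, `flat` holds the plain string value, whereas `nested["azure"]["subscription_id"]` is still the raw declaration dictionary.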
torumakabe commented 5 years ago

@Lawouach Thank you for your help. I tried making the config flat and removing config from the action, but that triggered another error. The following are the debug messages.

[2019-07-27 07:23:17 DEBUG] Start deallocate_vmss: configuration='{'subscription_id': 'my-id', 'service_url': 'http://20.43.95.43'}', filter=''
[2019-07-27 07:23:18 DEBUG] Activity failed
    Traceback (most recent call last):
      File "/Users/tomakabe/.venvs/chaostk/lib/python3.7/site-packages/chaoslib/provider/python.py", line 55, in run_python_activity
        return func(**arguments)
      File "/Users/tomakabe/.venvs/chaostk/lib/python3.7/site-packages/chaosazure/vmss/actions.py", line 129, in deallocate_vmss
        vmss = choose_vmss_at_random(filter, configuration, secrets)
      File "/Users/tomakabe/.venvs/chaostk/lib/python3.7/site-packages/chaosazure/vmss/actions.py", line 159, in choose_vmss_at_random
        vmss = fetch_resources(filter, RES_TYPE_VMSS, secrets, configuration)
      File "/Users/tomakabe/.venvs/chaostk/lib/python3.7/site-packages/chaosazure/rgraph/resource_graph.py", line 11, in fetch_resources
        query, resource_type, configuration)
      File "/Users/tomakabe/.venvs/chaostk/lib/python3.7/site-packages/chaosazure/rgraph/resource_graph.py", line 20, in __create_resource_graph_query
        subscription_id = configuration['azure']['subscription_id']
    KeyError: 'azure'

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/Users/tomakabe/.venvs/chaostk/lib/python3.7/site-packages/chaoslib/activity.py", line 219, in run_activity
        result = run_python_activity(activity, configuration, secrets)
      File "/Users/tomakabe/.venvs/chaostk/lib/python3.7/site-packages/chaoslib/provider/python.py", line 60, in run_python_activity
        sys.exc_info()[2])
      File "/Users/tomakabe/.venvs/chaostk/lib/python3.7/site-packages/chaoslib/provider/python.py", line 55, in run_python_activity
        return func(**arguments)
      File "/Users/tomakabe/.venvs/chaostk/lib/python3.7/site-packages/chaosazure/vmss/actions.py", line 129, in deallocate_vmss
        vmss = choose_vmss_at_random(filter, configuration, secrets)
      File "/Users/tomakabe/.venvs/chaostk/lib/python3.7/site-packages/chaosazure/vmss/actions.py", line 159, in choose_vmss_at_random
        vmss = fetch_resources(filter, RES_TYPE_VMSS, secrets, configuration)
      File "/Users/tomakabe/.venvs/chaostk/lib/python3.7/site-packages/chaosazure/rgraph/resource_graph.py", line 11, in fetch_resources
        query, resource_type, configuration)
      File "/Users/tomakabe/.venvs/chaostk/lib/python3.7/site-packages/chaosazure/rgraph/resource_graph.py", line 20, in __create_resource_graph_query
        subscription_id = configuration['azure']['subscription_id']
    chaoslib.exceptions.ActivityFailed: KeyError: 'azure'
[2019-07-27 07:23:18 ERROR]   => failed: KeyError: 'azure'

Maybe it needs the 'azure' key, so I put it back.

<<<<snip>>>>
    "configuration": {
        "azure": {
            "subscription_id": {
                "type": "env",
                "key": "AZURE_SUBSCRIPTION_ID"
            }
        },
<<<<snip>>>>

But I got the same error as in the original report. It seems that the environment variable was not expanded.

[2019-07-27 07:35:24 DEBUG] Start deallocate_vmss: configuration='{'azure': {'subscription_id': {'type': 'env', 'key': 'AZURE_SUBSCRIPTION_ID'}}, 'service_url': 'http://20.43.95.43'}', filter=''
[2019-07-27 07:35:25 DEBUG] Activity failed
    Traceback (most recent call last):
      File "/Users/tomakabe/.venvs/chaostk/lib/python3.7/site-packages/chaoslib/provider/python.py", line 55, in run_python_activity
        return func(**arguments)
      File "/Users/tomakabe/.venvs/chaostk/lib/python3.7/site-packages/chaosazure/vmss/actions.py", line 129, in deallocate_vmss
        vmss = choose_vmss_at_random(filter, configuration, secrets)
      File "/Users/tomakabe/.venvs/chaostk/lib/python3.7/site-packages/chaosazure/vmss/actions.py", line 159, in choose_vmss_at_random
        vmss = fetch_resources(filter, RES_TYPE_VMSS, secrets, configuration)
      File "/Users/tomakabe/.venvs/chaostk/lib/python3.7/site-packages/chaosazure/rgraph/resource_graph.py", line 13, in fetch_resources
        resources = client.resources(query)
      File "/Users/tomakabe/.venvs/chaostk/lib/python3.7/site-packages/azure/mgmt/resourcegraph/operations/_resource_graph_client_operations.py", line 64, in resources
        raise models.ErrorResponseException(self._deserialize, response)
    azure.mgmt.resourcegraph.models._models_py3.ErrorResponseException: (BadRequest) Please provide below info when asking for support: timestamp = 2019-07-26T22:35:25.3237547Z, correlationId = 2f4f72d1-df38-4e35-91d4-055c40d10f57.

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/Users/tomakabe/.venvs/chaostk/lib/python3.7/site-packages/chaoslib/activity.py", line 219, in run_activity
        result = run_python_activity(activity, configuration, secrets)
      File "/Users/tomakabe/.venvs/chaostk/lib/python3.7/site-packages/chaoslib/provider/python.py", line 60, in run_python_activity
        sys.exc_info()[2])
      File "/Users/tomakabe/.venvs/chaostk/lib/python3.7/site-packages/chaoslib/provider/python.py", line 55, in run_python_activity
        return func(**arguments)
      File "/Users/tomakabe/.venvs/chaostk/lib/python3.7/site-packages/chaosazure/vmss/actions.py", line 129, in deallocate_vmss
        vmss = choose_vmss_at_random(filter, configuration, secrets)
      File "/Users/tomakabe/.venvs/chaostk/lib/python3.7/site-packages/chaosazure/vmss/actions.py", line 159, in choose_vmss_at_random
        vmss = fetch_resources(filter, RES_TYPE_VMSS, secrets, configuration)
      File "/Users/tomakabe/.venvs/chaostk/lib/python3.7/site-packages/chaosazure/rgraph/resource_graph.py", line 13, in fetch_resources
        resources = client.resources(query)
      File "/Users/tomakabe/.venvs/chaostk/lib/python3.7/site-packages/azure/mgmt/resourcegraph/operations/_resource_graph_client_operations.py", line 64, in resources
        raise models.ErrorResponseException(self._deserialize, response)
    chaoslib.exceptions.ActivityFailed: azure.mgmt.resourcegraph.models._models_py3.ErrorResponseException: (BadRequest) Please provide below info when asking for support: timestamp = 2019-07-26T22:35:25.3237547Z, correlationId = 2f4f72d1-df38-4e35-91d4-055c40d10f57.
[2019-07-27 07:35:25 ERROR]   => failed: azure.mgmt.resourcegraph.models._models_py3.ErrorResponseException: (BadRequest) Please provide below info when asking for support: timestamp = 2019-07-26T22:35:25.3237547Z, correlationId = 2f4f72d1-df38-4e35-91d4-055c40d10f57.
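One way to confirm this kind of problem before the request reaches Azure is to scan the configuration for values that still look like env declarations. The sketch below is only an illustration: `find_unresolved_env` is a hypothetical helper, not part of chaoslib or chaosazure, and the sample dictionary is copied from the debug line above.

```python
def find_unresolved_env(value, path=()):
    """Return dotted paths of values that still look like env declarations."""
    if isinstance(value, dict):
        if value.get("type") == "env" and "key" in value:
            # The declaration itself was passed through instead of its value.
            return [".".join(path)]
        found = []
        for name, child in value.items():
            found.extend(find_unresolved_env(child, path + (name,)))
        return found
    # Plain strings, numbers, etc. are already resolved.
    return []


# Configuration as printed in the DEBUG line of the failing run.
config_from_log = {
    "azure": {"subscription_id": {"type": "env", "key": "AZURE_SUBSCRIPTION_ID"}},
    "service_url": "http://20.43.95.43",
}

unresolved = find_unresolved_env(config_from_log)
```

Here `unresolved` contains `azure.subscription_id`, which matches the observation that `service_url` was expanded but the nested subscription ID was not.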

To confirm, I tried a hardcoded config.

<<<<snip>>>>
    "configuration": {
        "azure": {
            "subscription_id": "my-id"
        },
<<<<snip>>>>

It worked.

[2019-07-27 07:40:02 DEBUG] Start deallocate_vmss: configuration='{'azure': {'subscription_id': 'my-id'}, 'service_url': 'http://20.43.95.43'}', filter=''
[2019-07-27 07:40:03 DEBUG] Found virtual machine scale sets: ['aks-pool1-27450415-vmss']
[2019-07-27 07:40:05 DEBUG] Found virtual machine scale set instances: ['aks-pool1-27450415-vmss_0', 'aks-pool1-27450415-vmss_1', 'aks-pool1-27450415-vmss_2']
[2019-07-27 07:40:05 DEBUG] Deallocating instance: aks-pool1-27450415-vmss_2
[2019-07-27 07:40:06 DEBUG]   => succeeded without any result value

I would appreciate it if you would give me some advice.

bugra-derre commented 4 years ago

@ToruMakabe I guess it should work. At least, it did when I tried it with this configuration. It is YAML-based, but you can easily convert it to JSON.

version: "1.0.0"
title: "Check resiliency"
description: "Restart random node"
configuration:
  azure_subscription_id:
    type: env
    key: AZ_SUBSCRIPTION_ID
secrets:
  azure:
    client_id:
      type: env
      key: AZ_CLIENT_ID
    client_secret:
      type: env
      key: AZ_CLIENT_SECRET
    tenant_id:
      type: env
      key: AZ_TENANT_ID
steady-state-hypothesis:
...

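For anyone using JSON experiments, the `configuration` and `secrets` parts of the YAML above can be rendered as JSON with a short sketch like the following. The dictionary only mirrors the keys shown above; the elided parts of the experiment are left out, and the variable names are illustrative.

```python
import json

# Same structure as the YAML above, written as a Python dictionary.
experiment_fragment = {
    "configuration": {
        "azure_subscription_id": {"type": "env", "key": "AZ_SUBSCRIPTION_ID"},
    },
    "secrets": {
        "azure": {
            "client_id": {"type": "env", "key": "AZ_CLIENT_ID"},
            "client_secret": {"type": "env", "key": "AZ_CLIENT_SECRET"},
            "tenant_id": {"type": "env", "key": "AZ_TENANT_ID"},
        },
    },
}

# Pretty-print with the four-space indentation used in the JSON experiments.
as_json = json.dumps(experiment_fragment, indent=4)
print(as_json)
```

Note that the subscription ID sits at the top level of `configuration` as `azure_subscription_id`, rather than nested under an `azure` key.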
torumakabe commented 4 years ago

@bugra-derre Thanks! It works now, and I found the fix in the following pull request: https://github.com/chaostoolkit-incubator/chaostoolkit-azure/pull/92