chaostoolkit-incubator / chaostoolkit-azure

Chaos Toolkit Extension for Azure
https://chaostoolkit.org/
Apache License 2.0

Azure AKS actions display 'No AKS Cluster found' or 'No virtual machines' #117

Open nikola1011 opened 4 years ago

nikola1011 commented 4 years ago

Hello,

I am trying to run a Chaos Toolkit experiment against an Azure Kubernetes Service (AKS) cluster, but the chaostoolkit-azure extension does not seem to see my running cluster. The credentials used to connect to Azure were generated with the az ad sp create-for-rbac --sdk-auth > credentials.json command (as specified in the documentation). The cluster is running and available, as confirmed in the Azure portal.

Attached are two experiment files: one uses the credentials file specified by the AZURE_AUTH_LOCATION environment variable, and the other has the credentials (secrets and configuration) placed directly inside the experiment file (real values replaced with 'xxx'). The files are env-experiment.txt and secrets-experiment.txt.
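To rule out a malformed credentials file before running either experiment, a quick check like the following may help. It is only a sketch: the key names (clientId, clientSecret, subscriptionId, tenantId) are assumed to be the relevant subset of what az ad sp create-for-rbac --sdk-auth writes, and missing_credential_keys is a hypothetical helper, not part of chaostoolkit-azure.

```python
import json
import os

# Assumed subset of the keys written by `az ad sp create-for-rbac --sdk-auth`.
REQUIRED_KEYS = ("clientId", "clientSecret", "subscriptionId", "tenantId")


def missing_credential_keys(path):
    """Return the required keys that are absent or empty in the credentials file."""
    with open(path, encoding="utf-8") as handle:
        credentials = json.load(handle)
    return [key for key in REQUIRED_KEYS if not credentials.get(key)]


if __name__ == "__main__":
    # AZURE_AUTH_LOCATION is the environment variable named in the issue.
    location = os.environ.get("AZURE_AUTH_LOCATION")
    if not location:
        print("AZURE_AUTH_LOCATION is not set")
    else:
        missing = missing_credential_keys(location)
        print("missing keys:", missing or "none")
```

An empty result only shows that the file is well formed. It does not prove that the service principal has permission to read the cluster.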

Both experiment files produce the same output, so I don't think the credentials are the problem:

[2020-06-23 14:37:19 INFO] Validating the experiment's syntax
[2020-06-23 14:37:19 INFO] Experiment looks valid
[2020-06-23 14:37:19 INFO] Running experiment: ...
[2020-06-23 14:37:19 INFO] Steady state hypothesis: Services are all available and healthy
[2020-06-23 14:37:19 INFO] Probe: consumer-service-must-still-respond
[2020-06-23 14:37:19 INFO] Steady state hypothesis is met!
[2020-06-23 14:37:19 INFO] Action: restart-aks-node-at-random
[2020-06-23 14:37:21 WARNING] No virtual machines found
[2020-06-23 14:37:21 ERROR]   => failed: chaoslib.exceptions.ActivityFailed: No virtual machines found
[2020-06-23 14:37:21 INFO] Steady state hypothesis: Services are all available and healthy
[2020-06-23 14:37:21 INFO] Probe: consumer-service-must-still-respond
[2020-06-23 14:37:21 INFO] Steady state hypothesis is met!
[2020-06-23 14:37:21 INFO] Let's rollback...
[2020-06-23 14:37:21 INFO] No declared rollbacks, let's move on.
[2020-06-23 14:37:21 INFO] Experiment ended with status: completed

Even if I add the 'filter' parameter "filter": "where resourceGroup=='myResourceGroup' and name=='myFlaskCluster'" to the 'restart_node' function, the run fails in the same way. Only the error message changes, from 'No virtual machines found' to 'No AKS clusters found':

[2020-06-23 14:51:21 INFO] Validating the experiment's syntax
[2020-06-23 14:51:21 INFO] Experiment looks valid
[2020-06-23 14:51:21 INFO] Running experiment: ...
[2020-06-23 14:51:21 INFO] Steady state hypothesis: Services are all available and healthy
[2020-06-23 14:51:21 INFO] Probe: consumer-service-must-still-respond
[2020-06-23 14:51:21 INFO] Steady state hypothesis is met!
[2020-06-23 14:51:21 INFO] Action: restart-aks-node-at-random
[2020-06-23 14:51:23 WARNING] No AKS clusters found
[2020-06-23 14:51:23 ERROR]   => failed: chaoslib.exceptions.ActivityFailed: No AKS clusters found
[2020-06-23 14:51:23 INFO] Steady state hypothesis: Services are all available and healthy
[2020-06-23 14:51:23 INFO] Probe: consumer-service-must-still-respond
[2020-06-23 14:51:23 INFO] Steady state hypothesis is met!
[2020-06-23 14:51:23 INFO] Let's rollback...
[2020-06-23 14:51:23 INFO] No declared rollbacks, let's move on.
[2020-06-23 14:51:23 INFO] Experiment ended with status: completed
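For readers without access to the attachments, the filtered action described above can be sketched roughly as follows. This is a reconstruction under assumptions, not the attached experiment: the module path chaosazure.aks.actions is assumed, build_restart_node_action is a hypothetical helper, and the secrets and configuration sections are left out.

```python
import json


def build_restart_node_action(resource_group, cluster_name):
    """Build an experiment action that calls restart_node with a filter.

    The filter string follows the form quoted in the issue. The module
    path is an assumption about where restart_node lives.
    """
    filter_query = (
        f"where resourceGroup=='{resource_group}' and name=='{cluster_name}'"
    )
    return {
        "type": "action",
        "name": "restart-aks-node-at-random",
        "provider": {
            "type": "python",
            "module": "chaosazure.aks.actions",  # assumed module path
            "func": "restart_node",
            "arguments": {"filter": filter_query},
        },
    }


if __name__ == "__main__":
    action = build_restart_node_action("myResourceGroup", "myFlaskCluster")
    print(json.dumps(action, indent=2))
```

Printing the generated action makes it easy to compare the exact filter string with the one recorded in chaostoolkit.log.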

Would you please be kind enough to check whether I am missing something obvious or whether this is an actual issue?

nikola1011 commented 4 years ago

I have checked the chaostoolkit.log file and verified that the configuration parameters (azure_subscription_id and the filter parameter) are passed correctly to the restart_node function and, from there, to the fetch_resources function.

nikola1011 commented 4 years ago

Finally, these are the versions that I am using (latest releases, if I am not mistaken):

NAME                VERSION
CLI                 1.4.2
Core library        1.10.0

NAME                       VERSION   LICENSE                       DESCRIPTION
chaostoolkit-azure         0.8.3     Apache License Version 2.0    Microsoft Azure
chaostoolkit-kubernetes    0.22.0    Apache License Version 2.0    Kubernetes

PranayWankhede commented 3 years ago

@nikola1011 @HemantAHK @buderre @xpdable Is there any update on this? I am also facing a similar issue. Thanks!

nikola1011 commented 3 years ago

@PranayWankhede Unfortunately not. I haven't been able to solve it, so I moved my development to a local cluster only. Note that the virtual machines are visible to Chaos Toolkit, but the actual AKS cluster is not. Maybe there is a workaround there.