gardener / machine-controller-manager

Declarative way of managing machines for Kubernetes cluster
Apache License 2.0

Smoke tests to cover all providers #128

Closed (prashanth26 closed 5 years ago)

prashanth26 commented 5 years ago

Objective

To have integration/smoke tests that cover all providers

Current state of smoke tests

We have integration tests written in bash (only for AWS) that run a set of smoke tests such as:

  1. Create and delete single machine object
  2. Create a machine-deployment with 3 replicas
  3. Scale it down to 2
  4. Scale it up to 4
  5. Rolling update of machines from v1 to v2
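The sequence above can be sketched against an in-memory stand-in. This is only an illustration of the order of checks, not the project's bash tests: `FakeCluster` and all of its methods are hypothetical names, and no real cluster or cloud provider is contacted.

```python
from itertools import count


class FakeCluster:
    """Hypothetical in-memory stand-in for a cluster's machine objects."""

    def __init__(self):
        self.machines = {}  # machine name -> version
        self._ids = count()

    def create_machine(self, version):
        name = f"machine-{next(self._ids)}"
        self.machines[name] = version
        return name

    def delete_machine(self, name):
        del self.machines[name]

    def scale_to(self, replicas, version):
        """Add or remove machines until exactly `replicas` exist."""
        while len(self.machines) < replicas:
            self.create_machine(version)
        while len(self.machines) > replicas:
            self.delete_machine(next(iter(self.machines)))

    def rolling_update(self, new_version):
        """Replace old machines one by one, creating the new one first."""
        outdated = [n for n, v in self.machines.items() if v != new_version]
        for name in outdated:
            self.create_machine(new_version)
            self.delete_machine(name)


def run_smoke_sequence():
    """Run the five smoke steps and record the machine count after each."""
    cluster = FakeCluster()
    seen = {}

    single = cluster.create_machine("v1")          # step 1: create ...
    seen["after_create"] = len(cluster.machines)
    cluster.delete_machine(single)                 # ... and delete
    seen["after_delete"] = len(cluster.machines)

    cluster.scale_to(3, "v1")                      # step 2: 3 replicas
    seen["deployment"] = len(cluster.machines)
    cluster.scale_to(2, "v1")                      # step 3: scale down
    seen["scaled_down"] = len(cluster.machines)
    cluster.scale_to(4, "v1")                      # step 4: scale up
    seen["scaled_up"] = len(cluster.machines)

    cluster.rolling_update("v2")                   # step 5: v1 -> v2
    seen["after_update"] = len(cluster.machines)
    seen["versions"] = sorted(set(cluster.machines.values()))
    return seen
```

A real test would replace `FakeCluster` with calls to the cluster under test and wait for each state to be reached before moving on.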

Possible solutions moving forward

We have two options moving forward:

  1. Improve the smoke tests to cover all providers and K8s versions by provisioning clusters either statically (pre-created clusters) or dynamically (on demand)
  2. Mock the cloud provider API calls so that integration tests run against mocked providers instead of real ones
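As a rough illustration of the second option, the same test body could be run against one in-memory fake per provider. Everything here is an assumption made for the sketch: the `Driver` interface, the `FakeDriver` class, the VM id format, and the provider names other than AWS are hypothetical and are not taken from the project's code.

```python
from typing import Protocol


class Driver(Protocol):
    """Hypothetical minimal interface a provider driver would expose."""

    def create(self, name: str) -> str: ...
    def delete(self, vm_id: str) -> None: ...
    def list_vms(self) -> dict: ...


class FakeDriver:
    """In-memory fake that records calls instead of hitting a cloud API."""

    def __init__(self, provider: str):
        self.provider = provider
        self.vms = {}    # vm id -> machine name
        self.calls = []  # (operation, argument) in call order

    def create(self, name: str) -> str:
        vm_id = f"{self.provider}-{name}"  # made-up id format
        self.vms[vm_id] = name
        self.calls.append(("create", name))
        return vm_id

    def delete(self, vm_id: str) -> None:
        self.calls.append(("delete", vm_id))
        self.vms.pop(vm_id, None)

    def list_vms(self) -> dict:
        return dict(self.vms)


def exercise(driver: Driver) -> dict:
    """Provider-agnostic test body: create a VM, check it, delete it."""
    vm_id = driver.create("smoke-0")
    if vm_id not in driver.list_vms():
        raise AssertionError(f"{vm_id} missing after create")
    driver.delete(vm_id)
    return driver.list_vms()  # expected to be empty again


def run_for_all(providers):
    """Run the same test body once per (fake) provider."""
    return {p: exercise(FakeDriver(p)) for p in providers}
```

For example, `run_for_all(["aws", "azure", "gcp"])` runs the identical checks three times without provisioning anything. The trade-off is that a fake only verifies the controller's logic, not the behaviour of the real provider APIs.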

1. Improve smoke tests

Things to consider

2. Mock API calls

prashanth26 commented 5 years ago

After a follow-up discussion on this topic, the following were the notable points.

prashanth26 commented 5 years ago

After doing a first round of analysis, here are my findings.

Resource groups

Account/Access-key specific cleanup

amshuman-kr commented 5 years ago

The way I see it, we have three options.

  1. Use labels and hope that label-related bugs don't occur too often. This is not as bad as it sounds :-)
  2. Use something like hoverfly or mock-server to record cloud provider API calls and then figure out what to clean up from the recorded requests and responses.
  3. Discard integration tests and actually mock the services (perhaps using something like hoverfly or mock-server).

To me, 1 or 3 make sense. 2, not so much. What about everyone else?
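To make option 2 concrete, here is a minimal sketch of deriving a cleanup list from recorded traffic. The record format (dictionaries with `op`, `id` and `kind` keys) is invented for this example and is not the output format of hoverfly or mock-server; a real recording would first have to be parsed into something like it.

```python
def leftover_resources(recorded_calls):
    """Replay recorded calls in order and return what was never deleted.

    Each record is a dict with hypothetical keys:
      op   - "create" or "delete"
      id   - identifier of the cloud resource
      kind - resource type, only present on "create" records
    """
    alive = {}
    for call in recorded_calls:
        if call["op"] == "create":
            alive[call["id"]] = call["kind"]
        elif call["op"] == "delete":
            # Deleting something unknown is ignored rather than treated as an error.
            alive.pop(call["id"], None)
    return alive


# Example recording: one VM and one disk are created, only the VM is deleted.
recording = [
    {"op": "create", "id": "vm-1", "kind": "vm"},
    {"op": "create", "id": "disk-1", "kind": "disk"},
    {"op": "delete", "id": "vm-1"},
]
to_clean = leftover_resources(recording)
```

In this example `to_clean` contains only the disk, which is what a cleanup step would then have to remove.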

prashanth26 commented 5 years ago

After a follow-up offline discussion with @amshuman-kr, the following is our plan:

  1. Go ahead with option (1), where we continue the existing method of using labels to tag the VMs backing machines, and clean up based on the stale machine objects present in the cluster. If clean-up fails (due to errors such as improper labelling in the controller code), the integration test would also fail.
  2. Once we have this in place, we can consider moving to actual mocking of the cloud interfaces as described in option (3).
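The first step of this plan can be sketched as follows, under stated assumptions: VMs are modelled as a plain dictionary of labels, the label key `test-run` is made up, and `created_ids` stands for the VM ids the test learned from its machine objects. None of these names come from the controller code.

```python
def cleanup_by_label(vms, label_key, label_value):
    """Delete every VM carrying the given label and return the deleted ids."""
    doomed = [
        vm_id for vm_id, labels in vms.items()
        if labels.get(label_key) == label_value
    ]
    for vm_id in doomed:
        del vms[vm_id]
    return doomed


def leftovers(vms, created_ids):
    """Return VMs created by this run that survived the label-based cleanup."""
    return sorted(set(created_ids) & set(vms))


# Hypothetical state after a test run: vm-2 was created by the run but
# never labelled, which is the kind of bug the plan wants to surface.
vms = {
    "vm-1": {"test-run": "run-42"},
    "vm-2": {},
    "vm-3": {"test-run": "another-run"},
}
deleted = cleanup_by_label(vms, "test-run", "run-42")
remaining = leftovers(vms, created_ids=["vm-1", "vm-2"])
test_passed = not remaining  # a non-empty list fails the integration test
```

Here the unlabelled `vm-2` survives the cleanup, so `test_passed` is false and the labelling bug shows up as a test failure rather than as a silently leaked VM.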