aerogearcatalog / metrics-apb


Group pods under aerogear-metrics application and use deploy instead of deploymentConfig #34

Open StevenTobin opened 6 years ago

StevenTobin commented 6 years ago

This PR groups all the metrics pods under the application label aerogear-metrics and switches to the use of k8s deploy resources instead of Openshift deploymentConfigs.

Related issues: https://github.com/aerogearcatalog/metrics-apb/issues/30 & https://github.com/aerogearcatalog/metrics-apb/issues/31
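For reference, a minimal sketch of what the dc-to-deployment switch could look like as an APB task. This uses the generic Ansible `k8s` module for illustration; the actual PR may use the dedicated ansible-kubernetes-modules instead, and the image reference and port are placeholders, not taken from the repo:

```yaml
# Hypothetical sketch: a Kubernetes Deployment replacing an
# openshift_v1_deployment_config task. Module choice, image, and
# port are assumptions for illustration only.
- name: Create aerogear-app-metrics deployment
  k8s:
    state: present
    definition:
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: aerogear-app-metrics
        namespace: "{{ namespace }}"
        labels:
          app: aerogear-app-metrics
          application: aerogear-metrics   # shared label grouping all metrics pods
      spec:
        replicas: 1
        selector:
          matchLabels:
            app: aerogear-app-metrics
        template:
          metadata:
            labels:
              app: aerogear-app-metrics
              application: aerogear-metrics
          spec:
            containers:
              - name: aerogear-app-metrics
                image: "{{ app_metrics_image }}"   # placeholder variable
```

The `application: aerogear-metrics` label on both the Deployment and the pod template is what lets the console group all the metrics pods under one application, per the PR title.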

david-martin commented 6 years ago

@matzew would appreciate your review on this, given you're also working on https://issues.jboss.org/browse/AEROGEAR-2012

matzew commented 6 years ago

:eyes: at it now

matzew commented 6 years ago

While I see that the deployment part works, I am unable to get the entire APB up and running.

Here is the change from dc to deployment, I could successfully verify that:

➜  metrics-apb git:(3d6ef5e) oc get dc
No resources found.
➜  metrics-apb git:(3d6ef5e) oc get deployment
NAME                   DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
aerogear-app-metrics   1         1         1            1           5m
grafana                1         1         1            1           6m
postgres               1         1         1            1           5m
prometheus             1         1         1            1           7m

However, the apb fails here

See the log:


FAILED - RETRYING: Wait for app-metrics health endpoint to report healthy (4 retries left).
FAILED - RETRYING: Wait for app-metrics health endpoint to report healthy (3 retries left).
FAILED - RETRYING: Wait for app-metrics health endpoint to report healthy (2 retries left).
FAILED - RETRYING: Wait for app-metrics health endpoint to report healthy (1 retries left).
fatal: [localhost]: FAILED! => {"attempts": 30, "changed": false, "content": "", "msg": "Status code was not [200]: Request failed: <urlopen error [Errno 113] No route to host>", "redirected": false, "status": -1, "url": "http://aerogear-app-metrics-foo.192.168.37.1.nip.io/healthz "}
	to retry, use: --limit @/opt/apb/actions/provision.retry
PLAY RECAP *********************************************************************
localhost                  : ok=38   changed=35   unreachable=0    failed=1
+ EXIT_CODE=2
+ set +ex
+ '[' -f /var/tmp/test-result ']'
+ exit 2
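That output is the characteristic shape of an Ansible `uri` check combined with `until`/`retries`. The failing task presumably looks roughly like this; the route variable name and delay are assumptions, only the 30 attempts and the 200 status check are visible in the log:

```yaml
# Sketch of the kind of health-wait task that produces the log above.
# "app_metrics_route" is a hypothetical variable name.
- name: Wait for app-metrics health endpoint to report healthy
  uri:
    url: "http://{{ app_metrics_route }}/healthz"
    status_code: 200
  register: health_check
  until: health_check.status == 200
  retries: 30   # matches "attempts": 30 in the failure output
  delay: 10
```

The "No route to host" error suggests the route hostname never became reachable from inside the APB pod within those 30 attempts, which fits the timing discussion below.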

This brings me to the question: why are we still using openshift_v1_route entries? We could use k8s_v1_endpoints instead, right?

StevenTobin commented 6 years ago

That check in the APB for the pods to be ready is heavily dependent on your system and how quickly the pods come up. If it takes a while to pull the image, or the system is under heavy load, the check can time out; yet when you look at the project in OpenShift, the pod can be fine seconds after the APB provision has 'failed'.

I've been investigating the CI failures on this APB, and it usually seems to be these container checks timing out. I also can't find checks like these in any other APBs, so my instinct is to remove them. Without them the APB won't wait for each container to become ready, so the whole APB should provision faster. This is what I'd like to do:

I can't find a similar health endpoint for Prometheus, however.

@philbrookes @david-martin @matzew does this seem reasonable?

I'll create an issue as it's not really in the scope of this PR.

david-martin commented 6 years ago

@StevenTobin Readiness probes for health endpoints sounds reasonable. Checks at the end of the APB provision task to ensure every service is running OK (has at least 1 pod in a ready state) would ensure the APB only finishes if everything is running OK.
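A readiness probe against the existing /healthz endpoint would move the health gating from the playbook into Kubernetes itself. A rough sketch of the container-spec fragment, with the port being an assumption not stated in the thread:

```yaml
# Hypothetical readinessProbe for the aerogear-app-metrics container.
# Kubernetes only marks the pod Ready (and routes traffic to it) once
# /healthz returns a 2xx response; the port is a placeholder.
readinessProbe:
  httpGet:
    path: /healthz
    port: 3000
  initialDelaySeconds: 5
  periodSeconds: 10
  failureThreshold: 3
```

With this in place, a single end-of-provision check that each service has at least one Ready pod (as suggested above) replaces the per-container polling loops.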

matzew commented 6 years ago

@StevenTobin I also see that the deprovision still references deploymentconfig.

Perhaps we wanna use something like this: https://github.com/ansibleplaybookbundle/mediawiki-apb/blob/master/roles/mediawiki/tasks/deprovision.yml#L22-L28
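Adapted to this PR, a deprovision task in that style would delete the new Deployments rather than DeploymentConfigs. A sketch using the generic `k8s` module (module choice is an assumption; the deployment names come from the `oc get deployment` output earlier in the thread):

```yaml
# Hypothetical deprovision step: remove each metrics Deployment.
- name: Delete metrics deployments
  k8s:
    state: absent
    api_version: apps/v1
    kind: Deployment
    name: "{{ item }}"
    namespace: "{{ namespace }}"
  with_items:
    - aerogear-app-metrics
    - grafana
    - postgres
    - prometheus
```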

matzew commented 6 years ago

I'll create an issue as it's not really in the scope of this PR.

+1 on moving this larger discussion to a new issue/thread

secondsun commented 6 years ago

I don't know if this is related, but after deploying metrics with this PR I don't see it in my list of services after running mobile get clientconfig myapp-android --namespace=myproject -o json

david-martin commented 6 years ago

@secondsun I think this will address it disappearing from the cli https://issues.jboss.org/browse/AEROGEAR-2267 (cc @pb82)

pb82 commented 6 years ago

@secondsun did this work for you before? Because I think it shouldn't work.

grdryn commented 5 years ago

Bump!

I just noticed that DeploymentConfigs were being used, and was considering changing to Deployments. Thankfully I noticed this PR first!