StevenTobin opened 6 years ago
@matzew would appreciate your review on this, given you're also working on https://issues.jboss.org/browse/AEROGEAR-2012
:eyes: at it now
While I see that the deployment part works, I am unable to get the entire APB up and running.
Regarding the change from dc to deployment, I could successfully verify that:
➜ metrics-apb git:(3d6ef5e) oc get dc
No resources found.
➜ metrics-apb git:(3d6ef5e) oc get deployment
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
aerogear-app-metrics 1 1 1 1 5m
grafana 1 1 1 1 6m
postgres 1 1 1 1 5m
prometheus 1 1 1 1 7m
However, the APB fails here. See the log:
FAILED - RETRYING: Wait for app-metrics health endpoint to report healthy (4 retries left).
--
| FAILED - RETRYING: Wait for app-metrics health endpoint to report healthy (3 retries left).
| FAILED - RETRYING: Wait for app-metrics health endpoint to report healthy (2 retries left).
| FAILED - RETRYING: Wait for app-metrics health endpoint to report healthy (1 retries left).
| fatal: [localhost]: FAILED! => {"attempts": 30, "changed": false, "content": "", "msg": "Status code was not [200]: Request failed: <urlopen error [Errno 113] No route to host>", "redirected": false, "status": -1, "url": "http://aerogear-app-metrics-foo.192.168.37.1.nip.io/healthz "}
| to retry, use: --limit @/opt/apb/actions/provision.retry
| PLAY RECAP *********************************************************************
| localhost : ok=38 changed=35 unreachable=0 failed=1
| + EXIT_CODE=2
| + set +ex
| + '[' -f /var/tmp/test-result ']'
| + exit 2
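For context, the failing wait appears to be an Ansible uri task along these lines. This is a sketch reconstructed from the log above, not the APB's actual source; the task name and retry count match the log, but the route variable and delay are assumptions:

```yaml
# Sketch of the health-check wait task, inferred from the provision log above.
# app_metrics_route and the delay value are assumptions.
- name: Wait for app-metrics health endpoint to report healthy
  uri:
    url: "http://{{ app_metrics_route }}/healthz"
    status_code: 200
  register: result
  until: result.status == 200
  retries: 30
  delay: 10
```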
This brings me to the question of why we are still using openshift_v1_route entries. We could use k8s_v1_endpoints instead, right?
That check in the APB for the pods to be ready is heavily dependent on your system and how quickly the pods come up. If it takes a while to pull the image, or if the system is under heavy load, the check can time out; yet when you check the project in Openshift, the pod may be fine seconds after the APB provision has 'failed'.
I've been investigating the CI failures on this APB, and they usually come down to these container checks timing out. I also can't find checks like these in any other APBs, so my instinct is to remove them. Without them the APB won't wait for each container to become ready, so the whole APB should provision faster. This is what I'd like to do:
I can't find a similar health endpoint for prometheus however.
@philbrookes @david-martin @matzew does this seem reasonable?
I'll create an issue as it's not really in the scope of this PR.
@StevenTobin Readiness probes for health endpoints sounds reasonable. Checks at the end of the APB provision task to ensure every service is running OK (has at least 1 pod in a ready state) would ensure the APB only finishes if everything is running OK.
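A readiness probe on the app-metrics container could look roughly like the following. This is only a sketch of the suggestion, not the APB's actual spec; the container port and timing values are assumptions:

```yaml
# Hypothetical readinessProbe for the aerogear-app-metrics container.
# The port and timing values are assumptions, not taken from the APB.
readinessProbe:
  httpGet:
    path: /healthz
    port: 3000
  initialDelaySeconds: 5
  periodSeconds: 10
```

With a probe like this, a pod only counts as ready once /healthz returns success, which is what an end-of-provision "at least 1 ready pod per service" check would key off.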
@StevenTobin I also see that the deprovision still speaks about deploymentconfig.
Perhaps we wanna use something like this: https://github.com/ansibleplaybookbundle/mediawiki-apb/blob/master/roles/mediawiki/tasks/deprovision.yml#L22-L28
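Following that mediawiki pattern, a deprovision task updated for this PR's switch to Deployments might look something like this sketch. The deployment names mirror the oc get deployment output above; the use of the generic Ansible k8s module (rather than the generated APB modules) and the namespace variable are assumptions:

```yaml
# Hedged sketch of a deprovision task deleting the Deployments created by provision.
# Module choice and the namespace variable are assumptions.
- name: Delete metrics deployments
  k8s:
    state: absent
    api_version: apps/v1
    kind: Deployment
    name: "{{ item }}"
    namespace: "{{ namespace }}"
  with_items:
    - aerogear-app-metrics
    - grafana
    - postgres
    - prometheus
```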
I'll create an issue as it's not really in the scope of this PR.
+1 on moving this larger discussion to a new issue/thread
I don't know if this is related, but after deploying metrics with this PR I don't see it in my list of services after running mobile get clientconfig myapp-android --namespace=myproject -o json
@secondsun I think this will address it disappearing from the cli https://issues.jboss.org/browse/AEROGEAR-2267 (cc @pb82)
@secondsun did this work for you before? Because I think it shouldn't have worked.
Bump!
I just noticed that DeploymentConfigs were being used, and was considering changing to Deployments. Thankfully I noticed this PR first!
This PR groups all the metrics pods under the application label aerogear-metrics and switches to the use of k8s deploy resources instead of Openshift deploymentConfigs.
Related issues: https://github.com/aerogearcatalog/metrics-apb/issues/30 & https://github.com/aerogearcatalog/metrics-apb/issues/31