Closed jlewi closed 4 years ago
Issue-Label Bot is automatically applying the labels:
Label | Probability |
---|---|
kind/bug | 0.91 |
Please mark this comment with :thumbsup: or :thumbsdown: to give our bot feedback! Links: app homepage, dashboard and code for this bot.
Issue-Label Bot is automatically applying the labels:
Label | Probability |
---|---|
platform/gcp | 0.57 |
Please mark this comment with :thumbsup: or :thumbsdown: to give our bot feedback! Links: app homepage, dashboard and code for this bot.
@jlewi Is this the entrypoint of the test? https://github.com/kubeflow/kfctl/blob/7bfe692bdfb42002073c0ea196c5942e606ed48c/py/kubeflow/kfctl/testing/pytests/kf_is_ready_test.py#L96
@Bobgy yes; if you set your current kubectl context to point at a cluster you should be able to run that test locally which may help debug it.
@jlewi I looked at the auto-deployed clusters and found the root cause: the persistent disks were created in the us-central1-f zone, while the nodes were in the us-central1-{a,b,c} zones, so the PVs cannot be mounted to pods.
Do you have any ideas why they are deployed to different zones?
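For anyone debugging something similar, the mismatch above can be expressed as a small check. This is only a sketch under assumptions: the disk names are hypothetical, the helper `find_unmountable_disks` is not part of any Kubeflow or GCP tooling, and the zone data is supplied by hand rather than read from a cluster.

```python
from typing import Dict, List, Set


def find_unmountable_disks(disk_zones: Dict[str, str], node_zones: Set[str]) -> List[str]:
    """Return the names of disks whose zone contains no cluster node.

    A zonal disk can only be attached to a node in the same zone, so any
    disk listed here cannot back a PV for a pod in this cluster.
    """
    return sorted(name for name, zone in disk_zones.items() if zone not in node_zones)


# Hypothetical disk names; the zones are the ones reported in this issue.
disk_zones = {
    "example-disk-1": "us-central1-f",
    "example-disk-2": "us-central1-f",
}
node_zones = {"us-central1-a", "us-central1-b", "us-central1-c"}

for name in find_unmountable_disks(disk_zones, node_zones):
    print(f"{name} is in a zone with no nodes")
```

An empty result would mean every disk shares a zone with at least one node.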
Issue-Label Bot is automatically applying the labels:
Label | Probability |
---|---|
area/kfctl | 0.72 |
Please mark this comment with :thumbsup: or :thumbsdown: to give our bot feedback! Links: app homepage, dashboard and code for this bot.
@Bobgy It's because we are using a regional cluster, so it has nodes in multiple zones. Per kubeflow/gcp-blueprints#6, it looks like we still have some work to do to make this work with regional clusters.
The short-term fix would be to change the autodeployments to use a zonal cluster. I think we could change the defaults here https://github.com/kubeflow/testing/blob/637d1cd5fe33d03ee5646380d960fabe8a230d0a/py/kubeflow/testing/create_kf_from_gcp_blueprint.py#L69
I don't think the blueprint reconciler is overwriting the defaults.
@Bobgy Would you mind submitting a PR to try to change the auto-deployer to use a zonal cluster?
It's been Chinese holidays for three days; I will return to work on Sunday (a working day).
@Bobgy thanks for the heads up; I've been OOO as well; enjoy the holiday; we can fix this next week.
Let's not close this until we have a passing green.
@jlewi The test is still failing. I looked at the log and found that the test script no longer matches the current deployment names. https://github.com/kubeflow/kfctl/blob/baf59c2692f45847bbd042c78a751a761b2b7eaa/py/kubeflow/kfctl/testing/pytests/kf_is_ready_test.py#L96-L106
`ml-pipeline-viewer-controller-deployment` is now called `ml-pipeline-viewer-crd`. Can I go ahead and update that test? Why is it in the kfctl repo? Will there be backward compatibility concerns?
Also, we should probably update that list with the new deployments in the KFP service.
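To make this kind of drift easier to spot, here is a rough sketch of comparing the names a test expects against the names actually deployed. It is not the code in `kf_is_ready_test.py`; the helper `diff_deployments` is hypothetical, and the "actual" list is typed in by hand here rather than fetched from a cluster.

```python
from typing import Iterable, List, Tuple


def diff_deployments(expected: Iterable[str], actual: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Compare expected deployment names with the ones found in the cluster.

    Returns (missing, unexpected): names the test expects but that are not
    deployed, and names that are deployed but not covered by the test.
    """
    expected_set, actual_set = set(expected), set(actual)
    missing = sorted(expected_set - actual_set)
    unexpected = sorted(actual_set - expected_set)
    return missing, unexpected


# The two names below are the ones mentioned in this issue.
expected = ["ml-pipeline-viewer-controller-deployment"]
actual = ["ml-pipeline-viewer-crd"]

missing, unexpected = diff_deployments(expected, actual)
print("expected but not deployed:", missing)
print("deployed but not in the test list:", unexpected)
```

A renamed deployment shows up in both lists at once, which is the pattern seen here.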
@jlewi friendly ping
@Bobgy Yes, please go ahead and update the tests as necessary to work with the current version of KFP. You can consider the code location to be a historical accident, so feel free to move the pipelines code somewhere else if it makes more sense.
I'm not very familiar with how to move tests so I'm going to send a PR to fix tests first.
@Bobgy moving the tests just means
Updating the tekton task might mean
Let me leave the issue open to try moving that Python code. This seems like a good issue for me to get a better idea of the test infra.
Actually, I've tracked the task in https://github.com/kubeflow/pipelines/projects/5.
I think we can close this. /close
@Bobgy: Closing this issue.
https://k8s-testgrid.appspot.com/sig-big-data#kubeflow-gcp-blueprints-master-periodic
@Bobgy could you please take a look? It could be the case that the test needs to be updated.