canonical / bundle-kubeflow

Charmed Kubeflow
Apache License 2.0

503: Service Unavailable: from OIDC Gatekeeper #370

Closed: edhenry closed this issue 3 years ago.

edhenry commented 3 years ago

I'm attempting to deploy bundle-kubeflow on a bare-metal k8s cluster provisioned and deployed with MAAS and juju. The base OS is Ubuntu 18.04, and both CDK and Ceph are deployed on the bare metal.

The installation seems to go smoothly, in that all applications are deployed from juju's perspective, as shown below:

snuc@snuc-desktop:~$ juju status
Model     Controller        Cloud/Region  Version  SLA          Timestamp
kubeflow  myk8s-controller  myk8s         2.9.2    unsupported  20:40:33-05:00

App                        Version                    Status  Scale  Charm                 Store       Channel  Rev  OS          Address         Message
admission-webhook          res:oci-image@1abb127      active      1  admission-webhook     charmstore  stable    10  kubernetes  10.152.183.210  
argo-controller            res:oci-image@c1746ae      active      1  argo-controller       charmstore  stable    51  kubernetes                  
dex-auth                   res:oci-image@af9c1b3      active      1  dex-auth              charmstore  stable    60  kubernetes  10.152.183.8    
istio-ingressgateway       res:oci-image@89b5fe2      active      1  istio-ingressgateway  charmstore  stable    20  kubernetes  10.246.72.181   
istio-pilot                res:oci-image@e3e03b3      active      1  istio-pilot           charmstore  stable    20  kubernetes  10.152.183.176  
jupyter-controller         res:oci-image@b2db73b      active      1  jupyter-controller    charmstore  stable    55  kubernetes                  
jupyter-ui                 res:oci-image@3a09e8a      active      1  jupyter-ui            charmstore  stable     9  kubernetes  10.152.183.53   
katib-controller           res:oci-image@3b41adc      active      1  katib-controller      charmstore  stable    30  kubernetes  10.152.183.226  
katib-db                   mariadb/server:10.3        active      1  mariadb-k8s           charmstore  stable    35  kubernetes  10.152.183.135  
katib-db-manager           res:oci-image@c4718dc      active      1  katib-db-manager      charmstore  stable     4  kubernetes  10.152.183.95   
katib-ui                   res:oci-image@a51e0c9      active      1  katib-ui              charmstore  stable    30  kubernetes  10.152.183.132  
kfp-api                    res:oci-image@8e60840      active      1  kfp-api               charmstore  stable    10  kubernetes  10.152.183.250  
kfp-db                     mariadb/server:10.3        active      1  mariadb-k8s           charmstore  stable    35  kubernetes  10.152.183.238  
kfp-persistence            res:oci-image@9338d08      active      1  kfp-persistence       charmstore  stable     7  kubernetes                  
kfp-schedwf                res:oci-image@4ab6488      active      1  kfp-schedwf           charmstore  stable     7  kubernetes                  
kfp-ui                     res:oci-image@04a4348      active      1  kfp-ui                charmstore  stable     9  kubernetes  10.152.183.96   
kfp-viewer                 res:oci-image@bae62bf      active      1  kfp-viewer            charmstore  stable     7  kubernetes                  
kfp-viz                    res:oci-image@c90a581      active      1  kfp-viz               charmstore  stable     6  kubernetes  10.152.183.157  
kubeflow-dashboard         res:oci-image@126c9a9      active      1  kubeflow-dashboard    charmstore  stable    56  kubernetes  10.152.183.125  
kubeflow-profiles          res:profile-image@582b8eb  active      1  kubeflow-profiles     charmstore  stable    52  kubernetes  10.152.183.155  
minio                      res:oci-image@4707912      active      1  minio                 charmstore  stable    55  kubernetes  10.152.183.34   
mlmd                       res:oci-image@78eb66d      active      1  mlmd                  charmstore  stable     5  kubernetes  10.152.183.245  
oidc-gatekeeper            res:oci-image@9bb01f7      active      1  oidc-gatekeeper       charmstore  stable    53  kubernetes  10.152.183.222  
pytorch-operator           res:oci-image@08c3373      active      1  pytorch-operator      charmstore  stable    53  kubernetes                  
seldon-controller-manager  res:oci-image@82fd029      active      1  seldon-core           charmstore  stable    50  kubernetes  10.152.183.246  
spark                      res:oci-image@d792172      active      1  spark                 charmstore  stable     2  kubernetes  10.152.183.122  
tfjob-operator             res:oci-image@3fabaf3      active      1  tfjob-operator        charmstore  stable     1  kubernetes                  

Unit                          Workload  Agent  Address      Ports                                                                                                  Message
admission-webhook/0*          active    idle   10.1.93.106  443/TCP                                                                                                
argo-controller/0*            active    idle   10.1.82.92                                                                                                          
dex-auth/4*                   active    idle   10.1.93.120  5556/TCP                                                                                               
istio-ingressgateway/0*       active    idle   10.1.93.118  15020/TCP,80/TCP,443/TCP,15029/TCP,15030/TCP,15031/TCP,15032/TCP,15443/TCP,15011/TCP,8060/TCP,853/TCP  
istio-pilot/0*                active    idle   10.1.46.90   8080/TCP,15010/TCP,15012/TCP,15017/TCP                                                                 
jupyter-controller/0*         active    idle   10.1.94.64                                                                                                          
jupyter-ui/0*                 active    idle   10.1.46.91   5000/TCP                                                                                               
katib-controller/0*           active    idle   10.1.94.65   443/TCP,8080/TCP                                                                                       
katib-db-manager/0*           active    idle   10.1.69.59   6789/TCP                                                                                               
katib-db/0*                   active    idle   10.1.93.110  3306/TCP                                                                                               ready
katib-ui/0*                   active    idle   10.1.46.92   8080/TCP                                                                                               
kfp-api/0*                    active    idle   10.1.82.94   8888/TCP,8887/TCP                                                                                      
kfp-db/0*                     active    idle   10.1.82.89   3306/TCP                                                                                               ready
kfp-persistence/0*            active    idle   10.1.82.91                                                                                                          
kfp-schedwf/0*                active    idle   10.1.69.64                                                                                                          
kfp-ui/0*                     active    idle   10.1.94.71   3000/TCP                                                                                               
kfp-viewer/0*                 active    idle   10.1.35.67                                                                                                          
kfp-viz/0*                    active    idle   10.1.93.115  8888/TCP                                                                                               
kubeflow-dashboard/0*         active    idle   10.1.46.94   8082/TCP                                                                                               
kubeflow-profiles/0*          active    idle   10.1.69.61   8080/TCP,8081/TCP                                                                                      
minio/0*                      active    idle   10.1.82.93   9000/TCP                                                                                               
mlmd/0*                       active    idle   10.1.69.65   8080/TCP                                                                                               
oidc-gatekeeper/1*            active    idle   10.1.35.71   8080/TCP                                                                                               
pytorch-operator/0*           active    idle   10.1.94.70   8443/TCP                                                                                               
seldon-controller-manager/0*  active    idle   10.1.93.112  8080/TCP,4443/TCP                                                                                      
spark/0*                      active    idle   10.1.93.114  10254/TCP,443/TCP                                                                                      
tfjob-operator/0*             active    idle   10.1.93.116  8443/TCP                         

I did have to apply a patch that doesn't seem to be mentioned in the documentation at https://charmed-kubeflow.io/docs/install. I was able to track it down by watching the video at the bottom of the installation documentation, though. The command is below:

kubectl patch role -n kubeflow istio-ingressgateway-operator -p '{"apiVersion":"rbac.authorization.k8s.io/v1","kind":"Role","metadata":{"name":"istio-ingressgateway-operator"},"rules":[{"apiGroups":["*"],"resources":["*"],"verbs":["*"]}]}'
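For reference, the JSON passed to -p is a Role fragment with a single rule whose "*" entries match every API group, resource and verb. Within the kubeflow namespace, the patched role therefore grants unrestricted access to whatever is bound to it. The Python sketch below rebuilds that body and serializes it compactly so it can be compared with the command. The helper names are hypothetical and are not part of any Kubeflow or juju tooling.

```python
import json


def build_wildcard_role_patch(role_name):
    """Rebuild the Role body passed to `kubectl patch role ... -p`."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": role_name},
        # One rule whose "*" entries match every API group, resource and verb.
        "rules": [{"apiGroups": ["*"], "resources": ["*"], "verbs": ["*"]}],
    }


def rule_is_wildcard(rule):
    """True if a single RBAC rule matches every API group, resource and verb."""
    return all(rule.get(key) == ["*"] for key in ("apiGroups", "resources", "verbs"))


if __name__ == "__main__":
    patch = build_wildcard_role_patch("istio-ingressgateway-operator")
    # Compact separators reproduce the exact string given to -p.
    print(json.dumps(patch, separators=(",", ":")))
    print(all(rule_is_wildcard(rule) for rule in patch["rules"]))
```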

After applying the patch, the istio-ingressgateway comes up and is assigned an IP by MetalLB, and on the surface everything looks functional from juju's perspective.

After setting the public-url configuration for dex-auth and oidc-gatekeeper as described in the installation documentation, I was still unable to browse to the URL. I was met with the 403 shown below:

[Screenshot of the 403 response]
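The public-url step comes down to two juju config calls that must agree. This Python sketch assumes, as the issue text says, that both dex-auth and oidc-gatekeeper take a public-url option. It further assumes the value is presumably the externally reachable address of istio-ingressgateway: either the MetalLB-assigned 10.246.72.181 in the status above, or a DNS name that resolves to it. The URL in the example is a placeholder, and the helper names are hypothetical.

```python
import subprocess

# Both charms take a public-url option, per the installation docs mentioned above.
CHARMS_WITH_PUBLIC_URL = ("dex-auth", "oidc-gatekeeper")


def public_url_commands(public_url):
    """Build one `juju config` command per charm, all using the same URL."""
    return [
        ["juju", "config", charm, f"public-url={public_url}"]
        for charm in CHARMS_WITH_PUBLIC_URL
    ]


def apply_public_url(public_url, dry_run=True):
    """Print the commands, or run them (failing on error) when dry_run is False."""
    for cmd in public_url_commands(public_url):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical address; substitute the ingress gateway's external URL.
    apply_public_url("http://kubeflow.example.com")
```

With dry_run=True the sketch only prints the commands. That makes it easy to confirm both charms receive the same value before anything is changed.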

On further inspection of the dex-auth and oidc-gatekeeper pod logs, dex-auth looks healthy, but oidc-gatekeeper repeatedly logs 503 errors, shown below:

time="2021-06-12T01:38:58Z" level=info msg="URI is whitelisted. Accepted without authorization." ip=10.1.93.118 request=/dex/.well-known/openid-configuration
time="2021-06-12T01:38:58Z" level=error msg="OIDC provider setup failed, retrying in 10 seconds: 503 Service Unavailable: "
time="2021-06-12T01:39:08Z" level=info msg="URI is whitelisted. Accepted without authorization." ip=10.1.93.118 request=/dex/.well-known/openid-configuration
time="2021-06-12T01:39:08Z" level=error msg="OIDC provider setup failed, retrying in 10 seconds: 503 Service Unavailable: "
time="2021-06-12T01:39:18Z" level=info msg="URI is whitelisted. Accepted without authorization." ip=10.1.93.118 request=/dex/.well-known/openid-configuration
time="2021-06-12T01:39:18Z" level=error msg="OIDC provider setup failed, retrying in 10 seconds: 503 Service Unavailable: "
time="2021-06-12T01:39:28Z" level=info msg="URI is whitelisted. Accepted without authorization." ip=10.1.93.118 request=/dex/.well-known/openid-configuration
time="2021-06-12T01:39:28Z" level=error msg="OIDC provider setup failed, retrying in 10 seconds: 503 Service Unavailable: "
time="2021-06-12T01:39:38Z" level=info msg="URI is whitelisted. Accepted without authorization." ip=10.1.93.118 request=/dex/.well-known/openid-configuration
time="2021-06-12T01:39:38Z" level=error msg="OIDC provider setup failed, retrying in 10 seconds: 503 Service Unavailable: "
time="2021-06-12T01:39:48Z" level=info msg="URI is whitelisted. Accepted without authorization." ip=10.1.93.118 request=/dex/.well-known/openid-configuration
time="2021-06-12T01:39:48Z" level=error msg="OIDC provider setup failed, retrying in 10 seconds: 503 Service Unavailable: "
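The gatekeeper retries provider setup every 10 seconds, so these logs grow quickly. Here is a minimal Python sketch that tallies the HTTP status codes reported in the "OIDC provider setup failed" errors. It assumes the logrus key=value text format seen in the excerpt above, and the function names are hypothetical.

```python
import re
from collections import Counter

# logrus text format: key=value pairs, values optionally double-quoted.
FIELD_RE = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')
# HTTP status code inside the gatekeeper's provider-setup error message.
SETUP_FAILED_RE = re.compile(r"OIDC provider setup failed.*?: (\d{3}) ")


def parse_line(line):
    """Split one logrus line into a dict of fields, stripping surrounding quotes."""
    fields = {}
    for key, value in FIELD_RE.findall(line):
        if value.startswith('"'):
            value = value[1:-1]
        fields[key] = value
    return fields


def setup_failure_codes(lines):
    """Count status codes reported by 'OIDC provider setup failed' errors."""
    codes = Counter()
    for line in lines:
        fields = parse_line(line)
        if fields.get("level") != "error":
            continue
        match = SETUP_FAILED_RE.search(fields.get("msg", ""))
        if match:
            codes[match.group(1)] += 1
    return codes


if __name__ == "__main__":
    sample = [
        'time="2021-06-12T01:38:58Z" level=info msg="URI is whitelisted. Accepted without authorization." ip=10.1.93.118 request=/dex/.well-known/openid-configuration',
        'time="2021-06-12T01:38:58Z" level=error msg="OIDC provider setup failed, retrying in 10 seconds: 503 Service Unavailable: "',
    ]
    print(dict(setup_failure_codes(sample)))
```

To get a quick count of 403 versus 503 failures, feed it the lines from kubectl logs -n kubeflow <gatekeeper-pod>.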

I've also tried destroying the model completely and redeploying, but the problem persists.

Has anyone seen this issue before?

DomFleischmann commented 3 years ago

The revision of the oidc-gatekeeper charm seems to be out of date; this should be fixed in revision 54. Please try deploying again with the latest bundle, or run:

juju upgrade-charm oidc-gatekeeper
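To confirm which oidc-gatekeeper revision is actually deployed, check the Rev column of juju status against 54. This Python sketch parses the plain-text App table exactly as it appears in this issue. It assumes the Version column is never empty and never contains spaces. That holds here but is not guaranteed in general, so juju status --format=json would be a more robust input. Function names are hypothetical.

```python
def app_revisions(status_text):
    """Map application name -> charm revision from plain `juju status` output.

    Assumes the App table layout shown in this issue: every row has a
    non-empty, space-free Version column, so Rev is the 8th field.
    """
    revisions = {}
    in_app_table = False
    for line in status_text.splitlines():
        fields = line.split()
        if not fields:
            in_app_table = False  # a blank line ends the App table
            continue
        if fields[0] == "App":
            in_app_table = True  # header row; data rows follow
            continue
        if in_app_table and len(fields) > 7 and fields[7].isdigit():
            revisions[fields[0]] = int(fields[7])
    return revisions


def is_older_than(status_text, app="oidc-gatekeeper", minimum_rev=54):
    """True if `app` is below minimum_rev; raises KeyError if app is absent."""
    return app_revisions(status_text)[app] < minimum_rev


if __name__ == "__main__":
    status = (
        "App              Version                Status  Scale  Charm            Store       Channel  Rev  OS\n"
        "oidc-gatekeeper  res:oci-image@9bb01f7  active      1  oidc-gatekeeper  charmstore  stable    53  kubernetes\n"
    )
    print(app_revisions(status), is_older_than(status))
```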

edhenry commented 3 years ago

Hi @DomFleischmann - thank you for following up!

I've tried redeploying using the latest bundle but it seems the issue still persists.

oidc-gatekeeper logs:

time="2021-06-25T23:40:48Z" level=info msg="Config: &{ProviderURL:http://kf01.octo.dell.com/dex ClientID:authservice-oidc ClientSecret:7CHHYCCCV75SS3TKEQHZHCFIR0RRHO OIDCAuthURL: RedirectURL:/authservice/oidc/callback OIDCScopes:[openid profile email groups] StrictSessionValidation:false AuthserviceURLPrefix:/authservice/ SkipAuthURLs:[/authservice/ /dex/] AuthHeader:Authorization HomepageURL:/authservice/site/homepage AfterLoginURL: AfterLogoutURL:/authservice/site/after_logout UserIDHeader:kubeflow-userid UserIDPrefix: UserIDClaim:email UserIDTokenHeader: Hostname: Port:8080 WebServerPort:8082 ReadinessProbePort:8081 CABundlePath: SessionStorePath:bolt.db SessionMaxAge:86400 ClientName:AuthService ThemesURL:themes Theme:kubeflow TemplatePath:[web/templates/default] UserTemplateContext:map[]}"
time="2021-06-25T23:40:48Z" level=info msg="Starting readiness probe at 8081"
time="2021-06-25T23:40:48Z" level=info msg="Starting server at :8080"
time="2021-06-25T23:40:48Z" level=info msg="Starting web server at :8082"
time="2021-06-25T23:40:49Z" level=error msg="OIDC provider setup failed, retrying in 10 seconds: 403 Forbidden: "
time="2021-06-25T23:40:59Z" level=info msg="URI is whitelisted. Accepted without authorization." ip=10.1.93.193 request=/dex/.well-known/openid-configuration
time="2021-06-25T23:40:59Z" level=error msg="OIDC provider setup failed, retrying in 10 seconds: 503 Service Unavailable: "
time="2021-06-25T23:41:09Z" level=info msg="URI is whitelisted. Accepted without authorization." ip=10.1.93.193 request=/dex/.well-known/openid-configuration
time="2021-06-25T23:41:09Z" level=error msg="OIDC provider setup failed, retrying in 10 seconds: 503 Service Unavailable: "
time="2021-06-25T23:41:19Z" level=info msg="URI is whitelisted. Accepted without authorization." ip=10.1.93.193 request=/dex/.well-known/openid-configuration
time="2021-06-25T23:41:19Z" level=error msg="OIDC provider setup failed, retrying in 10 seconds: 503 Service Unavailable: "
time="2021-06-25T23:41:29Z" level=info msg="URI is whitelisted. Accepted without authorization." ip=10.1.93.193 request=/dex/.well-known/openid-configuration
time="2021-06-25T23:41:29Z" level=error msg="OIDC provider setup failed, retrying in 10 seconds: 503 Service Unavailable: "
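The config dump shows ProviderURL http://kf01.octo.dell.com/dex. The whitelisted requests for /dex/.well-known/openid-configuration come from 10.1.93.193, which matches the istio-ingressgateway unit in the status output below; 10.1.93.118 plays the same role in the first report. That suggests the gatekeeper fetches its discovery document through the public URL and the ingress gateway rather than directly from dex. Below is a hedged Python sketch for reproducing that fetch by hand. It derives the discovery URL as OpenID Connect Discovery defines it (issuer plus /.well-known/openid-configuration) and reports the HTTP status. Helper names are hypothetical. To compare with the gatekeeper's view, call probe() from somewhere that can reach the public URL, for example a pod in the cluster.

```python
import re
import urllib.error
import urllib.request


def provider_url_from_config_log(line):
    """Pull ProviderURL out of the gatekeeper's 'Config: &{...}' log line."""
    match = re.search(r"ProviderURL:(\S+)", line)
    return match.group(1) if match else None


def discovery_url(provider_url):
    """Location of the OIDC discovery document for a given issuer URL."""
    return provider_url.rstrip("/") + "/.well-known/openid-configuration"


def probe(url, timeout=5.0):
    """GET url and return the HTTP status code, including error statuses.

    Connection-level failures (DNS, refused, timeout) are not caught and
    surface as urllib.error.URLError or a socket timeout.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


if __name__ == "__main__":
    config_line = 'msg="Config: &{ProviderURL:http://kf01.octo.dell.com/dex ClientID:authservice-oidc}"'
    # Only derives the URL here; call probe(url) from a host that can reach it.
    print(discovery_url(provider_url_from_config_log(config_line)))
```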

juju status:

Model     Controller        Cloud/Region  Version  SLA          Timestamp
kubeflow  myk8s-controller  myk8s         2.9.2    unsupported  18:47:36-05:00

App                        Version                    Status  Scale  Charm                 Store       Channel  Rev  OS          Address         Message
admission-webhook          res:oci-image@1abb127      active      1  admission-webhook     charmstore  stable    10  kubernetes  10.152.183.85   
argo-controller            res:oci-image@c1746ae      active      1  argo-controller       charmstore  stable    51  kubernetes                  
dex-auth                   res:oci-image@af9c1b3      active      1  dex-auth              charmstore  stable    60  kubernetes  10.152.183.13   
istio-ingressgateway       res:oci-image@89b5fe2      active      1  istio-ingressgateway  charmstore  stable    20  kubernetes  10.246.72.181   
istio-pilot                res:oci-image@e3e03b3      active      1  istio-pilot           charmstore  stable    20  kubernetes  10.152.183.237  
jupyter-controller         res:oci-image@b2db73b      active      1  jupyter-controller    charmstore  stable    55  kubernetes                  
jupyter-ui                 res:oci-image@3a09e8a      active      1  jupyter-ui            charmstore  stable     9  kubernetes  10.152.183.140  
katib-controller           res:oci-image@3b41adc      active      1  katib-controller      charmstore  stable    30  kubernetes  10.152.183.11   
katib-db                   mariadb/server:10.3        active      1  mariadb-k8s           charmstore  stable    35  kubernetes  10.152.183.69   
katib-db-manager           res:oci-image@c4718dc      active      1  katib-db-manager      charmstore  stable     4  kubernetes  10.152.183.224  
katib-ui                   res:oci-image@a51e0c9      active      1  katib-ui              charmstore  stable    30  kubernetes  10.152.183.102  
kfp-api                    res:oci-image@8e60840      active      1  kfp-api               charmstore  stable    10  kubernetes  10.152.183.81   
kfp-db                     mariadb/server:10.3        active      1  mariadb-k8s           charmstore  stable    35  kubernetes  10.152.183.110  
kfp-persistence            res:oci-image@9338d08      active      1  kfp-persistence       charmstore  stable     7  kubernetes                  
kfp-schedwf                res:oci-image@4ab6488      active      1  kfp-schedwf           charmstore  stable     7  kubernetes                  
kfp-ui                     res:oci-image@04a4348      active      1  kfp-ui                charmstore  stable    10  kubernetes  10.152.183.56   
kfp-viewer                 res:oci-image@bae62bf      active      1  kfp-viewer            charmstore  stable     7  kubernetes                  
kfp-viz                    res:oci-image@c90a581      active      1  kfp-viz               charmstore  stable     6  kubernetes  10.152.183.77   
kubeflow-dashboard         res:oci-image@126c9a9      active      1  kubeflow-dashboard    charmstore  stable    56  kubernetes  10.152.183.79   
kubeflow-profiles          res:profile-image@582b8eb  active      1  kubeflow-profiles     charmstore  stable    52  kubernetes  10.152.183.90   
minio                      res:oci-image@4707912      active      1  minio                 charmstore  stable    55  kubernetes  10.152.183.150  
mlmd                       res:oci-image@78eb66d      active      1  mlmd                  charmstore  stable     5  kubernetes  10.152.183.6    
oidc-gatekeeper            res:oci-image@9bb01f7      active      1  oidc-gatekeeper       charmstore  stable    54  kubernetes  10.152.183.190  
pytorch-operator           res:oci-image@08c3373      active      1  pytorch-operator      charmstore  stable    53  kubernetes                  
seldon-controller-manager  res:oci-image@82fd029      active      1  seldon-core           charmstore  stable    50  kubernetes  10.152.183.205  
spark                      res:oci-image@d792172      active      1  spark                 charmstore  stable     2  kubernetes  10.152.183.52   
tfjob-operator             res:oci-image@3fabaf3      active      1  tfjob-operator        charmstore  stable     1  kubernetes                  

Unit                          Workload  Agent  Address      Ports                                                                                                  Message
admission-webhook/0*          active    idle   10.1.93.180  443/TCP                                                                                                
argo-controller/0*            active    idle   10.1.69.108                                                                                                         
dex-auth/6*                   active    idle   10.1.46.148  5556/TCP                                                                                               
istio-ingressgateway/0*       active    idle   10.1.93.193  15020/TCP,80/TCP,443/TCP,15029/TCP,15030/TCP,15031/TCP,15032/TCP,15443/TCP,15011/TCP,8060/TCP,853/TCP  
istio-pilot/1*                active    idle   10.1.35.100  8080/TCP,15010/TCP,15012/TCP,15017/TCP                                                                 
jupyter-controller/1*         active    idle   10.1.94.121                                                                                                         
jupyter-ui/0*                 active    idle   10.1.46.137  5000/TCP                                                                                               
katib-controller/0*           active    idle   10.1.82.139  443/TCP,8080/TCP                                                                                       
katib-db-manager/0*           active    idle   10.1.93.191  6789/TCP                                                                                               
katib-db/0*                   active    idle   10.1.93.190  3306/TCP                                                                                               ready
katib-ui/0*                   active    idle   10.1.35.98   8080/TCP                                                                                               
kfp-api/0*                    active    idle   10.1.93.192  8888/TCP,8887/TCP                                                                                      
kfp-db/0*                     active    idle   10.1.82.141  3306/TCP                                                                                               ready
kfp-persistence/0*            active    idle   10.1.82.138                                                                                                         
kfp-schedwf/0*                active    idle   10.1.93.187                                                                                                         
kfp-ui/0*                     active    idle   10.1.93.186  3000/TCP                                                                                               
kfp-viewer/0*                 active    idle   10.1.82.137                                                                                                         
kfp-viz/0*                    active    idle   10.1.93.184  8888/TCP                                                                                               
kubeflow-dashboard/0*         active    idle   10.1.46.143  8082/TCP                                                                                               
kubeflow-profiles/0*          active    idle   10.1.94.118  8080/TCP,8081/TCP                                                                                      
minio/0*                      active    idle   10.1.82.142  9000/TCP                                                                                               
mlmd/0*                       active    idle   10.1.46.145  8080/TCP                                                                                               
oidc-gatekeeper/2*            active    idle   10.1.46.147  8080/TCP                                                                                               
pytorch-operator/0*           active    idle   10.1.94.119  8443/TCP                                                                                               
seldon-controller-manager/0*  active    idle   10.1.69.111  8080/TCP,4443/TCP                                                                                      
spark/0*                      active    idle   10.1.93.185  10254/TCP,443/TCP                                                                                      
tfjob-operator/0*             active    idle   10.1.46.141  8443/TCP   

Any ideas?