Azure / open-service-broker-azure

The Open Service Broker API Server for Azure Services
https://osba.sh
MIT License

osba pod failing to pull image #700

Closed kyschouv closed 5 years ago

kyschouv commented 5 years ago

We're getting a 500 from osbapublicacr.azurecr.io when trying to pull the azure-service-broker image. Any idea how we can resolve this? It's been happening since earlier today.

  Type     Reason     Age                From                                        Message
  ----     ------     ----               ----                                        -------
  Normal   Scheduled  36s                default-scheduler                           Successfully assigned osba/osba-open-service-broker-azure-7f59f8c7d7-q684q to k8s-linuxpool-11683214-vmss000001
  Normal   Pulling    22s (x2 over 35s)  kubelet, k8s-linuxpool-11683214-vmss000001  pulling image "osbapublicacr.azurecr.io/microsoft/azure-service-broker:v1.6.0"
  Warning  Failed     21s (x2 over 34s)  kubelet, k8s-linuxpool-11683214-vmss000001  Failed to pull image "osbapublicacr.azurecr.io/microsoft/azure-service-broker:v1.6.0": rpc error: code = Unknown desc = Error response from daemon: Get https://osbapublicacr.azurecr.io/v2/microsoft/azure-service-broker/manifests/v1.6.0: received unexpected HTTP status: 500 Internal Server Error
  Warning  Failed     21s (x2 over 34s)  kubelet, k8s-linuxpool-11683214-vmss000001  Error: ErrImagePull
  Normal   BackOff    7s (x3 over 34s)   kubelet, k8s-linuxpool-11683214-vmss000001  Back-off pulling image "osbapublicacr.azurecr.io/microsoft/azure-service-broker:v1.6.0"
  Warning  Failed     7s (x3 over 34s)   kubelet, k8s-linuxpool-11683214-vmss000001  Error: ImagePullBackOff

zhongyi-zhang commented 5 years ago

$ docker logout osbapublicacr.azurecr.io
Not logged in to osbapublicacr.azurecr.io
$ docker pull osbapublicacr.azurecr.io/microsoft/azure-service-broker:v1.6.0
v1.6.0: Pulling from microsoft/azure-service-broker
4e2dcb104e4b: Pull complete
9be14ae71d12: Pull complete
Digest: sha256:dab08af3c1e423893141393f64bf4c7de68f6f8d611b681fa7a714dcc98f7e1d
Status: Downloaded newer image for osbapublicacr.azurecr.io/microsoft/azure-service-broker:v1.6.0

That's strange. I just tried, and I can pull the image without logging in to the ACR server, since the ACR team made the registry public. I also installed OSBA v1.6.0 from the Helm chart successfully. Could you try pulling the image locally with docker?

zhongyi-zhang commented 5 years ago

I can reproduce the issue. I'm emailing the ACR team for help now.

zhongyi-zhang commented 5 years ago

@kyschouv are you using AKS? If so, could you check your AKS version? This was a known issue in AKS that has since been fixed. For reference: https://github.com/andyzhangx/demo/blob/master/issues/acr-issues.md#2-image-pull-error-from-acr-anonymous-repository. Upgrading might resolve the issue.

kyschouv commented 5 years ago

Sorry for the delayed response. We're using aks-engine to deploy 1.13.4 (which, according to your link, should already include the fix). We're seeing the issue on a new deployment as well.

We didn't have this issue before, so not sure what's changed.

zhongyi-zhang commented 5 years ago

The Docker image registry was migrated in v1.6.0, so it makes sense that you only hit this recently. For reference: https://github.com/Azure/open-service-broker-azure/pull/699. I also found that the doc about pulling from a public ACR is not accurate: Kubernetes v1.13.4 does not include the fix yet. The PR that fixed the issue is https://github.com/kubernetes/kubernetes/pull/74715/files. The change is not in https://github.com/kubernetes/kubernetes/blob/v1.13.4/pkg/credentialprovider/azure/azure_credentials.go, but it is in https://github.com/kubernetes/kubernetes/blob/v1.13.5/pkg/credentialprovider/azure/azure_credentials.go. So try upgrading to v1.13.5, or use helm rollback to roll OSBA back to v1.5.0 for now.
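
As a rough illustration of the version check described above, here is a minimal Python sketch. The function names and the `FIXED_IN` constant are hypothetical and not part of OSBA or Kubernetes. The only threshold taken from this thread is v1.13.5, so versions on older release branches are reported as unfixed even if the change was backported there.

```python
import re

# Assumption: v1.13.5 is the only fixed version confirmed in this thread.
# Backports to older release branches are not known here and are not modeled.
FIXED_IN = (1, 13, 5)


def parse_version(text):
    """Parse a string such as 'v1.13.4' or '1.13.4' into a tuple of ints."""
    match = re.match(r"^v?(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def has_anonymous_acr_pull_fix(version_text):
    """Return True if the version is at or above the assumed threshold."""
    return parse_version(version_text) >= FIXED_IN


if __name__ == "__main__":
    # Versions mentioned in this thread.
    for candidate in ("v1.13.4", "v1.13.5", "v1.13.6", "v1.14.1"):
        print(candidate, has_anonymous_acr_pull_fix(candidate))
```

The comparison relies on Python's element-wise tuple ordering, so (1, 14, 1) sorts above (1, 13, 5) without any extra logic.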

kyschouv commented 5 years ago

Ah alright. I'll try a 1.13.6 cluster and see if that resolves it.

nicolasproton commented 5 years ago

Upgrading my AKS cluster to 1.13.5 fixed the issue. See this for the Kubernetes versions where it is fixed: https://github.com/andyzhangx/demo/blob/master/issues/acr-issues.md#2-image-pull-error-from-acr-anonymous-repository

kyschouv commented 5 years ago

1.14.1 worked as well.