cloudfoundry-incubator / kubecf


cf push fails with "Start unsuccessful" for all deployments after AKS upgrade from 1.18.8 to 1.19.3 #1660

Open CincomGithubService opened 3 years ago

CincomGithubService commented 3 years ago

Describe the bug

After we upgraded our AKS cluster from version 1.18.8 to version 1.19.3, all cf push deployments on KubeCF fail with "Start unsuccessful".

Here are the sample logs from three of our deployments:


cf push service-registry-dr -p service-registry-1.0.9.jar -b java_buildpack -d system.kubecfdr.prod.cincomcloud.com --hostname service-registry-dr
Deprecation warning: Use of the '-d' command-line flag option is deprecated in favor of the 'routes' property in the manifest. Please see https://docs.cloudfoundry.org/devguide/deploy-apps/manifest-attributes.html#routes for usage information. The '-d' command-line flag option will be removed in the future.

Pushing app service-registry-dr to org cpq / space dr as developer...
Getting app info...
Creating app with these attributes...

Creating app service-registry-dr... Mapping routes... Comparing local files to remote cache... Packaging files to upload... Uploading files... 792.65 KiB / 792.65 KiB [=====================================================================================================================================================================] 100.00% 2s

Waiting for API to complete processing files...

Staging app and tracing logs...
2020/12/18 17:02:28 Installing dependencies
-----> Java Buildpack v4.32.1.1 | git@github.com:SUSE/cf-java-buildpack.git#4632e14b
-----> Downloading Jvmkill Agent 1.16.0_RELEASE from https://java-buildpack.cloudfoundry.org/jvmkill/bionic/x86_64/jvmkill-1.16.0-RELEASE.so (0.3s)
-----> Downloading Open Jdk JRE 1.8.0_262 from https://java-buildpack.cloudfoundry.org/openjdk/bionic/x86_64/bellsoft-jre8u262%2B10-linux-amd64.tar.gz (1.5s)
       Expanding Open Jdk JRE to .java-buildpack/open_jdk_jre (0.9s)
-----> Downloading Open JDK Like Memory Calculator 3.13.0_RELEASE from https://java-buildpack.cloudfoundry.org/memory-calculator/bionic/x86_64/memory-calculator-3.13.0-RELEASE.tar.gz (0.1s)
       Loaded Classes: 18438, Threads: 250
-----> Downloading Client Certificate Mapper 1.11.0_RELEASE from https://java-buildpack.cloudfoundry.org/client-certificate-mapper/client-certificate-mapper-1.11.0-RELEASE.jar (0.1s)
-----> Downloading Container Security Provider 1.18.0_RELEASE from https://java-buildpack.cloudfoundry.org/container-security-provider/container-security-provider-1.18.0-RELEASE.jar (0.1s)
-----> Downloading Spring Auto Reconfiguration 2.11.0_RELEASE from https://java-buildpack.cloudfoundry.org/auto-reconfiguration/auto-reconfiguration-2.11.0-RELEASE.jar (0.2s)
2020/12/18 17:02:43 Building droplet release
2020/12/18 17:02:47 Creating app artifact

Waiting for app to start...
Start unsuccessful

TIP: use 'cf logs service-registry-dr --recent' for more information
FAILED


~/kubecf/cf-hello-worlds/dotnet-core$ cf push
Pushing from manifest to org cpq / space dr as developer...
Using manifest file /home/skud/kubecf/cf-hello-worlds/dotnet-core/manifest.yml
Getting app info...
Creating app with these attributes...

Creating app test-dotnet-core... Mapping routes... Comparing local files to remote cache... Packaging files to upload... Uploading files... 5.26 KiB / 5.26 KiB [=========================================================================================================================================================================] 100.00% 1s

Waiting for API to complete processing files...

Staging app and tracing logs...
2020/12/18 17:01:00 Installing dependencies
2020/12/18 17:01:09 Cleaning cache dir
2020/12/18 17:01:09 Detecting buildpack
-----> Dotnet-Core Buildpack version 2.3.16.1
-----> Supplying Dotnet Core
-----> Installing libunwind 1.4.0
       Download [https://cf-buildpacks.suse.com/dependencies/libunwind/libunwind-1.4.0-linux-x64-sle15-1e79db4b.tgz]
       using the default SDK
-----> Installing dotnet-sdk 3.1.402
       Download [https://cf-buildpacks.suse.com/dependencies/dotnet/dotnet-sdk_3.1.402_linux_x64_any-stack_e0aedde7.tar.xz]
-----> Installing dotnet-runtime 3.1.8
-----> Finalizing Dotnet Core
-----> Installing dotnet-runtime 3.1.8
       Download [https://cf-buildpacks.suse.com/dependencies/dotnet/dotnet-runtime_3.1.8_linux_x64_any-stack_a1e739c5.tar.xz]
-----> Publish dotnet

Welcome to .NET Core 3.1!

SDK Version: 3.1.402

Telemetry

The .NET Core tools collect usage data in order to help us improve your experience. The data is anonymous. It is collected by Microsoft and shared with the community. You can opt-out of telemetry by setting the DOTNET_CLI_TELEMETRY_OPTOUT environment variable to '1' or 'true' using your favorite shell.

Read more about .NET Core CLI Tools telemetry: https://aka.ms/dotnet-cli-telemetry

Explore documentation: https://aka.ms/dotnet-docs
Report issues and find source on GitHub: https://github.com/dotnet/core
Find out what's new: https://aka.ms/dotnet-whats-new
Learn about the installed HTTPS developer cert: https://aka.ms/aspnet-core-https
Use 'dotnet --help' to see available commands or visit: https://aka.ms/dotnet-cli-docs
Write your first app: https://aka.ms/first-net-core-app

Microsoft (R) Build Engine version 16.7.0+7fb82e5b2 for .NET
Copyright (C) Microsoft Corporation. All rights reserved.

Determining projects to restore...
Restored /tmp/app-bits527065936/dotnet-core-hello-world.csproj (in 2.15 sec).
dotnet-core-hello-world -> /tmp/app-bits527065936/bin/Debug/netcoreapp3.1/sles.12.3-x64/dotnet-core-hello-world.dll
dotnet-core-hello-world -> /tmp/contents485840495/deps/0/dotnet_publish/
-----> Cleaning staging area
       Removing .nuget
       Removing .local
       Removing dotnet-sdk
2020/12/18 17:01:29 Building droplet release
2020/12/18 17:01:29 Creating app artifact

Waiting for app to start...
Start unsuccessful

TIP: use 'cf logs test-dotnet-core --recent' for more information
FAILED


~/kubecf/cf-hello-worlds/go-hello$ cf push
Pushing from manifest to org cpq / space dr as developer...
Using manifest file /home/skud/kubecf/cf-hello-worlds/go-hello/manifest.yml

Deprecation warning: Specifying app manifest attributes at the top level is deprecated. Found: env, name. Please see https://docs.cloudfoundry.org/devguide/deploy-apps/manifest-attributes.html#deprecated for alternatives and other app manifest deprecations. This feature will be removed in the future.

Using manifest file /home/skud/kubecf/cf-hello-worlds/go-hello/manifest.yml

Creating app go-test in org cpq / space dr as developer... OK

Creating route go-test.system.kubecfdr.prod.cincomcloud.com... OK

Binding go-test.system.kubecfdr.prod.cincomcloud.com to go-test... OK

Uploading go-test...
Uploading app files from: /home/skud/kubecf/cf-hello-worlds/go-hello
Uploading 1.1K, 3 files
Done uploading
OK

Starting app go-test in org cpq / space dr as developer...
Warning: error tailing logs unexpected status code 404
Warning: error tailing logs unexpected status code 404
Warning: error tailing logs unexpected status code 404
Warning: error tailing logs unexpected status code 404
Warning: error tailing logs unexpected status code 404

0 of 1 instances running, 1 crashed
FAILED
Error restarting application: Start unsuccessful

TIP: use 'cf logs go-test --recent' for more information


The events on the Eirini app pods show the following image pull failures:

Back-off pulling image "127.0.0.1:31666/cloudfoundry/a60e220f-1e56-48ce-b3bb-51a0fe9ccdf0:4be74abeda453959be26100db97dc8c5e0826408"
Source: kubelet aks-linuxds4v2np-90461878-vmss000007
Count: 86
Sub-object: spec.containers{opi}
Last seen: 2020-12-18T17:07:31Z

Failed to pull image "127.0.0.1:31666/cloudfoundry/93b04f68-8d62-40f5-bedd-cd1205bb8378:1d3ae4dfc73a0cf039beda415910db7e4b180f69": rpc error: code = Unknown desc = failed to pull and unpack image "127.0.0.1:31666/cloudfoundry/93b04f68-8d62-40f5-bedd-cd1205bb8378:1d3ae4dfc73a0cf039beda415910db7e4b180f69": failed to resolve reference "127.0.0.1:31666/cloudfoundry/93b04f68-8d62-40f5-bedd-cd1205bb8378:1d3ae4dfc73a0cf039beda415910db7e4b180f69": unexpected status code [manifests 1d3ae4dfc73a0cf039beda415910db7e4b180f69]: 400 Bad Request
Source: kubelet aks-nodepool1-90461878-vmss000001
Count: 4
Sub-object: spec.containers{opi}
Last seen: 2020-12-18T17:04:44Z
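For anyone triaging many of these events, here is a minimal Python sketch that pulls the registry address, repository, tag, and HTTP status out of a "Failed to pull image" message like the one above. The function name `parse_pull_error` and the field names are made up for this illustration, and the pattern is only fitted to the message format quoted in this issue; other runtimes or versions may word the error differently.

```python
import re
from typing import Dict, Optional

# Fitted to the "Failed to pull image" message quoted in this issue.
# The group names are labels chosen for this sketch, not Kubernetes terms.
PULL_ERROR = re.compile(
    r'Failed to pull image "(?P<registry>[^/"]+)/(?P<repository>[^":]+):(?P<tag>[^"]+)"'
    r'.*?unexpected status code \[manifests (?P<reference>[^\]]+)\]: (?P<status>\d{3})',
    re.DOTALL,
)


def parse_pull_error(message: str) -> Optional[Dict[str, str]]:
    """Return image parts and HTTP status from a pull-failure message, or None."""
    match = PULL_ERROR.search(message)
    return match.groupdict() if match else None


if __name__ == "__main__":
    image = (
        "127.0.0.1:31666/cloudfoundry/93b04f68-8d62-40f5-bedd-cd1205bb8378"
        ":1d3ae4dfc73a0cf039beda415910db7e4b180f69"
    )
    sample = (
        f'Failed to pull image "{image}": rpc error: code = Unknown desc = '
        f'failed to pull and unpack image "{image}": '
        f'failed to resolve reference "{image}": unexpected status code '
        "[manifests 1d3ae4dfc73a0cf039beda415910db7e4b180f69]: 400 Bad Request"
    )
    print(parse_pull_error(sample))
```

Messages that carry no status code, such as the "Back-off pulling image" event, simply return None, so the helper can be run over a mixed list of events.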


To Reproduce

Upgrade an AKS cluster from version 1.18.8 to version 1.19.3, then run "cf push" for any application.

Expected behavior

The application should be deployed successfully.

Environment

AKS version: 1.19.3
KubeCF version: kubecf-2.6.1
Operator version: cf-operator-6.1.17+0.gec409fd7

Additional context

No other pods or deployments on the cluster show issues; everything else appears to run fine.

CincomGithubService commented 3 years ago

Output of describing one of the failing pods:

kubectl describe pod/test-dotnet-core-dr-5995467e9f-0 --namespace=eirini
Name:         test-dotnet-core-dr-5995467e9f-0
Namespace:    eirini
Priority:     0
Node:         aks-nodepool1-90461878-vmss00000d/172.28.0.4
Start Time:   Mon, 28 Dec 2020 12:40:15 -0500
Labels:       cloudfoundry.org/app_guid=4189cd5d-b80d-4d6e-a6e9-bbda8683bf99
              cloudfoundry.org/guid=4189cd5d-b80d-4d6e-a6e9-bbda8683bf99
              cloudfoundry.org/process_type=web
              cloudfoundry.org/rootfs-version=
              cloudfoundry.org/source_type=APP
              cloudfoundry.org/version=39b6452e-3b54-4514-92d3-e06fe51fdec4
              controller-revision-hash=test-dotnet-core-dr-5995467e9f-7c76f9f79f
              statefulset.kubernetes.io/pod-name=test-dotnet-core-dr-5995467e9f-0
Annotations:  cloudfoundry.org/application_id: 4189cd5d-b80d-4d6e-a6e9-bbda8683bf99
              cloudfoundry.org/application_name: test-dotnet-core
              cloudfoundry.org/last_updated: 1609177214.0
              cloudfoundry.org/org_guid: fd763f40-b4f1-42d5-8d3a-ee1fd97b4798
              cloudfoundry.org/org_name: cpq
              cloudfoundry.org/original_request: {"guid":"4189cd5d-b80d-4d6e-a6e9-bbda8683bf99","version":"39b6452e-3b54-4514-92d3-e06fe51fdec4","process_guid":"4189cd5d-b80d-4d6e-a6e9-bb...
              cloudfoundry.org/process_guid: 4189cd5d-b80d-4d6e-a6e9-bbda8683bf99-39b6452e-3b54-4514-92d3-e06fe51fdec4
              cloudfoundry.org/routes: [{"hostname":"test-dotnet-core.system.kubecfdr.prod.cincomcloud.com","port":8080}]
              cloudfoundry.org/space_guid: 59c4885f-6fae-48d3-8408-9d4f684453ed
              cloudfoundry.org/space_name: dr
              cloudfoundry.org/version: 39b6452e-3b54-4514-92d3-e06fe51fdec4
              seccomp.security.alpha.kubernetes.io/pod: runtime/default
Status:       Pending
IP:           172.28.0.33
IPs:
  IP:           172.28.0.33
Controlled By:  StatefulSet/test-dotnet-core-dr-5995467e9f
Containers:
opi:
Container ID:
Image:          127.0.0.1:31666/cloudfoundry/13f512f3-5f13-4c1d-a8c8-2ba5d1f3dae9:2fa57b8686d7ba639920daabafb4faf840aa3480
Image ID:
Port:           8080/TCP
Host Port:      0/TCP
Command:
  dumb-init

  /bin/sh
  -c
  (  /lifecycle/launch && sleep 5 ) || sleep 5
State:          Waiting
  Reason:       ImagePullBackOff
Ready:          False
Restart Count:  0
Limits:
  ephemeral-storage:  1024M
  memory:             256M
Requests:
  cpu:                30m
  ephemeral-storage:  1024M
  memory:             256M
Liveness:             tcp-socket :8080 delay=0s timeout=1s period=10s #success=1 #failure=4
Readiness:            tcp-socket :8080 delay=0s timeout=1s period=10s #success=1 #failure=1
Environment:
  PORT:                     8080
  VCAP_APP_PORT:            8080
  HOME:                     /home/vcap/app
  CACHE_NUGET_PACKAGES:     false
  CF_INSTANCE_ADDR:         0.0.0.0:8080
  CF_INSTANCE_PORTS:        [{"external":8080,"internal":8080}]
  START_COMMAND:            cd ${DEPS_DIR}/0/dotnet_publish && exec ./dotnet-core-hello-world --server.urls http://0.0.0.0:${PORT}
  TMPDIR:                   /home/vcap/tmp
  MEMORY_LIMIT:             256m
  VCAP_SERVICES:            {}
  LANG:                     en_US.UTF-8
  CF_INSTANCE_PORT:         8080
  VCAP_APPLICATION:         {"cf_api":"https://api.system.kubecfdr.prod.cincomcloud.com","limits":{"fds":16384,"mem":256,"disk":1024},"application_name":"test-dotnet-core","application_uris":["test-dotnet-core.system.kubecfdr.prod.cincomcloud.com"],"name":"test-dotnet-core","space_name":"dr","space_id":"59c4885f-6fae-48d3-8408-9d4f684453ed","organization_id":"fd763f40-b4f1-42d5-8d3a-ee1fd97b4798","organization_name":"cpq","uris":["test-dotnet-core.system.kubecfdr.prod.cincomcloud.com"],"process_id":"4189cd5d-b80d-4d6e-a6e9-bbda8683bf99","process_type":"web","application_id":"4189cd5d-b80d-4d6e-a6e9-bbda8683bf99","version":"39b6452e-3b54-4514-92d3-e06fe51fdec4","application_version":"39b6452e-3b54-4514-92d3-e06fe51fdec4"}
  VCAP_APP_HOST:            0.0.0.0
  PATH:                     /usr/local/bin:/usr/bin:/bin
  USER:                     vcap
  PWD:                      /home/vcap/app
  POD_NAME:                 test-dotnet-core-dr-5995467e9f-0 (v1:metadata.name)
  CF_INSTANCE_GUID:          (v1:metadata.uid)
  CF_INSTANCE_IP:            (v1:status.hostIP)
  CF_INSTANCE_INTERNAL_IP:   (v1:status.podIP)
  EIRINI_SSH_KEY:           <set to the key 'public_key' in secret '4189cd5d-b80d-4d6e-a6e9-bbda8683bf99-39b6452e-3b54-4514-92d3-e06fe51fdec4-0-ssh-key-meta'>   Optional: false
  EIRINI_HOST_KEY:          <set to the key 'private_key' in secret '4189cd5d-b80d-4d6e-a6e9-bbda8683bf99-39b6452e-3b54-4514-92d3-e06fe51fdec4-0-ssh-key-meta'>  Optional: false
  CF_INSTANCE_INDEX:        0
Mounts:                     <none>

Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
QoS Class:       Burstable
Node-Selectors:
Tolerations:     node.kubernetes.io/memory-pressure:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type Reason Age From Message

  Normal Scheduled Successfully assigned eirini/test-dotnet-core-dr-5995467e9f-0 to aks-nodepool1-90461878-vmss00000d
  Normal Pulling 6m3s (x4 over 7m37s) kubelet, aks-nodepool1-90461878-vmss00000d Pulling image "127.0.0.1:31666/cloudfoundry/13f512f3-5f13-4c1d-a8c8-2ba5d1f3dae9:2fa57b8686d7ba639920daabafb4faf840aa3480"
  Warning Failed 6m3s (x4 over 7m37s) kubelet, aks-nodepool1-90461878-vmss00000d Failed to pull image "127.0.0.1:31666/cloudfoundry/13f512f3-5f13-4c1d-a8c8-2ba5d1f3dae9:2fa57b8686d7ba639920daabafb4faf840aa3480": rpc error: code = Unknown desc = failed to pull and unpack image "127.0.0.1:31666/cloudfoundry/13f512f3-5f13-4c1d-a8c8-2ba5d1f3dae9:2fa57b8686d7ba639920daabafb4faf840aa3480": failed to resolve reference "127.0.0.1:31666/cloudfoundry/13f512f3-5f13-4c1d-a8c8-2ba5d1f3dae9:2fa57b8686d7ba639920daabafb4faf840aa3480": unexpected status code [manifests 2fa57b8686d7ba639920daabafb4faf840aa3480]: 400 Bad Request
  Warning Failed 6m3s (x4 over 7m37s) kubelet, aks-nodepool1-90461878-vmss00000d Error: ErrImagePull
  Normal BackOff 5m51s (x6 over 7m36s) kubelet, aks-nodepool1-90461878-vmss00000d Back-off pulling image "127.0.0.1:31666/cloudfoundry/13f512f3-5f13-4c1d-a8c8-2ba5d1f3dae9:2fa57b8686d7ba639920daabafb4faf840aa3480"
  Warning Failed 2m35s (x20 over 7m36s) kubelet, aks-nodepool1-90461878-vmss00000d Error: ImagePullBackOff
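The "[manifests ...]: 400 Bad Request" part of the event suggests the failure happens when the runtime requests the image manifest from the registry. As a rough aid for probing this by hand, the Python sketch below builds the manifest URL for an image reference using the registry HTTP API path layout `/v2/<name>/manifests/<reference>`. The helper name `manifest_urls` is hypothetical. The events do not say whether the runtime used HTTP or HTTPS, so both candidates are returned as an assumption. Since 127.0.0.1:31666 is a node-local address, any request against these URLs would have to be made from the affected node itself.

```python
from typing import List


def manifest_urls(image: str) -> List[str]:
    """Build candidate manifest URLs for an image of the form host:port/repo:tag.

    Which scheme the container runtime actually used is not stated in the
    events, so both https and http variants are returned (an assumption).
    """
    registry, _, remainder = image.partition("/")
    repository, _, reference = remainder.rpartition(":")
    if not registry or not repository or not reference:
        raise ValueError(f"expected host:port/repository:tag, got {image!r}")
    path = f"/v2/{repository}/manifests/{reference}"
    return [f"{scheme}://{registry}{path}" for scheme in ("https", "http")]


if __name__ == "__main__":
    # Image reference taken from the describe output above.
    image = (
        "127.0.0.1:31666/cloudfoundry/13f512f3-5f13-4c1d-a8c8-2ba5d1f3dae9"
        ":2fa57b8686d7ba639920daabafb4faf840aa3480"
    )
    for url in manifest_urls(image):
        print(url)
```

The sketch only constructs URLs and sends nothing; comparing the responses for the two schemes is left to whoever has shell access to the node.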