gardener / dashboard

Web-based GUI for Gardener installations.
Apache License 2.0

Cluster creation error not visible in the dashboard #1231

Closed freegroup closed 2 years ago

freegroup commented 2 years ago

During the (failed) creation of a Gardener cluster, an error occurred in the Gardener backend. Unfortunately, this error is not visible in the dashboard: we only see a spinning icon and a pending state message.

The support team inspected the error and sent us the following information:

k -n garden-als-ingest describe shoot stockholm

Warning SchedulingFailed 2m54s (x9 over 13m) shoot-scheduler (combined from similar events): Failed to schedule shoot 'stockholm': 0/10 seed cluster candidate(s) are eligible for scheduling: {aws-eu3 => invalid networks: [Invalid value: "192.168.0.0/17": shoot pod network intersects with default vpn network (192.168.123.0/24)], aws-eu8 => invalid networks: [Invalid value: "192.168.0.0/17": shoot pod network intersects with default vpn network (192.168.123.0/24)], aws-eu1 => invalid networks: [Invalid value: "192.168.0.0/17": shoot pod network intersects with default vpn network (192.168.123.0/24)], aws-eu9 => invalid networks: [Invalid value: "192.168.0.0/17": shoot pod network intersects with default vpn network (192.168.123.0/24)], aws-eu7 => invalid networks: [Invalid value: "192.168.0.0/17": shoot pod network intersects with default vpn network (192.168.123.0/24)], aws-eu6 => invalid networks: [Invalid value: "192.168.0.0/17": shoot pod network intersects with default vpn network (192.168.123.0/24)], aws-eu5 => invalid networks: [Invalid value: "192.168.0.0/17": shoot pod network intersects with default vpn network (192.168.123.0/24)], aws-eu4 => invalid networks: [Invalid value: "192.168.0.0/17": shoot pod network intersects with default vpn network (192.168.123.0/24)], aws-dr => invalid networks: [Invalid value: "192.168.0.0/17": shoot pod network intersects with default vpn network (192.168.123.0/24)], aws-eu2 => invalid networks: [Invalid value: "192.168.0.0/17": shoot pod network intersects with default vpn network (192.168.123.0/24)]}

I think it would be much better to surface this error to the end user instead of bothering the support team. :-)

Example of the faulty YAML:
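To illustrate what a user-facing version of this event could look like, here is a rough sketch that collapses the per-seed reasons into one entry per distinct reason. The function name `summarize_scheduling_failure` and the regular expression are my own assumptions based only on the message format shown above; this is not taken from Gardener or the dashboard code, and the sample message is shortened to two seeds.

```python
import re
from collections import defaultdict


def summarize_scheduling_failure(message: str) -> dict[str, list[str]]:
    """Group seed names by the reason given for rejecting them.

    Hypothetical helper: it assumes entries of the form
    '<seed> => <category>: [<details>]' inside one pair of braces,
    as seen in the SchedulingFailed event above.
    """
    start, end = message.find("{"), message.rfind("}")
    if start == -1 or end == -1:
        return {}
    body = message[start + 1:end]

    # One match per seed entry; the lookahead stops at the next entry or the end.
    pattern = re.compile(r"([\w.-]+) => ([^\[]+): \[(.*?)\](?=, [\w.-]+ => |$)")
    seeds_by_reason = defaultdict(list)
    for seed, category, details in pattern.findall(body):
        seeds_by_reason[f"{category.strip()}: {details}"].append(seed)
    return dict(seeds_by_reason)


# Shortened sample (two of the ten seeds from the event above).
sample_message = (
    "Failed to schedule shoot 'stockholm': 0/10 seed cluster candidate(s) "
    "are eligible for scheduling: {"
    'aws-eu3 => invalid networks: [Invalid value: "192.168.0.0/17": shoot pod '
    "network intersects with default vpn network (192.168.123.0/24)], "
    'aws-eu8 => invalid networks: [Invalid value: "192.168.0.0/17": shoot pod '
    "network intersects with default vpn network (192.168.123.0/24)]}"
)

summary = summarize_scheduling_failure(sample_message)
for reason, seeds in summary.items():
    print(f"{len(seeds)} seed(s) rejected: {reason}")
```

With the full event, all ten seeds would fall under the same reason, so the user would see a single line instead of ten repeated ones.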

kind: Shoot
apiVersion: core.gardener.cloud/v1beta1
metadata:
  name: stockholm
  namespace: garden-als-ingest
  uid: b0a06c59-8af9-414d-807c-e77d926fdaac
  resourceVersion: '3452130281'
  generation: 1
  creationTimestamp: '2022-06-02T13:02:33Z'
  labels:
    extensions.extensions.gardener.cloud/shoot-dns-service: 'true'
    networking.extensions.gardener.cloud/calico: 'true'
    operatingsystemconfig.extensions.gardener.cloud/gardenlinux: 'true'
    provider.extensions.gardener.cloud/aws: 'true'
    shoot.gardener.cloud/status: healthy
spec:
  addons:
    kubernetesDashboard:
      enabled: false
      authenticationMode: token
    nginxIngress:
      enabled: false
      externalTrafficPolicy: Cluster
  cloudProfileName: aws
  dns: {}
  extensions:
    - type: shoot-dns-service
      providerConfig:
        apiVersion: service.dns.extensions.gardener.cloud/v1alpha1
        kind: DNSConfig
        syncProvidersFromShootSpecDNS: true
  hibernation:
    schedules:
      - start: '00 17 * * 1,2,3,4,5'
        location: Europe/Sofia
  kubernetes:
    allowPrivilegedContainers: false
    kubeAPIServer:
      enableBasicAuthentication: false
      requests:
        maxNonMutatingInflight: 400
        maxMutatingInflight: 200
      enableAnonymousAuthentication: false
      eventTTL: 1h0m0s
    kubeControllerManager:
      nodeCIDRMaskSize: 24
      podEvictionTimeout: 2m0s
      nodeMonitorGracePeriod: 2m0s
    kubeProxy:
      mode: IPTables
      enabled: true
    kubelet:
      failSwapOn: true
      kubeReserved:
        cpu: 80m
        memory: 1Gi
        pid: 20k
      imageGCHighThresholdPercent: 50
      imageGCLowThresholdPercent: 40
      serializeImagePulls: true
    version: 1.23.4
    verticalPodAutoscaler:
      enabled: true
      evictAfterOOMThreshold: 10m0s
      evictionRateBurst: 1
      evictionRateLimit: -1
      evictionTolerance: 0.5
      recommendationMarginFraction: 0.15
      updaterInterval: 1m0s
      recommenderInterval: 1m0s
    enableStaticTokenKubeconfig: true
  networking:
    type: calico
    pods: 192.168.0.0/17
    nodes: 10.251.120.0/22
    services: 192.168.128.0/17
  maintenance:
    autoUpdate:
      kubernetesVersion: false
      machineImageVersion: true
    timeWindow:
      begin: 220000+0100
      end: 230000+0100
  provider:
    type: aws
    controlPlaneConfig:
      apiVersion: aws.provider.extensions.gardener.cloud/v1alpha1
      kind: ControlPlaneConfig
    infrastructureConfig:
      apiVersion: aws.provider.extensions.gardener.cloud/v1alpha1
      kind: InfrastructureConfig
      networks:
        vpc:
          id: vpc-#############
        zones:
          - internal: 10.251.121.0/26
            name: eu-north-1a
            public: 10.251.120.0/26
            workers: 10.251.122.0/26
    workers:
      - cri:
          name: containerd
        name: cpu-worker
        machine:
          type: m5.large
          image:
            name: gardenlinux
            version: 576.8.0
        maximum: 12
        minimum: 1
        maxSurge: 1
        maxUnavailable: 0
        volume:
          type: gp2
          size: 20Gi
        zones:
          - eu-north-1a
        systemComponents:
          allow: true
  purpose: development
  region: eu-north-1
  secretBindingName: aws-poc-ingest
  systemComponents:
    coreDNS:
      autoscaling:
        mode: horizontal
status:
  gardener:
    id: ''
    name: ''
    version: ''
  hibernated: false
  technicalID: ''
  uid: ''
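For reference, the conflict reported by the scheduler can be reproduced locally with Python's standard `ipaddress` module. This is only a sketch: the network values are copied from `spec.networking` above, the VPN range is the one quoted in the event, and the helper name `find_overlaps` is made up for illustration.

```python
import ipaddress

# Values copied from spec.networking in the manifest above.
shoot_networks = {
    "pods": "192.168.0.0/17",
    "nodes": "10.251.120.0/22",
    "services": "192.168.128.0/17",
}

# Default VPN range as quoted in the SchedulingFailed event.
VPN_NETWORK = "192.168.123.0/24"


def find_overlaps(networks: dict[str, str], reserved: str) -> list[str]:
    """Return the names of all networks that intersect the reserved range."""
    reserved_net = ipaddress.ip_network(reserved)
    return [
        name
        for name, cidr in networks.items()
        if ipaddress.ip_network(cidr).overlaps(reserved_net)
    ]


conflicts = find_overlaps(shoot_networks, VPN_NETWORK)
print(conflicts)  # ['pods']
```

Only the pod network is affected: 192.168.0.0/17 spans 192.168.0.0 to 192.168.127.255 and therefore contains 192.168.123.0/24, while the service and node ranges do not touch it.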

grolu commented 2 years ago

I'm not sure, but I think we had a similar request in the past; I cannot remember the outcome. Anyway, this is not a dashboard issue. Please open an issue in the g/g (gardener/gardener) repository and ask whether they can find a way to transport the error to the shoot status resource. If not, we cannot display it in the dashboard.

grolu commented 2 years ago

Just to make it clear: of course there are ways to access this information. However, right now we depend on the information we get from the shoot / shoot status resource. Displaying other information would require a lot of work. I'm not saying that we do not want to display this information; we even have an issue for this (https://github.com/gardener/dashboard/issues/26). It is just not feasible with the current implementation. That's why I closed the issue for now. Maybe Gardener can improve here, but as I already stated, I think they have similar problems: the error happens during creation, and if I remember correctly they cannot put the error into the status resource at that point in time.
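As an illustration of one of the "other ways" to get at this information, the sketch below filters a Kubernetes event list for warnings that belong to a given shoot. It assumes the input is the decoded JSON of an event list (for example from `kubectl get events -o json`) and relies only on the standard Event fields `type`, `reason`, `message` and `involvedObject`. The function name `shoot_warnings` and the sample data are hypothetical, and this is not how the dashboard currently works.

```python
def shoot_warnings(event_list: dict, shoot_name: str) -> list[str]:
    """Collect 'reason: message' strings for Warning events of one shoot."""
    warnings = []
    for event in event_list.get("items", []):
        involved = event.get("involvedObject", {})
        if (
            event.get("type") == "Warning"
            and involved.get("kind") == "Shoot"
            and involved.get("name") == shoot_name
        ):
            warnings.append(f"{event.get('reason', '')}: {event.get('message', '')}")
    return warnings


# Hand-written sample data for illustration only (messages shortened).
sample_events = {
    "items": [
        {
            "type": "Warning",
            "reason": "SchedulingFailed",
            "message": "Failed to schedule shoot 'stockholm'",
            "involvedObject": {"kind": "Shoot", "name": "stockholm"},
        },
        {
            "type": "Normal",
            "reason": "Reconciling",
            "message": "unrelated event",
            "involvedObject": {"kind": "Shoot", "name": "other"},
        },
    ]
}

messages = shoot_warnings(sample_events, "stockholm")
print(messages)
```

The point of the sketch is only that the scheduling error lives in the events of the shoot rather than in its status, which is why showing it would need an additional data source.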