deis / monitor

Monitoring for Deis Workflow
https://deis.com
MIT License

Can't upgrade w/ grafana.persistence.enabled=true #188

Open · vdice opened this issue 7 years ago

vdice commented 7 years ago

On GKE, upgrading with Grafana persistence enabled leads to a "Failed to attach volume" error:

$ k version
Client Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.4", GitCommit:"7243c69eb523aa4377bce883e7c0dd76b84709a1", GitTreeState:"clean", BuildDate:"2017-03-07T23:53:09Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.4", GitCommit:"7243c69eb523aa4377bce883e7c0dd76b84709a1", GitTreeState:"clean", BuildDate:"2017-03-07T23:34:32Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}

$ cat values-only-grafana-persistent.yaml
monitor:
  grafana:
    persistence:
      enabled: true
  influxdb:
    persistence:
      enabled: false

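For context, with grafana.persistence.enabled=true the chart presumably renders a PersistentVolumeClaim along these lines (a sketch: the claim name matches the ClaimName in the pod description below, while the access mode and size are assumptions for illustration):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: deis-monitor-grafana     # matches ClaimName in the describe output below
  namespace: deis
spec:
  accessModes:
    - ReadWriteOnce              # assumed; a GCE PD can attach to only one node
  resources:
    requests:
      storage: 5Gi               # assumed size, for illustration only
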
$ helm install workflow-dev/workflow --namespace deis --name deis-workflow -f values-only-grafana-persistent.yaml
...
$ # (wait until everything is up and running)

$ helm upgrade deis-workflow -f values-only-grafana-persistent.yaml workflow/workflow
...
$ kd get po | grep grafana
deis-monitor-grafana-1674584155-qz1pq    0/1       ContainerCreating   0          40s
deis-monitor-grafana-2030706665-cg04s    1/1       Running             0          4m

$ kd describe po deis-monitor-grafana-1674584155-qz1pq
Name:       deis-monitor-grafana-1674584155-qz1pq
Namespace:  deis
Node:       gke-vrd-default-pool-7890514e-p65q/10.240.0.22
Start Time: Tue, 28 Mar 2017 12:10:58 -0600
Labels:     app=deis-monitor-grafana
        pod-template-hash=1674584155
Status:     Pending
IP:
Controllers:    ReplicaSet/deis-monitor-grafana-1674584155
Containers:
  deis-monitor-grafana:
    Container ID:
    Image:      quay.io/deis/grafana:v2.8.0
    Image ID:
    Port:       3500/TCP
    State:      Waiting
      Reason:       ContainerCreating
    Ready:      False
    Restart Count:  0
    Volume Mounts:
      /var/lib/grafana from grafana-data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-f1372 (ro)
    Environment Variables:
      INFLUXDB_URLS:        http://$(DEIS_MONITOR_INFLUXAPI_SERVICE_HOST):$(DEIS_MONITOR_INFLUXAPI_SERVICE_PORT_TRANSPORT)
      BIND_PORT:        3500
      DEFAULT_USER:     admin
      DEFAULT_USER_PASSWORD:    admin
Conditions:
  Type      Status
  Initialized   True
  Ready     False
  PodScheduled  True
Volumes:
  grafana-data:
    Type:   PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  deis-monitor-grafana
    ReadOnly:   false
  default-token-f1372:
    Type:   Secret (a volume populated by a Secret)
    SecretName: default-token-f1372
QoS Class:  BestEffort
Tolerations:    <none>
Events:
  FirstSeen LastSeen    Count   From            SubObjectPath   Type        Reason      Message
  --------- --------    -----   ----            -------------   --------    ------      -------
  1m        1m      1   {default-scheduler }            Normal      Scheduled   Successfully assigned deis-monitor-grafana-1674584155-qz1pq to gke-vrd-default-pool-7890514e-p65q
  1m        14s     7   {controller-manager }           Warning     FailedMount Failed to attach volume "pvc-5d2452ce-13e1-11e7-b75d-42010a80013f" on node "gke-vrd-default-pool-7890514e-p65q" with: googleapi: Error 400: The disk resource 'projects/deis-sandbox/zones/us-central1-b/disks/gke-vrd-c820edac-dynam-pvc-5d2452ce-13e1-11e7-b75d-42010a80013f' is already being used by 'projects/deis-sandbox/zones/us-central1-b/instances/gke-vrd-default-pool-7890514e-0xxs'
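
The underlying problem appears to be that a GCE persistent disk can only be attached to one node at a time, and the Deployment's rolling update starts the replacement pod (possibly on another node) while the old pod still holds the disk. A possible workaround, sketched here assuming the Deployment is named deis-monitor-grafana like the pods above, is to switch the Deployment to the Recreate strategy so the old pod releases the disk before the new one starts:

$ kubectl --namespace deis patch deployment deis-monitor-grafana \
    -p '{"spec":{"strategy":{"type":"Recreate","rollingUpdate":null}}}'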
kukikiloke commented 7 years ago

Hi, I also ran into a similar error when using a persistent volume for Grafana.

One of my pods is stuck in the Pending state due to the error: pod (deis-monitor-grafana-465911159-nkhmp) failed to fit in any node; fit failure summary on nodes: NoVolumeZoneConflict (2), PodToleratesNodeTaints (1). The following are the zones my AWS resources run in:

Master: us-east-1a
Nodes: us-east-1c, us-east-1d
Dynamic pvc Volume: us-east-1a

It seems to be related to which zones the master and nodes are hosted in, as the error occurs when the master, nodes, and volume end up in different zones (I don't have an example at this moment).
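
If the cluster provisions EBS volumes dynamically, one possible fix is to pin the volume to a zone that actually has nodes via a StorageClass, roughly like this sketch (the class name is illustrative, and us-east-1c is just one of the node zones listed above):

kind: StorageClass
apiVersion: storage.k8s.io/v1beta1
metadata:
  name: grafana-zoned            # illustrative name
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
  zone: us-east-1c               # a zone where nodes actually run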

Cryptophobia commented 6 years ago

This issue was moved to teamhephy/monitor#5