GoogleContainerTools / skaffold

Easy and Repeatable Kubernetes Development
https://skaffold.dev/
Apache License 2.0

PVC get deleted on clean up (when bound by a Deployment) #4366

Open amitm02 opened 4 years ago

amitm02 commented 4 years ago

I've used minio as part of a release in Skaffold (sub-chart).

The minio Helm value persistence.enabled is set to true, so a PVC is created for its data. However, the PVC gets terminated and the data is lost when Ctrl-C is pressed during skaffold run. I don't think Skaffold should ever clean up and delete PVCs.
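For context, the relevant subchart values look roughly like this (the key layout follows the stable minio chart's convention; adjust to your actual values file):

```yaml
# values.yaml (sketch)
minio:
  persistence:
    enabled: true   # the chart renders a PVC for minio's data directory
```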

I can see Skaffold adds a cleanup label to it, so that's probably related:

Name:          minio
Namespace:     default
StorageClass:  hostpath
Status:        Bound
Volume:        pvc-4c8b7fd6-c4b9-4281-8984-91064e6ff1ab
Labels:        app=minio
               app.kubernetes.io/managed-by=skaffold-v1.11.0
               chart=minio-5.0.30
               heritage=Helm
               release=file-storage
               skaffold.dev/builder=local
               skaffold.dev/cleanup=true
               skaffold.dev/deployer=helm
               skaffold.dev/docker-api-version=1.40
               skaffold.dev/run-id=1376e569-933e-4669-99e8-42901e3a7e81
               skaffold.dev/tag-policy=git-commit
               skaffold.dev/tail=true
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               volume.beta.kubernetes.io/storage-provisioner: docker.io/hostpath
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      5Gi
Access Modes:  RWO
VolumeMode:    Filesystem
Mounted By:    minio-678684b4d6-l4dgc

tejal29 commented 4 years ago

Thanks @amitm02. Skaffold naively applies the cleanup label to all resources it deploys. You could add --cleanup=false to your dev command to skip cleaning up resources entirely.

However, to know more about your use case,

  1. Do you expect the PVC to be created only if it's not present? Is this part of app setup?
  2. Across dev loops is it ok to use the same data?

If PVC creation is a one-time setup, I could see pre-deploy hooks (#1441) solving this.

amitm02 commented 4 years ago

Yes, I would like to keep the data across different dev loops. I don't want to disable cleanup altogether.

I did notice the following: there seems to be inconsistent behaviour in how Skaffold treats a PVC created via a StatefulSet's volumeClaimTemplates.

I've added a @stable/mongodb Helm subchart to the deploy. If I set it to run standalone, via:

.Values.replicaSet.enabled = false
.Values.persistence.enabled = true

then the PVC (standalone template) is created on skaffold dev and is also terminated on Ctrl-C.

But if I run it as a StatefulSet:

.Values.replicaSet.enabled = true
.Values.persistence.enabled = true

then the PVC (StatefulSet template) is created on skaffold dev but is NOT terminated on Ctrl-C.
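The difference comes down to who creates the PVC. In replica-set mode the chart uses a StatefulSet volumeClaimTemplates block, so the PVC is created by the StatefulSet controller rather than applied by Skaffold, and therefore never receives Skaffold's cleanup labels. A minimal sketch of that shape (field names are from the StatefulSet API; the actual chart template differs):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mongodb
spec:
  serviceName: mongodb
  replicas: 1
  selector:
    matchLabels:
      app: mongodb
  template:
    metadata:
      labels:
        app: mongodb
    spec:
      containers:
        - name: mongodb
          image: mongo
          volumeMounts:
            - name: data
              mountPath: /data/db
  volumeClaimTemplates:   # the controller creates PVC "data-mongodb-0" itself
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 8Gi
```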

briandealwis commented 4 years ago

@amitm02 Kubernetes doesn't remove PVCs when deleting StatefulSets:

Note that, the PersistentVolumes associated with the Pods' PersistentVolume Claims are not deleted when the Pods, or StatefulSet are deleted. This must be done manually.

gjcarneiro commented 3 years ago

I have the same issue. I would like to add a PVC to a Postgres DB so that the data is retained even across a system reboot, because it is expensive to compute (it takes a couple of hours each time). To that end, it would be useful if Skaffold created the PVC normally, maybe even updated it if it detects changes in the source YAML, but did not delete it when I hit Ctrl-C.

amendoza-navent commented 3 years ago

I have the same issue. I need to persist the Gradle cache across builds. I think Skaffold shouldn't delete a PersistentVolume by default, because people use it precisely to persist data.

mfrey777 commented 3 years ago

Same issue here. I store my Postgres DB in a PVC, and it is very time-consuming to re-initialize the database every time (especially to import my large test data set). For me, a PVC should never be deleted automatically; otherwise it is not persistent anymore...

michielswaanen commented 3 years ago

Any update on this issue? Would love to see an option to exclude specific Kubernetes resources from being deleted when you exit Skaffold.

briandealwis commented 3 years ago

GCP's Config Connector uses a deletion-policy: abandon annotation to indicate that resources should not be deleted. It might be worth turning skaffold.dev/cleanup from a boolean into an enum to provide similar flexibility in the future.
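Purely to illustrate that idea (this label value is hypothetical; today Skaffold only sets skaffold.dev/cleanup=true), the enum form might look like:

```yaml
metadata:
  labels:
    # hypothetical value: "abandon" would tell Skaffold to leave this
    # resource in place on cleanup, mirroring Config Connector's
    # deletion-policy annotation
    skaffold.dev/cleanup: abandon
```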

dominikbraun commented 2 years ago

What about adding a Kubernetes annotation that tells Skaffold not to clean up the respective resource when hitting Ctrl + C?

bahodge commented 2 years ago

Hey everyone, I feel like there's no clear solution for using skaffold dev with persistent volumes you don't want deleted. I've been fighting this for a few days and came up with an OK solution.

tl;dr: use a hook to create a PV outside of Helm and Skaffold with kubectl, then point the Helm PVC at the new static PV.

With newer versions of the Skaffold API (v2beta26), you can use before- and after-deploy hooks for Helm.

# skaffold.yaml

apiVersion: skaffold/v2beta26
kind: Config

build:
  # ... put build stuff here

deploy:
  helm:
    hooks:
      before:
        - host:
            command: ["kubectl", "apply", "-f", "./helm/local-database-pv.yaml"]
            os: [darwin, linux]
        - host:
            command: ["kubectl", "patch", "pv", "local-database-pv", "-p", '{"spec":{"claimRef": null}}'] # clear the retained claimRef so this PV can be claimed again
            os: [darwin, linux]
    flags:
      upgrade:
        - "--install"
    releases:
      - name: local
        chartPath: helm
        skipBuildDependencies: true
        artifactOverrides:
          "api.image.fqn": *apiImage
          "frontend.image.fqn": *frontendImage

The release name "local" is important because it must match .Release.Name. The PV file below is deployed directly using kubectl and is not part of the Helm release.

# helm/local-database-pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-database-pv
  labels:
    type: local
    component: local-database-pv
spec:
  storageClassName: local-database-pv
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    # in my case i'm using postgres
    path: /var/lib/postgresql/data

# helm/templates/pvcs.yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: {{ .Release.Name }}-database-pvc
  labels:
    component: {{ .Release.Name }}-database
spec:
  storageClassName: local-database-pv
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi

Make a database StatefulSet that communicates with the PV via the PVC. In this case I'm using a very old Postgres image, and I'm not even handling secrets yet, which you should.

# helm/templates/database-statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: {{ .Release.Name }}-database
spec:
  serviceName: {{ .Release.Name }}-database-svc
  replicas: 1
  selector:
    matchLabels:
      component: {{ .Release.Name }}-database
  template:
    metadata:
      labels:
        component: {{ .Release.Name }}-database
    spec:
      volumes: 
        - name: {{ .Release.Name }}-database-pv
          persistentVolumeClaim:
            claimName: {{ .Release.Name }}-database-pvc
      containers:
        - name: {{ .Release.Name }}-database
          image: postgres:10.7-alpine
          ports:
            - containerPort: 5432
          volumeMounts:
            - name: {{ .Release.Name }}-database-pv
              mountPath: /var/lib/postgresql/data
          env:
            - name: POSTGRES_USER
              value: {{ .Values.database.user }}
            - name: POSTGRES_PASSWORD
              value: {{ .Values.database.password }}
            - name: POSTGRES_HOST_AUTH_METHOD
              value: trust

In my particular case, the apiImage is a rails app. I tested it out by doing the following.

start skaffold dev

skaffold dev

Open another terminal

kubectl exec -it <api pod> -- bash

rails db:create
# => Database <my database> created

Now stop skaffold dev with ctrl + c and restart it

Open another terminal

kubectl exec -it <api pod> -- bash

rails db:create
# => Database <my database> already exists

I hope this helps people, but it would be much better if skaffold provided a simple configuration option instead.

computerquip-work commented 2 years ago

Am I mistaken that this is a common case for basically any deployment that uses a database? How is everyone else using this to where they aren't running into constant problems?

jobcespedes commented 2 years ago

Yep. Database use case here. As a workaround, we are using hooks to create the database's PVC in a pre-deploy hook step.

briandealwis commented 2 years ago

I would argue that you should not be using Skaffold to deploy a PVC (or a PV): you should be managing those outside of Skaffold — or at least outside of your skaffold dev / skaffold debug.

There's a lot of confusion about PV and PVC, and unfortunately the docs aren't very clear 😞 There was a discussion on this topic on an internal Kubernetes mailing list from which I gleaned the following wisdom:

  • The lifecycle of a PV is "as long as the PVC exists, the PV is yours." If you wish to retain the contents of the PV then you can't release the claim. A PVC is like reference counting: when you delete the PVC, you are explicitly releasing the data.
  • The assumption behind the Retain policy on a PV is that an admin needs to examine the PV, potentially performing some cleanup, before it can be reused by another PVC. The admin would clear PV.spec.claimRef to make it available again.
  • Multiple pods can use the same PVC if it supports one of the *Many access modes
  • A pod doesn't release or return a PVC: a pod uses or references a PVC. A pod can terminate but the PVC lives on.

A useful approach is to have a separate init/ directory with a skaffold.yaml to install any one-time setup required by your app, such as PVCs, requisite Helm charts, etc. (cd init; skaffold deploy) will install the setup, and (cd init; skaffold delete) will clean them up.
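A minimal sketch of that layout (the file names are illustrative; deploy.kubectl.manifests is standard Skaffold configuration):

```yaml
# init/skaffold.yaml -- one-time setup, deployed and deleted separately
# from the main dev loop
apiVersion: skaffold/v2beta26
kind: Config
deploy:
  kubectl:
    manifests:
      - pvc.yaml   # your PVC and any other long-lived resources
```

With this in place, (cd init; skaffold deploy) installs the PVC once, and the main skaffold dev loop never labels or deletes it.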

dominikbraun commented 2 years ago

I would argue that you should not be using Skaffold to deploy a PVC (or a PV): you should be managing those outside of Skaffold — or at least outside of your skaffold dev / skaffold debug.

I'd like to provision my entire development environment, including all services and the database, with a single skaffold dev command. This already works well, but the PVC created for the database inside the cluster doesn't survive the shutdown. From my point of view, I don't see any particular reason for managing PVCs outside of Skaffold per se.

mattbucci commented 9 months ago

This is fairly straightforward now in 2024 thanks to merged support for hooks: https://github.com/GoogleContainerTools/skaffold/issues/1441

I needed a persistent volume claim to run my S3 mock service, minio, locally.

I created a file with a claim

./deploy/local/volume.yaml

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: minio-pvc
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi

and I applied it using a pre-deploy hook:

deploy:
  kubectl:
    hooks:
      before:
        - host:
            command: ["sh", "-c", "kubectl apply -f ./deploy/local/volume.yaml"]
            dir: './'

Note: at this time, dir: './' seems to be required to prevent issues if you include this Skaffold file from another file. Otherwise the path is resolved from the root skaffold.yaml and you will get: error: the path "./deploy/local/volume.yaml" does not exist
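For reference, "including this file from another file" refers to Skaffold's multi-module requires stanza; a sketch with illustrative paths:

```yaml
# skaffold.yaml (root)
apiVersion: skaffold/v2beta26
kind: Config
requires:
  # pulls in the config that defines the pre-deploy hook above;
  # without dir: './' in that hook, relative paths resolve from here
  - path: ./deploy/local/skaffold.yaml
```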