devspace-sh / devspace

DevSpace - The Fastest Developer Tool for Kubernetes ⚡ Automate your deployment workflow with DevSpace and develop software directly inside Kubernetes.
https://devspace.sh
Apache License 2.0

`devspace dev` replaces pod with `-devspace` suffixed pod but does not appear to update deployment and replaced pod will not delete on `devspace purge` #1448

Open wayspurrchen opened 3 years ago

wayspurrchen commented 3 years ago

What happened?

I am running into a number of small issues, but I'm not familiar enough with DevSpace to tell which are bugs, which are intended behavior, and which are user error. Either way, my difficulties may point to opportunities for documentation or small changes.

Context: I am attempting to use DevSpace as part of my evaluation of service meshes, starting with Istio. I have a simple application that exposes a few HTTP routes and hits one of two other identical applications. These are simple Express apps exposed over port 80, but only the Deployment named front-app is exposed to the Istio ingress via its Gateway/VirtualService mechanism (back-app-1 and back-app-2 are not).

I am using labelSelector.app and containerName to select the front-app Pod as well as my sample application container, also named front-app, since the Istio sidecar container is also present in the Pod.
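
For reference, the selector part of my dev config looks roughly like this (a sketch from memory; the exact schema depends on the DevSpace version, and the full devspace.yaml is in the repro repo linked below):

dev:
  sync:
    - labelSelector:
        app: front-app
      containerName: front-app      # target the Express container, not the istio-proxy sidecar
  terminal:
    labelSelector:
      app: front-app
    containerName: front-app
    command: ["./devspace_start.sh"]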

See the service topology below:

The services are red because they are hardcoded to randomly throw errors for demonstration of error-tracking during tracing, so that is expected.

Running devspace dev shows this output:

➜  service-mesh-test git:(master) ✗ devspace dev
[info]   Using namespace 'default'
[info]   Using kube context 'gke_myorg_us-west1-b_smt-istio'
[info]   Skipping building image app                               
[info]   Execute 'helm upgrade service-mesh-test ./helm/smt-app --namespace default --values /var/folders/_n/lzlctwb94sv2_zh7mrr4v0lm0000gp/T/182524211 --install --kube-context gke_myorg_us-west1-b_smt-istio'
[info]   Execute 'helm list --namespace default --output json --kube-context gke_myorg_us-west1-b_smt-istio'
[done] √ Deployed helm chart (Release revision: 1)                              
[done] √ Successfully deployed service-mesh-test with helm                      
[done] √ Scaled down Deployment default/front-app                      
[done] √ Successfully replaced pod default/front-app-7c59885f69-vbwhp              
[done] √ Port forwarding started on 8080:80 (default/front-app-7c59885f69-vbwhp-devspace)

#########################################################
[info]   DevSpace UI available at: http://localhost:8090
#########################################################

[done] √ Sync started on /Users/way/Documents/Projects/service-mesh-test <-> . (Pod: default/front-app-7c59885f69-vbwhp-devspace)
[info]   Opening 'http://localhost:8080' as soon as application will be started (timeout: 4m0s)
[info]   Opening shell to pod:container front-app-7c59885f69-vbwhp-devspace:front-app

   ____              ____
  |  _ \  _____   __/ ___| _ __   __ _  ___ ___
  | | | |/ _ \ \ / /\___ \| '_ \ / _` |/ __/ _ \
  | |_| |  __/\ V /  ___) | |_) | (_| | (_|  __/
  |____/ \___| \_/  |____/| .__/ \__,_|\___\___|
                          |_|

Welcome to your development container!

This is how you can work with it:
- Run `npm start` to start the application
- Run `npm run dev` to start hot reloading
- Files will be synchronized between your local machine and this container
- Some ports will be forwarded, so you can access this container on your local machine via localhost:

./devspace_start.sh: line 34: devspace: command not found
root@front-app-7c59885f69-vbwhp-devspace:/home/app# %    # <-- no longer in shell, interrupted by restart
home_shell $

The terminal session inside the front-app-7c59885f69-vbwhp-devspace pod has quit by this point, implying a restart, so I am back at my home terminal.

Running kubectl get pod shows the front-app has had its pod replaced, and shows one restart:

NAME                                  READY   STATUS    RESTARTS   AGE
back-app-1-5b76c49679-gdtt4           2/2     Running   0          114s
back-app-2-5c56b5d68-gtz24            2/2     Running   0          114s
front-app-7c59885f69-vbwhp-devspace   2/2     Running   1          105s

kubectl logs -f front-app-7c59885f69-vbwhp-devspace --previous shows that the container entered a crash loop due to apparently missing application files:

2021-05-25T19:42:46: PM2 log: App [Service-Mesh-Test-App-Frontend:0] starting in -cluster mode-
2021-05-25T19:42:46: PM2 log: App [Service-Mesh-Test-App-Frontend:0] online
Error: Cannot find module '/home/app/index.js'
    at Function.Module._resolveFilename (node:internal/modules/cjs/loader:941:15)
    at Function.Module._load (node:internal/modules/cjs/loader:774:27)
    at /usr/local/lib/node_modules/pm2/lib/ProcessContainer.js:303:25
    at wrapper (/usr/local/lib/node_modules/pm2/node_modules/async/internal/once.js:12:16)
    at next (/usr/local/lib/node_modules/pm2/node_modules/async/waterfall.js:96:20)
    at /usr/local/lib/node_modules/pm2/node_modules/async/internal/onlyOnce.js:12:16
    at WriteStream.<anonymous> (/usr/local/lib/node_modules/pm2/lib/Utility.js:186:13)
    at WriteStream.emit (node:events:365:28)
    at node:internal/fs/streams:72:16
    at FSReqCallback.oncomplete (node:fs:184:23)
2021-05-25T19:42:46: PM2 log: App name:Service-Mesh-Test-App-Frontend id:0 disconnected
2021-05-25T19:42:46: PM2 log: App [Service-Mesh-Test-App-Frontend:0] exited with code [0] via signal [SIGINT]
2021-05-25T19:42:46: PM2 log: Script /home/app/index.js had too many unstable restarts (16). Stopped. "errored"
2021-05-25T19:42:47: PM2 log: 0 application online, retry = 3
2021-05-25T19:42:49: PM2 log: 0 application online, retry = 2
2021-05-25T19:42:51: PM2 log: 0 application online, retry = 1
2021-05-25T19:42:53: PM2 log: 0 application online, retry = 0
2021-05-25T19:42:53: PM2 error: app=Service-Mesh-Test-App-Frontend id=0 does not have a pid
2021-05-25T19:42:53: PM2 log: [Watch] Stop watching Service-Mesh-Test-App-Frontend
2021-05-25T19:42:53: PM2 log: Stopping app:Service-Mesh-Test-App-Frontend id:0
2021-05-25T19:42:53: PM2 log: PM2 successfully stopped

This error does not show up for the other pods back-app-1-... and back-app-2-..., and all three deployments/pods share the same Dockerfile, code, and built image residing on Google Container Registry, which is what makes me suspect this is related to DevSpace.


Aside: While writing this bug report I realized that I did not have devspace.yaml's dev.sync.localSubPath option set, even though my application lives in a folder named app. I updated my config with the appropriate folder and no longer saw the restart issue, but then encountered this error when running devspace dev:

[info]   Using namespace 'default'
[info]   Using kube context 'gke_myorg_us-west1-b_smt-istio'
[info]   Skipping building image app                               
[info]   Execute 'helm list --namespace default --output json --kube-context gke_myorg_us-west1-b_smt-istio'
[info]   Execute 'helm upgrade service-mesh-test ./helm/smt-app --namespace default --values /var/folders/_n/lzlctwb94sv2_zh7mrr4v0lm0000gp/T/054087546 --install --kube-context gke_myorg_us-west1-b_smt-istio'
[info]   Execute 'helm list --namespace default --output json --kube-context gke_myorg_us-west1-b_smt-istio'
[done] √ Deployed helm chart (Release revision: 1)                              
[done] √ Successfully deployed service-mesh-test with helm                      
[done] √ Scaled down Deployment default/front-app                      
[done] √ Successfully replaced pod default/front-app-7c59885f69-glqhz              
[done] √ Port forwarding started on 8080:80 (default/front-app-7c59885f69-glqhz-devspace)

#########################################################
[info]   DevSpace UI available at: http://localhost:8090
#########################################################

[done] √ Sync started on /Users/way/Documents/Projects/service-mesh-test/app <-> . (Pod: default/front-app-7c59885f69-glqhz-devspace)
[info]   Opening 'http://localhost:8080' as soon as application will be started (timeout: 4m0s)
[info]   Opening shell to pod:container front-app-7c59885f69-glqhz-devspace:front-app
OCI runtime exec failed: exec failed: container_linux.go:349: starting container process caused "exec: \"./devspace_start.sh\": stat ./devspace_start.sh: no such file or directory": unknown

Since I am now syncing a subdirectory, I figured I should update terminal.command to ../devspace_start.sh, but I still received the same error as above, just with ../devspace_start.sh instead of ./devspace_start.sh.
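
For clarity, the relevant part of the updated config is roughly this (a sketch; exact field names depend on the devspace.yaml schema version):

dev:
  sync:
    - labelSelector:
        app: front-app
      containerName: front-app
      localSubPath: ./app               # sync only the app/ subfolder instead of the repo root
  terminal:
    labelSelector:
      app: front-app
    containerName: front-app
    command: ["../devspace_start.sh"]   # tried both ./ and ../ variants; both fail with the stat error above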

However, the pods do appear to have been properly deployed without restart:

> kubectl get pod
NAME                                  READY   STATUS    RESTARTS   AGE
back-app-1-5b76c49679-7f4wg           2/2     Running   0          60s
back-app-2-5c56b5d68-thzn8            2/2     Running   0          60s
front-app-7c59885f69-6q9vv-devspace   2/2     Running   0          52s

However, the other issues described in this report still persist.


Regardless, the front-app-7c59885f69-vbwhp-devspace pod IS running; I can follow its logs with kubectl logs -f front-app-7c59885f69-vbwhp-devspace, and I can hit the app via the external IP in a web browser and exercise its endpoints, which show up in the logs:

2021-05-25T19:42:54: PM2 log: Launching in no daemon mode
2021-05-25T19:42:54: PM2 log: [Watch] Start watching Service-Mesh-Test-App-Frontend
2021-05-25T19:42:54: PM2 log: App [Service-Mesh-Test-App-Frontend:0] starting in -cluster mode-
2021-05-25T19:42:54: PM2 log: App [Service-Mesh-Test-App-Frontend:0] online
Server is listening on port 80
front-app: Received request /good
front-app: Chaining to http://back-app-2/good
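
(For example, hitting the Istio ingress gateway's external IP with something like curl http://<EXTERNAL-IP>/good produces request lines like the ones above; <EXTERNAL-IP> stands in for the gateway's actual IP.)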

Running kubectl get deployment shows that the front-app Deployment is not aware of the devspace pod:

NAME         READY   UP-TO-DATE   AVAILABLE   AGE
back-app-1   1/1     1            1           18m
back-app-2   1/1     1            1           18m
front-app    0/0     0            0           18m
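
(Presumably I could scale the Deployment back up by hand, e.g. with kubectl scale deployment front-app --replicas=1, but I would expect DevSpace to undo its own scale-down.)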

Other commands like devspace enter, devspace sync, and devspace ui work like a charm.

Running devspace purge appears to indicate that everything worked as expected:

[info]   Using namespace 'default'
[info]   Using kube context 'gke_myorg_us-west1-b_smt-istio'
[info]   Execute 'helm delete service-mesh-test --namespace default --kube-context gke_myorg_us-west1-b_smt-istio'
[done] √ Successfully deleted deployment service-mesh-test

However, running kubectl get pod shows the front-app still running:

NAME                                  READY   STATUS    RESTARTS   AGE
front-app-7c59885f69-vbwhp-devspace   2/2     Running   1          32m

Naturally, it's no longer accessible, since everything else, including the Deployments, Gateways, and VirtualServices, has been cleaned up.
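
(The leftover pod can of course be removed by hand, e.g. kubectl delete pod front-app-7c59885f69-vbwhp-devspace, but I would expect devspace purge to clean it up.)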

What did you expect to happen instead?

How can we reproduce the bug? (as minimally and precisely as possible)

I have prepared a small repro repo here: https://github.com/wayspurrchen/devspaces-istio-repro

For convenience, this is the rendered Kubernetes output that Helm applies, generated with helm template --debug helm/smt-app:

---
# Source: smt-app/templates/configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: front-app-config
data:
  OTHER_APP_HOSTNAMES: back-app-1,back-app-2
  APP_NAME: front-app
---
# Source: smt-app/templates/configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: back-app-1-config
data:
  OTHER_APP_HOSTNAMES: back-app-2
  APP_NAME: back-app-1
---
# Source: smt-app/templates/configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: back-app-2-config
data:
  OTHER_APP_HOSTNAMES: back-app-2
  APP_NAME: back-app-2
---
# Source: smt-app/templates/service.yaml
apiVersion: v1
kind: Service
metadata:
  name: front-app
  labels:
    app: front-app
    release: RELEASE-NAME
spec:
  type: ClusterIP
  ports:
    - port: 80
      targetPort: http
      protocol: TCP
      name: http
  selector:
    app: front-app
    release: RELEASE-NAME
---
# Source: smt-app/templates/service.yaml
apiVersion: v1
kind: Service
metadata:
  name: back-app-1
  labels:
    app: back-app-1
    release: RELEASE-NAME
spec:
  type: ClusterIP
  ports:
    - port: 80
      targetPort: http
      protocol: TCP
      name: http
  selector:
    app: back-app-1
    release: RELEASE-NAME
---
# Source: smt-app/templates/service.yaml
apiVersion: v1
kind: Service
metadata:
  name: back-app-2
  labels:
    app: back-app-2
    release: RELEASE-NAME
spec:
  type: ClusterIP
  ports:
    - port: 80
      targetPort: http
      protocol: TCP
      name: http
  selector:
    app: back-app-2
    release: RELEASE-NAME
---
# Source: smt-app/templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: front-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: front-app
      release: RELEASE-NAME
  template:
    metadata:
      labels:
        app: front-app
        release: RELEASE-NAME
    spec:
      containers:
        - name: front-app
          image: <any image, should not matter>
          imagePullPolicy: IfNotPresent
          ports:
            - name: http
              containerPort: 80
              protocol: TCP
          envFrom:
          - configMapRef:
              name: front-app-config
---
# Source: smt-app/templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: back-app-1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: back-app-1
      release: RELEASE-NAME
  template:
    metadata:
      labels:
        app: back-app-1
        release: RELEASE-NAME
    spec:
      containers:
        - name: back-app-1
          image: <any image, should not matter>
          imagePullPolicy: IfNotPresent
          ports:
            - name: http
              containerPort: 80
              protocol: TCP
          envFrom:
          - configMapRef:
              name: back-app-1-config
---
# Source: smt-app/templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: back-app-2
spec:
  replicas: 1
  selector:
    matchLabels:
      app: back-app-2
      release: RELEASE-NAME
  template:
    metadata:
      labels:
        app: back-app-2
        release: RELEASE-NAME
    spec:
      containers:
        - name: back-app-2
          image: <any image, should not matter>
          imagePullPolicy: IfNotPresent
          ports:
            - name: http
              containerPort: 80
              protocol: TCP
          envFrom:
          - configMapRef:
              name: back-app-2-config
---
# Source: smt-app/templates/istio-gateway.yaml
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: smt-gateway
spec:
  selector:
    istio: ingressgateway # use istio default controller
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "*"
---
# Source: smt-app/templates/istio-gateway.yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: smt
spec:
  hosts:
  - "*"
  gateways:
  - smt-gateway
  http:
  - match:
    - uri:
        prefix: /
    route:
    - destination:
        host: front-app
        port:
          number: 80

Commands

Local Environment:

Kubernetes Cluster:

Anything else we need to know?

Running devspace sync from my project directory does not appear to use the devspace.yaml sync settings, so all of the files in app get synced from the remote into my local top-level directory. Running it with command flags or from the app directory is easy enough, but it would be nice if the command automatically used the configuration file to know where to sync.
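
For example, something like this works (flag names may differ between DevSpace versions, and /home/app is just where the files live in my container):

devspace sync --local-path ./app --container-path /home/app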

Apologies for the exhaustive detail and thank you very much for what seems like a very useful and powerful tool once I figure out how to use it properly :)

/kind bug

wayspurrchen commented 3 years ago

Small update: after removing the terminal config and using dev.sync.localSubPath, devspace dev now correctly gets through to log streaming, but I still see the same issues with the 0/0 front-app Deployment and the replaced pod not getting deleted by devspace purge.

FabianKramm commented 3 years ago

@wayspurrchen thanks for creating this issue! The -devspace pod you are seeing is the result of using dev.replacePods, which scales down the deployment / replicaset / statefulset and instead creates a custom pod used for development that mirrors the settings and labels of the original pod. You can check out the replacePods docs for a more in-depth explanation.
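
For reference, a replacePods entry looks roughly like this (a sketch; see the docs for the exact fields, and the replaceImage value is just an example):

dev:
  replacePods:
    - labelSelector:
        app: front-app
      containerName: front-app
      replaceImage: loftsh/javascript   # optional: swap in a development image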

I agree that devspace purge should delete those pods as well; we can add that. Currently, devspace reset pods is required to delete them, and devspace reset pods would also reset the deployment if you run it without devspace purge. In general, you don't have to use the dev.replacePods option: you can also target the pods from the deployment directly via sync or port-forwarding. That's up to you, although we think replacePods simplifies certain workflows.
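
For example, pointing sync and port-forwarding straight at the deployment's pods looks roughly like this (a sketch; exact fields per the config reference):

dev:
  sync:
    - labelSelector:
        app: front-app
      containerName: front-app
      localSubPath: ./app
  ports:
    - labelSelector:
        app: front-app
      forward:
        - port: 8080
          remotePort: 80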

wayspurrchen commented 3 years ago

Interesting, that's good to know! I was under the impression that I had to use replacePods to get syncing/etc. capabilities but great to hear that's not the case. Testing this same workflow without replacePods works exactly as expected. Thank you very much!