Last backup failed: final backup failed Error on Self-Hosted 2022.09rc4 when manually stopping workspace

lucasvaltl commented 2 years ago

Bug description

When manually stopping a workspace, I received the error "last backup failed: final backup failed." This happened after the workspace had been running for some time (about one hour); I then extended the timeout to 3h and stopped the workspace.

Environment:

Gitpod Self-Hosted version 2022.09rc4 (I believe, could be rc3)
Internally launched preview environment on GCP using our scripts (this one, although the link might be outdated if the environment is recreated)

Steps to reproduce

Open a workspace in an environment launched by our internal pipeline on the same version as above. Wait for a while, then extend the timeout to 3h (not sure if this is relevant, but this is what I did), then stop the workspace manually from within the VS Code Desktop UI.

Workspace affected

No response

Expected behavior

No response

Example repository

No response

Anything else?

No response

atduarte commented 2 years ago

@lucasvaltl I know it will take a while, but could you check whether it's reproducible? Also, the logs you mentioned in Slack would help for sure 🙏

lucasvaltl commented 2 years ago

Working on getting the logs. Fwiw, the workspace id in question was: gitpodio-website-qots97tlz3s

mrsimonemms commented 2 years ago

The support bundle (with all the logs etc) is uploaded to our Vendor portal

The config.yaml is:

apiVersion: v1
authProviders: []
blockNewUsers:
  enabled: false
  passlist: []
certificate:
  kind: secret
  name: https-certificates
containerRegistry:
  external:
    certificate:
      kind: secret
      name: container-registry
    url: gcr.io/sh-automated-tests
  inCluster: false
  privateBaseImageAllowList: []
database:
  inCluster: true
disableDefinitelyGp: true
domain: xxxx
httpProxy:
  kind: secret
  name: http-proxy-settings
kind: Full
license:
  kind: secret
  name: gitpod-license
metadata:
  region: local
  shortname: default
objectStorage:
  inCluster: true
  resources:
    requests:
      memory: 2Gi
observability:
  logLevel: info
openVSX:
  url: https://open-vsx.org
repository: eu.gcr.io/gitpod-core-dev/build
sshGatewayHostKey:
  kind: secret
  name: ssh-gateway-host-key
workspace:
  maxLifetime: 36h0m0s
  pvc:
    size: 30Gi
    snapshotClass: ""
    storageClass: ""
  resources:
    requests:
      cpu: "1"
      memory: 2Gi
  runtime:
    containerdRuntimeDir: /run/containerd/io.containerd.runtime.v2.task/k8s.io
    containerdSocket: /run/containerd/containerd.sock
    fsShiftMethod: shiftfs

lucasvaltl commented 2 years ago

I can confirm that this error is reproducible, at least in the environment at hand. I've tried to shut down three workspaces and all of them ended in the error state mentioned in this issue.

lucasvaltl commented 2 years ago

Other environments (Azure and AWS) are unaffected.

lucasvaltl commented 2 years ago

I can confirm that this issue does NOT happen on a GCP environment I spun up myself using Terraform. From that, we can conclude that this is specific to the GCP environment I was using and is likely an issue with the environment rather than with Gitpod.

atduarte commented 2 years ago

Thank you @lucasvaltl!
