gitpod-io / gitpod

The developer platform for on-demand cloud development environments to create software faster and more securely.
https://www.gitpod.io
GNU Affero General Public License v3.0

Epic: Ensure durability for user workspace files #7901

Closed kylos101 closed 4 days ago

kylos101 commented 2 years ago

Summary

Better protect user data

Context

Sometimes a workspace, node, or workspace cluster fails and the user data cannot be backed up to cloud storage, resulting in data loss. Related: an incident for a global outage, and an RFC where we are discussing solutions.

Value

If we handle user data better, users will trust that even if the Gitpod service is unavailable, they will not lose data once it is back online.

Acceptance criteria

User data is persisted in such a way that even if there is a workspace, node, or cluster failure, the data is accessible to be backed up at a later time.

Tasks

Ops:

Design:

Product changes:

Tests:

Bug

Should solve:

Day 2:


atduarte commented 2 years ago

@kylos101 A few questions related to "users must be able to access their most recent backup for a workspace regardless of workspace status": 1. During the stopping state, would the system be able to distinguish the backup triggered by stopping from a previous one? 2. From what I understand/recall, we store the last 4 backups. Would we be able to provide the WebApp with the links and corresponding timestamps of all of them?
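To make the two questions concrete, here is a minimal sketch of a bounded backup history that records what triggered each backup and can list links with timestamps. Everything in it is hypothetical: the `Backup` and `BackupHistory` names, the `trigger` label, and the URLs are illustrative only, and the limit of 4 is just the number recalled in the comment above, not confirmed behavior.

```python
from collections import deque
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional, Tuple

MAX_BACKUPS = 4  # assumption: the "last 4 backups" recalled in the comment


@dataclass(frozen=True)
class Backup:
    url: str              # hypothetical download link
    created_at: datetime  # when the backup finished
    trigger: str          # hypothetical label, e.g. "periodic" or "stopping"


class BackupHistory:
    """Keeps only the most recent MAX_BACKUPS backups for one workspace."""

    def __init__(self) -> None:
        # A bounded deque drops the oldest entry automatically.
        self._items = deque(maxlen=MAX_BACKUPS)

    def record(self, backup: Backup) -> None:
        self._items.append(backup)

    def latest(self) -> Optional[Backup]:
        # Most recent backup, independent of the workspace's current state.
        return self._items[-1] if self._items else None

    def listing(self) -> List[Tuple[str, str, str]]:
        # Newest first: (link, ISO timestamp, trigger) for each retained backup.
        return [(b.url, b.created_at.isoformat(), b.trigger)
                for b in reversed(self._items)]


def example_history() -> BackupHistory:
    history = BackupHistory()
    for day in range(1, 6):  # five backups, so the oldest one is dropped
        trigger = "stopping" if day == 5 else "periodic"
        history.record(Backup(
            url=f"https://example.invalid/backup-{day}.tar",
            created_at=datetime(2022, 1, day, tzinfo=timezone.utc),
            trigger=trigger,
        ))
    return history
```

Storing the trigger alongside each entry is one way the stopping-state backup could be told apart from earlier ones; whether the real system records anything like it is an open question in this thread.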

aledbf commented 2 years ago

automate deployment of GCP storageClasses as part of cluster creation operation (specify discard mount option)

this is not required for XFS.

aledbf commented 2 years ago

installer: allow to specify storageClass in gitpod.yaml

this can be optional for the first iteration

kylos101 commented 2 years ago

@sagor999 as a heads up, I added a few observability tasks. One of the first ones we'll need (if it doesn't already exist) is the ability to inspect backups and restores now being done with TAR. For example, this way we can measure duration for both.
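As an illustration of measuring duration for both operations, here is a minimal sketch that times a TAR backup and restore using Python's standard `tarfile` module. It is not Gitpod's implementation: the `backup`, `restore`, `timed`, and `measure_round_trip` helpers are hypothetical, and a real measurement would more likely be emitted as a metric than returned in a dict.

```python
import tarfile
import tempfile
import time
from pathlib import Path


def timed(fn, *args):
    """Run fn(*args) and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    fn(*args)
    return time.perf_counter() - start


def backup(src_dir: Path, archive: Path) -> None:
    # Pack the whole directory into an uncompressed TAR archive.
    with tarfile.open(archive, "w") as tar:
        tar.add(src_dir, arcname=".")


def restore(archive: Path, dest_dir: Path) -> None:
    # Use the safer "data" extraction filter on Python versions that have it.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(archive, "r") as tar:
        tar.extractall(dest_dir, **kwargs)


def measure_round_trip(content: bytes = b"hello") -> dict:
    """Back up and restore a tiny stand-in workspace, timing each step."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        src, dest = root / "workspace", root / "restored"
        src.mkdir()
        dest.mkdir()
        (src / "file.txt").write_bytes(content)
        archive = root / "backup.tar"

        backup_seconds = timed(backup, src, archive)
        restore_seconds = timed(restore, archive, dest)

        return {
            "backup_seconds": backup_seconds,
            "restore_seconds": restore_seconds,
            "restored_ok": (dest / "file.txt").read_bytes() == content,
        }
```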

kylos101 commented 2 years ago

@sagor999 @jenting are there any more integration tests that need to be added for new code we've written? In other words, I see you've fixed existing tests, but wanted to double check for new test needs. For example, one test I can think of would be a test that kills a pod, relies on a process to back up the orphaned PVC, and then asserts that the PVC is gone (because it was snapshotted).
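The shape of that test can be sketched against an in-memory stand-in for the cluster. This is only an outline of the assertions, not a real integration test: `FakeCluster` and `reconcile_orphaned_pvcs` are hypothetical names, and an actual test would talk to the Kubernetes API and the real backup process instead.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class FakeCluster:
    """Hypothetical in-memory stand-in for a workspace cluster."""
    pods: Dict[str, str] = field(default_factory=dict)  # pod name -> PVC name
    pvcs: Set[str] = field(default_factory=set)
    snapshots: Set[str] = field(default_factory=set)    # PVC names snapshotted

    def start_workspace(self, pod: str, pvc: str) -> None:
        self.pods[pod] = pvc
        self.pvcs.add(pvc)

    def kill_pod(self, pod: str) -> None:
        # Simulates an abrupt pod failure: the PVC is left behind.
        del self.pods[pod]


def reconcile_orphaned_pvcs(cluster: FakeCluster) -> List[str]:
    """Snapshot every PVC that no pod uses, then delete it."""
    in_use = set(cluster.pods.values())
    orphaned = sorted(cluster.pvcs - in_use)
    for pvc in orphaned:
        cluster.snapshots.add(pvc)  # take the snapshot first
        cluster.pvcs.remove(pvc)    # delete the PVC only after the snapshot
    return orphaned


def test_orphaned_pvc_is_snapshotted_and_removed() -> None:
    cluster = FakeCluster()
    cluster.start_workspace("ws-pod-1", "ws-pvc-1")
    cluster.start_workspace("ws-pod-2", "ws-pvc-2")

    cluster.kill_pod("ws-pod-1")
    handled = reconcile_orphaned_pvcs(cluster)

    assert handled == ["ws-pvc-1"]
    assert "ws-pvc-1" in cluster.snapshots    # data was preserved
    assert "ws-pvc-1" not in cluster.pvcs     # PVC is gone
    assert "ws-pvc-2" in cluster.pvcs         # running workspace untouched
```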

sagor999 commented 2 years ago

This issue currently affects the PVC epic: Sometimes workspace attempts to start with PVC feature enabled

axonasif commented 1 year ago

Question: How would someone who ran out of hours get their data back? (re: https://github.com/gitpod-io/gitpod/pull/14393) Contact support? It'd be better if they could self-serve.

SNWCreations commented 1 year ago

Question: Will this change prevent us from downloading a single file in the workspace? (Will the "Download..." button in the right-click menu of a file still be available?) Sometimes, I need to update my artifact on another server by downloading the artifact from the Gitpod server and uploading it to my server manually.

svenefftinge commented 1 year ago

Question: Will this change prevent us from downloading a single file in the workspace? (Will the "Download..." button in the right-click menu of a file still be available?)

No, this is about downloading the workspace content backup. You can still download individual files from your running workspace, depending on how you connect to it. E.g. with VS Code, just drag and drop.

6uliver commented 1 year ago

Maybe this issue should be part of this epic to not lose my workspace's content on a regular basis: https://github.com/gitpod-io/gitpod/issues/11183

atduarte commented 1 year ago

Update: Blocking functional issues and significantly increased workspace startup times were found in the current technical design. 😞

After internal discussions, given that the backup success ratio is high and stable following adjacent improvements, and that the new design will be considerably faster to implement after https://github.com/gitpod-io/gitpod/issues/11416, we have decided to pause this effort until then.

PS: @6uliver I believe the root cause of that issue is different from the context of this one. I will follow up on that one there. 🙏

stale[bot] commented 11 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] commented 2 weeks ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.