gitpod-io / gitpod

The developer platform for on-demand cloud development environments to create software faster and more securely.
https://www.gitpod.io
GNU Affero General Public License v3.0

Allow end-tasks, which will run when a workspace stops (timeout) #3966

Open shaal opened 3 years ago

shaal commented 3 years ago

Similar to start-tasks, I propose adding end-tasks. End-tasks would be triggered and run when a workspace gets stopped (e.g. on timeout).


shaal commented 3 years ago

Related issue: https://github.com/gitpod-io/gitpod/issues/4055 Technical details: https://github.com/gitpod-io/gitpod/issues/1961

svenefftinge commented 3 years ago

hey @shaal, the feature makes absolute sense; I'm trying to understand your particular use case better. It sounds like you're trying to do two things: 1) persist in-memory state to disk before stopping, so that when I restart a workspace it has the same state again; 2) keep state across multiple workspaces.

I have no questions regarding 1). Regarding 2), I wonder what the scope of that state is: is it per user per project, or per project? (In the latter case I wonder if it should not be part of the init tasks or checked into git.)

I would like to understand how you intend to surface to users why they have the state they are in, and how they can control it. What would a user do if they want a clean slate for some reason, or have a different DB schema because they are working on a branch that has migrations?

These issues are generally the reason why I'd recommend creating fresh workspaces per branch and not sharing such data across different project states. Within a workspace, I want to keep the state, of course: when I stop it and later start it again, it definitely should have the same DB state, no question.

chlbri commented 3 years ago

I think of this for commits: if you accidentally delete a workspace, you cannot recover your changes. So I'd want a default commit-and-push, with a timestamp or slug, that saves the work to a new branch.

shaal commented 3 years ago

I created an example of end-task that I want to use: https://github.com/shaal/DrupalPod/pull/18

When a workspace shuts down, .gitpod/aws-backup.sh is called (it creates a binary MySQL backup and stores it in AWS). When I open a new workspace where I want to use the previous workspace's database, I'll run .gitpod/aws-restore.sh, which restores the latest backup for this branch. Alternatively, I can run .gitpod/aws-restore.sh [name_of_backup] to restore a specific named backup I created in a separate workspace.
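As a rough illustration (not the actual DrupalPod script; the bucket name, helper name, and dump commands are assumptions), such a backup end-task might be sketched like this:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of an aws-backup.sh end-task; bucket and paths are assumed.
set -euo pipefail

# Derive a backup object name from the current branch plus a UTC timestamp,
# so a restore script can find "the latest backup for this branch".
make_backup_name() {
  local branch="${1:-detached}"
  printf 'backup-%s-%s.sql.gz' "${branch//\//-}" "$(date -u +%Y%m%d%H%M%S)"
}

# The dump-and-upload step would then look something like:
#   name="$(make_backup_name "$(git rev-parse --abbrev-ref HEAD)")"
#   mysqldump --all-databases | gzip > "/tmp/${name}"
#   aws s3 cp "/tmp/${name}" "s3://my-backup-bucket/${name}"
```

The restore script would list objects matching the branch prefix and pull down the newest one.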

orellabac commented 3 years ago

This would be great. It would allow me to do some simple telemetry or housekeeping tasks!

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

shaal commented 3 years ago

Can we please add the label meta: never-stale to this issue?

chlbri commented 3 years ago

Ok


john-french commented 3 years ago

I'm looking for this feature too and ended up here. My use case is as follows:

I'd like to use Gitpod to teach an introductory programming course. These beginner students have enough on their plates trying to learn to code without having to learn Git (for now), so I foresee providing a shell script in the workspace which does git add/commit/push for them without blowing their minds. If I could run this from end-tasks it would be wonderful: their code would always get pushed to GitHub automatically, where I could trigger unit tests etc., without me having to even mention the words "staging area" to my students.
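A minimal sketch of such a helper (the function name and messages are made up, not an existing script) could be:

```shell
#!/usr/bin/env bash
# Hypothetical "autosave" helper a student (or an end-task) could run.
# It stages everything and commits with a timestamped message; pushing
# (git push origin HEAD) is left to the caller so the helper works offline.
autosave() {
  git add -A
  # Only commit if something is actually staged.
  if git diff --cached --quiet; then
    echo "nothing to save"
    return 0
  fi
  git commit -q -m "autosave $(date -u +%Y-%m-%dT%H:%M:%SZ)"
  echo "saved"
}
```

Run from an end-task, this would snapshot the students' work on every workspace stop without them touching Git.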

joepurdy commented 2 years ago

This would unblock me on a number of different issues I've worked around in my own ways.

For starters, we currently network a set of dependent workspaces together via Tailscale as ephemeral nodes. The issue is that, despite being "ephemeral", Tailscale may not perform housekeeping until up to 48 hours later. This leads to long-forgotten/deleted Gitpod instances lingering and causing duplicate hostnames.

Tailscale's official recommendation when you need a consistent hostname is to instead remove the node via an API call:

If you're using hostnames to refer to things, and need to have the node deleted as part of your workflow, then you can make an API call from your automation system.

Here is the API spec for that

Other than that, the cleanup of ephemeral nodes is a bit lazy, but they shouldn't linger more than a day or two.

With an end task in Gitpod we could automate that API call to clean-up the tailscale node before shutting down.
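A hedged sketch of that cleanup call (the API-key env var name and the jq lookup are assumptions; check the Tailscale API docs for the exact device-deletion endpoint):

```shell
#!/usr/bin/env bash
# Hypothetical Tailscale-cleanup end-task sketch.
set -euo pipefail

# Build the device-deletion URL for the Tailscale v2 API.
device_delete_url() {
  printf 'https://api.tailscale.com/api/v2/device/%s' "$1"
}

# The actual cleanup would look something like:
#   device_id="$(tailscale status --json | jq -r '.Self.ID')"
#   curl -fsS -u "${TAILSCALE_API_KEY}:" -X DELETE "$(device_delete_url "$device_id")"
```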


On a different note, our developers often work with persistent data in MySQL. Even though it persists through workspace restarts (thanks to storing the data under the /workspace/mysql path), there are times when one developer wishes they could share the latest DB dump with someone else. We've crudely solved this with a script that dumps the database to cloud storage and another that restores from a named dump. Sometimes team members forget to run it, though, and lose the latest copy of their data.

End tasks could help here as well by automatically calling the script to have an autosave backup just in case.

jetdream commented 2 years ago

Our projects heavily rely on cloud infrastructure for development.

I see this feature as extremely useful for cleaning up the associated cloud infrastructure:

- on start tasks, we create AWS service instances using Terraform/Ansible: database, event bus, storage, stream processor, etc.
- on end tasks, we would destroy those instances
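Until a dedicated end-task exists, a hedged sketch of wiring this up with the SIGTERM workaround discussed in this thread might look like the following (the `infra` directory, `-chdir` path, and flags are assumptions):

```yaml
tasks:
  - name: infra
    command: |
      terraform -chdir=infra apply -auto-approve
      # Tear the stack down when the workspace stops (processes get SIGTERM).
      trap 'terraform -chdir=infra destroy -auto-approve' SIGTERM
      # Keep the task alive; `wait` is interruptible so the trap can fire.
      sleep infinity & wait $!
```

Note the 15-second grace period before SIGKILL may be too short for a real `terraform destroy`, which is part of why a proper hook with configurable timing would help here.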

jkaye2012 commented 2 years ago

We would like this feature as well. For us, the primary benefit would be cleaning up caches that are left around the workspace. This leads to workspaces taking a very long time to come back up (or, often, never coming back up at all). If we could hook an end task, automating a clean of the workspace would be trivial.

pawlean commented 2 years ago

Thank you all for your input. I noticed that this wasn't in a team inbox, @shaal, so I've just put it in the WebApp team's inbox.

WebApp team - if this doesn't belong to your team, can you move it to the correct inbox? I thought this may be one of those that requires input from every eng team... Your call!

csweichel commented 2 years ago

In today's workspaces, prior to a regular shutdown, all processes receive SIGTERM. They then have 15 seconds before receiving SIGKILL. That's not quite as convenient as shutdown hooks, but it helps to e.g. flush a DB to disk prior to shutdown.

jkaye2012 commented 2 years ago

Interesting. One way to simulate a generic shutdown hook, then, would be to implement a daemon that we run locally that sleeps until it receives SIGTERM and then fires the shutdown logic, correct?

geropl commented 2 years ago

@akosyakov Pulling you in because this is more of supervisor territory; webapp would just provide/approve the additions to the config.

Does this seem reasonable to implement?

akosyakov commented 2 years ago

I find the terminology confusing, i.e. start vs. end tasks. I think there are just tasks (shell sessions), and then there should be some shutdown protocol for them; it could be something based on signals, on top of https://github.com/gitpod-io/gitpod/issues/3966#issuecomment-1084201788, with specified time guarantees, like a shutdown timeout. Maybe 15 seconds is already enough, by the way.

Pulling you in because this is more of supervisor territory; webapp would just provide/approve the additions to the config. Does this seem reasonable to implement?

I'm not sure whether it should be an extension of .gitpod.yml.

jkaye2012 commented 2 years ago

From our perspective, this would be something that we would like in .gitpod.yml.

We have been able to successfully emulate this using signal handlers, but the result is a bit of a kludge, as you end up with a script running for the entire lifetime of your pod just to catch a signal. It works well enough; it could just be simpler to use, is all.

shaal commented 2 years ago

The original thought behind the feature request was being able to run tasks before the workspace shuts down. Some tasks might take longer to run (e.g. making a copy of the current database and uploading it to the cloud).

It would be straightforward to define these tasks in .gitpod.yml in their own section (e.g. pre-shutdown).

akosyakov commented 2 years ago

ok, makes sense, so you mean something like:

tasks: 
  - shutdown: sh ./scripts/shutdown.sh

Which is executed on SIGTERM event and has 15 seconds to complete?

axonasif commented 2 years ago

Which is executed on SIGTERM event and has 15 seconds to complete?

@akosyakov I think it would be better to have more than 15 seconds when done from .gitpod.yml 👀

svenefftinge commented 2 years ago

Adding it to tasks is problematic, as those terminals might not even exist anymore or might be waiting for a command to return. Also, this would make the time when such a command is executed a little fuzzy (before all terminals stop, or one by one?). So I think we should introduce a top-level hook for this:

tasks:
  - command: |
         start-db &

onWorkspaceStop: |
   stop-db
   sync-state.sh

The command would run after all terminals have been closed but before? the IDE has stopped. It should run in the same context as all other commands (i.e. as the gitpod user, in a shell, etc.).

shaal commented 2 years ago

@svenefftinge I like it!

loujaybee commented 2 years ago

Adding to IDE team sync next week to see if we can pick up the open PR and get it completed.

No promises on timeline, but we'll take a look!

Related internal thread.

csweichel commented 2 years ago

Relevant discussion: https://github.com/gitpod-io/gitpod/pull/11287#issuecomment-1190241322

loujaybee commented 2 years ago

Removing from IDE sync, as it looks like @svenefftinge is looking into this! 🙏 🚀

karpa commented 2 years ago

I read that some users had success using SIGTERM. Can you post a link or some instructions on how you do it?

axonasif commented 2 years ago

Hey @karpa, until #11287 is deployed, you can use this snippet in your .gitpod.yml:

tasks:
  - name: Shutdown daemon
    command: |
      function shutdown() {
        # Do your cleanup here, for example:
        docker-compose stop;
      }

      # Run the cleanup when the workspace sends SIGTERM on shutdown.
      trap 'shutdown; exit' SIGTERM;
      printf '\033[3J\033c\033[3J%s\n' 'Waiting for SIGTERM ...';
      # Open a read/write fd on a process substitution so `read` blocks
      # (it never sees EOF) without spawning a sleep process; the trap
      # interrupts the read when SIGTERM arrives.
      exec {sfd}<> <(:);
      until read -t 3600 -u $sfd; do continue; done;

akosyakov commented 2 years ago

@karpa here is an example of how one can set it up till we have shutdown commands: https://github.com/akosyakov/gitpodify-docker-compose/blob/docker-compose_check/.gitpod.yml

geropl commented 2 years ago

@svenefftinge is on holiday, so I took this out of "in progress" for now.

karpa commented 2 years ago

I used the code from both of you @akosyakov and @axonasif and adapted it to solve my problems. Thanks.

loujaybee commented 1 year ago

Bumping this one, as it has come up again in conversations with integration partners. Having this feature would be incredibly powerful for showcasing infrastructure spin-up and tear-down with Gitpod, especially in the context of ephemeral cloud infra environments as a natural extension of Gitpod. Again, let's see who can pick up the draft PR and get it over the line 🙏

loujaybee commented 1 year ago

Linking this docs effort for workspace lifecycle updates:

Opportunity to document shutdown behaviours and hooks for users to listen to.

akosyakov commented 1 year ago

I wonder whether anyone has tried to use an existing Linux solution instead, like init.d services. Here is a setup which starts a docker-compose daemon in the usual way [1].

There is no need for end tasks then, and users can port existing init.d service scripts to Gitpod; a normal command task is used to call out and start such a service:

tasks:
   - command: service docker-compose start

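For illustration, the shape of such an init.d-style wrapper can be sketched as below (a hypothetical minimal skeleton, not the script linked above; the real one would invoke docker-compose):

```shell
#!/usr/bin/env bash
# Hypothetical minimal init.d-style service wrapper (sketch only).
service_ctl() {
  case "${1:-}" in
    start) echo "starting" ;;  # e.g. docker-compose up -d &
    stop)  echo "stopping" ;;  # e.g. docker-compose down
    *)     echo "usage: service_ctl {start|stop}" >&2; return 2 ;;
  esac
}
```

The stop branch is what a SIGTERM handler (or a future end-task) would call on workspace shutdown.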
faermanj commented 1 year ago

It would be great to be able to tear down cloud resources (k8s clusters, s3 buckets, ...) when environments are deleted. Currently we have to "prune" such resources manually, which is error-prone and wasteful.

akosyakov commented 1 year ago

@faermanj Have you already tried creating a Gitpod task which does it on SIGTERM? All processes receive the SIGTERM signal on workspace shutdown and have 15 seconds by default to terminate gracefully. You can also use init.d to perform it [1] in the background.