backstage / backstage

Backstage is an open framework for building developer portals
https://backstage.io/
Apache License 2.0

Scaffolder: Decouple Scaffolder Workers #3746

Closed benjdlambert closed 3 years ago

benjdlambert commented 3 years ago

Add the ability to produce your own worker so that it can run wherever you like.

TBD.

mfrinnstrom commented 3 years ago

We are interested in this. We currently have our own plugin for running tasks in AWS ECS using Fargate but it would be nice to have this included out of the box.

SudoBrendan commented 3 years ago

I really like this idea, glad to see it here!

My 2 cents - it'd be awesome if all the backend server did was push a structured message to a queue/database and leave the rest of the task to something else. This would mean that unique preparers, templaters, and publishers could be their own independent microservices, making it easy to independently implement/update/scale them - all they do is watch the queue for a message that meets their requirements, and each microservice only has the dependencies it needs to do its single task. This makes the implementation extensible too - these microservices can actually perform the tasks themselves (e.g. a microservice does pretty much exactly what's implemented right now, like running cookiecutter) or they can act as a proxy to a third party (say, if you've got a Jenkins server and just want your scaffolding to be the result of a build; alternatively, you might kick off a Kubernetes Job with a specific container, etc.). In such an ecosystem, I'm not sure if it'd make more sense to bundle preparers/templaters/publishers together or keep them separate, considering they currently rely on a shared filesystem structured in a unique way by the time they're invoked.
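The queue-and-capabilities idea above can be sketched roughly as follows. This is a hypothetical illustration, not a Backstage API: the `TaskMessage` shape, the capability strings, and the `TaskQueue` class are all made up for the example.

```typescript
// Hypothetical sketch: a structured scaffolder task message and workers
// that only claim messages matching their capability. Names are
// illustrative, not actual Backstage APIs.
type TaskMessage = {
  taskId: string;
  // Which kind of worker should pick this up (templater, publisher, proxy...)
  requires: 'cookiecutter' | 'publish:github' | 'jenkins-proxy';
  payload: Record<string, unknown>;
};

// A minimal in-memory stand-in for the queue/database the comment describes.
class TaskQueue {
  private messages: TaskMessage[] = [];

  push(msg: TaskMessage) {
    this.messages.push(msg);
  }

  // Each worker polls and claims only messages that meet its requirements,
  // so every microservice needs only the dependencies for its single task.
  claimFor(capability: TaskMessage['requires']): TaskMessage | undefined {
    const i = this.messages.findIndex((m) => m.requires === capability);
    return i === -1 ? undefined : this.messages.splice(i, 1)[0];
  }
}

const queue = new TaskQueue();
queue.push({ taskId: 't1', requires: 'cookiecutter', payload: { template: 'svc' } });
queue.push({ taskId: 't2', requires: 'publish:github', payload: { repo: 'org/svc' } });

// A cookiecutter-only microservice sees only its own work:
const claimed = queue.claimFor('cookiecutter');
```

A proxy worker (for Jenkins, a Kubernetes Job, etc.) would simply claim messages with its own capability string and forward the payload.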

freben commented 3 years ago

My take is that bundling them is a better complexity/flexibility balance. It's an extremely low-volume service, and the management overhead would probably quickly overtake any value of separating them. But yes, this is the exact design I had in mind personally: a lightweight broker, and workers that are basically regular little services that are offered these tasks, work them off, and send feedback through the same broker as they do so. Should be easy to build without needing to depend on a work queue product or similar; just the regular backing database will do.
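The "just the regular backing database will do" point can be made concrete: tasks live in an ordinary table, and workers claim them by atomically flipping a status column rather than via a dedicated queue product. The sketch below simulates that table in memory; the `TaskRow`/`TaskTable` names and the SQL in the comment are illustrative, not Backstage's actual schema.

```typescript
// Sketch of a database-backed broker: in real SQL the claim would be e.g.
//   UPDATE tasks SET status = 'claimed', worker = $1
//   WHERE id = (SELECT id FROM tasks WHERE status = 'open' LIMIT 1)
// Here the table is simulated in memory for illustration.
type TaskRow = { id: number; status: 'open' | 'claimed' | 'done'; worker?: string };

class TaskTable {
  constructor(private rows: TaskRow[]) {}

  // Atomically claim the oldest open task for a worker.
  claim(worker: string): TaskRow | undefined {
    const row = this.rows.find((r) => r.status === 'open');
    if (!row) return undefined;
    row.status = 'claimed';
    row.worker = worker;
    return row;
  }

  // Workers send feedback/completion back through the same store.
  complete(id: number) {
    const row = this.rows.find((r) => r.id === id);
    if (row) row.status = 'done';
  }
}

const table = new TaskTable([
  { id: 1, status: 'open' },
  { id: 2, status: 'open' },
]);
const a = table.claim('worker-a'); // claims task 1
const b = table.claim('worker-b'); // claims task 2
table.complete(a!.id);
```

At the low volumes mentioned above, polling such a table is plenty; a real implementation would additionally need row-level locking (or `SKIP LOCKED`) to make the claim safe across concurrent workers.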

tragiclifestories commented 3 years ago

So now that we have an experimental implementation of the broker and worker, I guess the job here is to wrap the TaskWorker so it can work as its own process?

benjdlambert commented 3 years ago

Probably, or another way to run each action on a different agent. So we could run cookiecutter in a k8s job that has cookiecutter in it, for instance. We'd need some way of communicating between the server and the agent. Needs a little more thought, I think.

jluk-box commented 3 years ago

Hi, this is currently the topic I need to focus on. I have already POC'd everything there is to do with the Scaffolder, but when I try to deploy it to a serious environment, the requirement that local filesystem locations of the backend process be mounted into local containers is too restrictive (for k8s, for example). I could start writing my own actions and steps, but I think it would be beneficial for everybody to do some brainstorming and come up with a more universal solution :)

So I have similar ideas to @SudoBrendan - we could make the execution as decoupled from Backstage as possible, publishing events and/or RPC requests to a queue. This would give us the possibility to use the default steps bundled with BS (like @freben said) and also give others an interface to integrate their own executors. And we should delegate the job to tools that are already good at doing this, like StackStorm, Camunda or Argo (think progress tracking, error handling, retries, delivery guarantees etc.). In my use case, the Scaffolder is not only a templater for a source code repo; I want to use it also as a service onboarding wizard, with other steps like environment pre-provisioning, firewall configuration etc.

What would be a good way to gather interested people and discuss this?

regicsolutions commented 3 years ago

Celery may be a good option on another pod with cookiecutter, given the Python requirement; this is how Airflow does it: https://medium.com/sicara/using-airflow-with-celery-workers-54cb5212d405

stirno commented 3 years ago

After going a few rounds on this for my company's internal portal tool, I think in practice having a single container capable of executing many task types, running as a k8s Job/Azure Container Instance/Google Cloud Run/etc., would be ideal. Similar in most ways to hosted agents in Azure DevOps or the runners for GitHub Actions and other CI/CD platforms.

benjdlambert commented 3 years ago

Just coming back round to this to summarize as it was posted in the https://github.com/backstage/community/issues/14 Community Ticket.

So to summarize this ticket, I think there are two different things here. There's the orchestration part: actually running these jobs somewhere, like in a K8s Job, a Tekton Template, or a Jenkins job. And then there's the ability to run the actions you provide on different hosts that connect to a queue and listen for work they need to do.

The short of it is that we haven't looked into this too much yet, and it's probably not going to be something we look at in the short term.

I'm thinking that right now, in its current state, it's still unclear what we want it to do and how it's going to look.

I propose that we move this ticket to an RFC to come up with some ideas about how we can open up the orchestration and the agent picking.

dhenneke commented 3 years ago

Some updates on what we did in the past weeks. We focused on the "[...] running these jobs somewhere [...]" part of the problem and didn't touch the scaffolder task orchestration itself.

We opened up the runContainer(…) function in #5415 to be able to exchange it with something that is not plain Docker. We selected Argo Workflows as our execution engine that runs our "containers" in Kubernetes. Since we don't have a shared file system in this situation (though it wouldn't be impossible), we transfer inputs/outputs of each step via an S3 bucket.
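The essence of the approach described above is translating a "run this container" request into a short-lived Argo Workflow whose inputs and outputs are S3 artifacts rather than a shared filesystem. The sketch below shows what building such a manifest could look like; this is not the authors' code, and the `RunContainerRequest` shape, function name, and keys are all hypothetical.

```typescript
// Illustrative sketch: turn a container-run request into an Argo Workflow
// manifest that moves the workspace through S3 instead of a shared volume.
// All names are hypothetical; submission to the cluster is omitted.
type RunContainerRequest = {
  imageName: string;
  args: string[];
  inputKey: string;  // S3 key holding the task workspace before the step
  outputKey: string; // S3 key the workspace is uploaded to afterwards
};

function buildArgoWorkflow(req: RunContainerRequest, bucket: string) {
  return {
    apiVersion: 'argoproj.io/v1alpha1',
    kind: 'Workflow',
    metadata: { generateName: 'scaffolder-step-' },
    spec: {
      entrypoint: 'run',
      templates: [
        {
          name: 'run',
          // Argo downloads the input artifact into the container before it
          // starts, and uploads the output artifact when it finishes.
          inputs: {
            artifacts: [{ name: 'workspace', path: '/workspace', s3: { bucket, key: req.inputKey } }],
          },
          outputs: {
            artifacts: [{ name: 'result', path: '/workspace', s3: { bucket, key: req.outputKey } }],
          },
          container: { image: req.imageName, args: req.args, workingDir: '/workspace' },
        },
      ],
    },
  };
}

const wf = buildArgoWorkflow(
  { imageName: 'cookiecutter:latest', args: ['--no-input'], inputKey: 'in.tgz', outputKey: 'out.tgz' },
  'scaffolder-artifacts',
);
```

A custom `runContainer` implementation in the spirit of #5415 would then submit this manifest to the Argo API and stream the pod logs back as the step's output.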

We used TechDocs internally to draft the concept, so I can share some nice graphics. This is how it looks at a high level (the interactions with a red * are part of our custom ArgoContainerRunner#runContainer implementation):

Getting the data to the cookiecutter container works as follows (illustrated by a diagram in the original comment, not reproduced here):

Conclusion: This is totally doable. We rolled out a first version today but still need to collect some experience about the stability of it. And the implementation is quite specific to our infrastructure (AWS S3 in EKS without much config options). So no code to be shared today :). We'll see if we put it into a new argo-container-runner package in the future or just add it to the contrib/ folder for inspiration to others.

Finally, this is how it looks when we run a template (in time-lapse mode, with some custom actions):

https://user-images.githubusercontent.com/720821/118683336-989f8000-b801-11eb-8549-dd5d7402efdc.mov

tragiclifestories commented 3 years ago

Very nice. So are you running all the steps in one Argo workflow or generating a fresh one for each step? I guess doing the former would mean taking on the orchestration piece too ...

dhenneke commented 3 years ago

The latter. It really boils down to transforming the intent of wanting to run a container into creating a short-lived Argo workflow. This way we still make use of all the original scaffolder features. We haven't thought about transforming a full template into a workflow or just triggering an already defined more complex Argo workflow that only uses the input/output of the scaffolder. Which is also an interesting idea 🤔.

Apart from this, it would also be interesting for us to work out whether the scaffolder could power more advanced workflows. Like maybe also be able to pause execution and wait for e.g. PR approvals that block the way to get something deployed into production in conformance with organizational guidelines.

tragiclifestories commented 3 years ago

> Like maybe also be able to pause execution and wait for e.g. PR approvals that block the way to get something deployed into production in conformance with organizational guidelines.

Yep, we've had things like that come up as well. But anyway, breaking out of the docker-in-docker problem is a massive step forward, so kudos for that! 🙌

freben commented 3 years ago

This is cool! My initial idea of how this would work was rather to make a little "worker SDK" that helped people build runners that could be deployed anywhere and in any manner they saw fit (a kind of "pull" model), communicating with the central broker, rather than a central thing triggering work externally (a kind of "push" model). This is an interesting take.

stefanbuck commented 3 years ago

A few weeks ago I wrote a blog post about projects scaffolding using GitHub Actions and Repository templates: Repository Templates Meets GitHub Actions

It's written from a GitHub user point of view. However, at work we have our own CLI which automates the manual steps using the GitHub API. We haven't explored this in terms of Backstage, but I like the idea of delegating the project scaffolding compute onto GitHub Actions.

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

kuangp commented 3 years ago

should this remain open?

benjdlambert commented 3 years ago

@kuangp there has been some work in flight over here which breaks apart some of this #7426

benjdlambert commented 3 years ago

Although it doesn't really close this ticket, it does open up the possibility of implementing your own TaskBroker interface, which is in charge of claiming and dispatching tasks, making this ticket possible with a bit of work.
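To show the shape of what "implement your own TaskBroker" could enable, here is a deliberately simplified in-memory sketch: one side dispatches tasks, and worker processes (which could live anywhere) claim them. This mirrors the split described above but is not the real interface; consult the scaffolder backend package for the actual `TaskBroker` contract.

```typescript
// Simplified, in-memory sketch of a custom broker in the spirit of the
// scaffolder's TaskBroker. Not the real Backstage interface.
type TaskSpec = { templateRef: string; values: Record<string, unknown> };
type Task = { id: string; spec: TaskSpec };

class InMemoryTaskBroker {
  private pending: Task[] = [];
  private nextId = 1;

  // Called by the scaffolder backend when a user runs a template.
  async dispatch(spec: TaskSpec): Promise<{ taskId: string }> {
    const id = `task-${this.nextId++}`;
    this.pending.push({ id, spec });
    return { taskId: id };
  }

  // Called by a worker process, wherever it runs, to pick up work.
  async claim(): Promise<Task | undefined> {
    return this.pending.shift();
  }
}

const broker = new InMemoryTaskBroker();
```

A production version would back `pending` with the database (as discussed earlier in the thread) and stream task events back through the broker so the frontend can show progress.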

matteosilv commented 2 years ago

I would like to help on this.

We are struggling to run some workflows after template compilation, and we don't much like the idea of having to embed all the required tools in the Backstage container, nor of running dind to do so.

As a temporary solution I created a generic action to run a container, which so far uses dind, but since the interface is generic enough, I was thinking about implementing a KubernetesContainerRunner.
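The KubernetesContainerRunner idea boils down to translating the same generic "run a container" request into a Kubernetes Job manifest instead of a Docker daemon call. A hypothetical sketch of the manifest-building half (submission via the API server is omitted, and all names are illustrative):

```typescript
// Hypothetical sketch: build a batch/v1 Job manifest from a generic
// container-run request, replacing the dind-based implementation.
type ContainerRunRequest = { imageName: string; command?: string[]; args?: string[] };

function buildJobManifest(req: ContainerRunRequest, namespace: string) {
  return {
    apiVersion: 'batch/v1',
    kind: 'Job',
    metadata: { generateName: 'scaffolder-action-', namespace },
    spec: {
      backoffLimit: 0, // a failed scaffolder step should fail fast, not retry blindly
      template: {
        spec: {
          restartPolicy: 'Never',
          containers: [
            { name: 'action', image: req.imageName, command: req.command, args: req.args },
          ],
        },
      },
    },
  };
}

const job = buildJobManifest({ imageName: 'cookiecutter:latest', args: ['--no-input'] }, 'backstage');
```

A real runner would also need to hand the workspace to the Job and collect results afterwards, which is exactly the shared-filesystem problem the Argo/S3 approach above works around.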

However, in this discussion we are talking about something more...