Closed benjdlambert closed 3 years ago
@spotify/backstage-core @spotify/techdocs-core I'd be interested in your thoughts. This is what I have so far from various meetings and some thinking. The API is not final, and definitely going to change with some of the recommendations here.
Really interesting direction for the scaffolder and I think it would solve most of our use-cases. Some comments/questions from me:
But as I said this would make it quite easy for us to add our company specific actions as steps in the templates. I had a question about events in the previous thread and if that doesn't become a core thing for Backstage we could add that ourselves using this flow.
Ok - a lot to run through here @mfrinnstrom thanks for the comments!
What about the input schema that we could specify before and then have the UI prompt the user for?
So off the bat, I think we would head down the route of being able to define the json-schema inline for each step, and then collect them all into the main form, condensing the duplicate fields. Maybe we'd also have a value map so that you can map input values from other parts of the form to different values in the steps.
Something like this:
apiVersion: backstage.io/v1alpha1
kind: Template
metadata:
name: react-ssr-template
title: React SSR Template
description: Next.js application skeleton for creating isomorphic web applications.
spec:
steps:
- name: Fetch React SSR Template Source
fetch: "github:https://github.com/spotify/backstage/path/to/template/source"
ref: $SSR_SOURCE
- name: Fetch mk-docs Template Source
fetch: "github:https://github.com/spotify/techdocs-core-template"
ref: $TECHDOCS_SOURCE
- name: Template core template
invoke: cookiecutter
schema:
required:
- component_id
- description
properties:
component_id:
title: Name
type: string
description: Unique name of the component
description:
title: Description
type: string
description: Help others understand what this website is for.
args:
input: $SSR_SOURCE
output: .
- name: Add techdocs
invoke: cookiecutter
schema:
required:
- docs_name
properties:
docs_name:
title: Name for Documentation
type: string
description: Name for the documentation
description:
title: Description
type: string
description: Help others understand what this website is for.
args:
input: $TECHDOCS_SOURCE
output: .
- name: Publish to Github
invoke: "publish:github"
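A minimal sketch of how the per-step schemas above could be collected into one form schema, condensing duplicate fields (like the shared description property) so the user is only prompted once. The names (StepSchema, collectFormSchema) are illustrative, not an existing scaffolder API:

```typescript
type Prop = { title: string; type: string; description?: string };
type StepSchema = {
  required?: string[];
  properties?: Record<string, Prop>;
};

// Merge each step's inline json-schema into one schema for the main form.
function collectFormSchema(steps: StepSchema[]): {
  required: string[];
  properties: Record<string, Prop>;
} {
  const properties: Record<string, Prop> = {};
  const required = new Set<string>();
  for (const step of steps) {
    for (const [key, prop] of Object.entries(step.properties ?? {})) {
      // The first definition of a duplicated field wins; later ones are condensed
      if (!(key in properties)) {
        properties[key] = prop;
      }
    }
    for (const key of step.required ?? []) {
      required.add(key);
    }
  }
  return { required: [...required], properties };
}
```

The open question would then be the value map: how to fan a single form value back out to differently-named inputs in each step.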
Or alternatively, we could define them at the root level like they are today, but that causes some problems: when we want to actually split the templating parts apart into re-usable steps, it's hard to do that when everything is defined at the top level.
If we only want to fetch files from a repo and not invoke a templater on them. Would that mean invoking a "move function" in this case or could we perhaps specify a path in the fetch step?
So I was thinking that the ref variable would be available in the context. You could either call out to a mv function in a node function, or maybe we could add another step method called sh and you could do something like this?
steps:
- name: Fetch React SSR Template Source
fetch: "github:https://github.com/spotify/backstage/path/to/template/source"
ref: $SSR_SOURCE
- name: Move to current directory
sh: 'mv $SSR_SOURCE ./something'
Also not sure if just providing another option to the fetch step would work too, so that you could specify the directory to move it into rather than tmp, for example?
Could there be a way for an invoke step to fetch information from an external source and make that available in the context as well to have that available to the following steps? This would allow us to fetch information (AWS account ids perhaps) and use them later on in the template.
Yeah, I still think that we need the ability to return things from these functions that also get added to the context. Not sure yet how that would change from the current usage, but I'm thinking it could work as it does today in the jobProcessor.
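One way to picture this: each invoked function may return a JSON object that gets merged into a shared context, so later steps can read it. This is a hypothetical sketch (StepFn and runSteps are illustrative names, not the current jobProcessor API):

```typescript
type Ctx = Record<string, unknown>;
type StepFn = (ctx: Ctx) => Promise<Ctx | void>;

// Run steps in order, merging each step's returned object into the context.
async function runSteps(steps: StepFn[]): Promise<Ctx> {
  let ctx: Ctx = {};
  for (const step of steps) {
    const out = await step(ctx);
    if (out) {
      ctx = { ...ctx, ...out }; // outputs become available to following steps
    }
  }
  return ctx;
}
```

This would cover the AWS-account-lookup use case above: an early step fetches the ids and returns them, and later steps pick them up from the context.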
While I appreciate the flexibility of the proposed model, I'm wondering if we could get away with something a bit more constrained if we split creating repositories from composable software templates? As I haven't dug into the use-cases or the current implementation too much, I might also be missing some obvious thing but was thinking something along those lines:
This would essentially flatten every template to just have one step. It would also mean that the templates can't refer to the working directory or to other templates running at the same time, but that might be a feature? Otherwise it might be hard to tell which templates are applicable where, in what order, with what inputs etc.
Of course, those two steps could still be combined in the UX. Step 1: Select an existing repo or create a new one. Step 2: Select one or several templates to scaffold into the repo. Step 3: Click the link to go to the generated PR, review and merge it.
@benjdlambert thanks for the clarifications! One thing I'm not sure about though.
Or alternatively, we could define them at the root level like they are today, but that causes some problems: when we want to actually split the templating parts apart into re-usable steps, it's hard to do that when everything is defined at the top level.
Would this mean you can reference one template from another one or how would this look? My thinking was that I didn't have to register all the repos that I use in a template in Backstage but I guess I'm missing something here.
Regarding the fetching of things without running a templater on them I meant something like this.
steps:
- name: Fetch React SSR Template Source
fetch: "github:https://github.com/spotify/backstage/path/to/template/source"
path: ./something
I probably won't need to reference that then. Maybe fetching could just be a function to invoke as well?
@mfrinnstrom I think it's better practice that the steps context only includes the form values that are presented in the schema, but maybe special keys like refs could be passed through.
fetch will go and grab the contents and extract it somewhere, then you could set the ref as $SOME_SOURCE, and the context would have some way to reference this path somehow. The $SOME_SOURCE variable can then be used in the yaml.
fetch could be just an invoke, but it could be a pain to manage the authorization in these functions too, whereas we could take care of it as a first class citizen like the Service Catalog does.
I probably won't need to reference that then. Maybe fetching could just be a function to invoke as well?
I can still see a use case where you might want to reference that path later on for some reason. I'm saying it's better to have the choice?
@mbruggmann I'm thinking that the current implementation that we have is very similar to your proposed solution. You can define a template which has some skeleton and a templater, and then the publish step is statically defined right now from the frontend. It works pretty well, but there's been an increase in requests for flexibility and more customisation, so I think inverting the control back to the end-user rather than trying to be opinionated is a better option.
Purely because I think there is no one size fits all for every company.
I am firmly in the camp that it should be at least somewhat opinionated, and it shouldn't just become a new Github Actions even though it looks similar; it should just be for creating or updating a repository at the end, but where that lies is up to the implementer.
This would essentially flatten every template to just have one step.
This has also come up that it can be really hard to see what failed and why, and it would be a better user experience to break down these things into clearer sections in the frontend. I think it could be pretty hard to do that without having these explicit steps.
Running on existing repositories, or code that exists and creating PR's for the changes
This is one of the more compelling ideas for us in this RFC. We already have a large established set of repos so making changes across them would be very useful. Especially if it was possible to do as a bulk change across multiple repos.
As an example, if we wanted to increase the entity schema version across all of the catalog-info.yaml files in each repo. Another might be adding a docs-like-code setup for existing repos.
@andrewthauer A constraint we put in place for now is to only consider user-initiated flows, i.e. no dependabot-type things.
Still very much possible that we can support what you suggest though, but bulk changes across repos will require a bit of feature creep.
@mbruggmann Splitting things out into a PR-driven flow makes it really flexible, but I do worry a bit about the DX of having to sit and click through 5 different templates and creation steps. My big concern is where that puts the flexibility though. I think we're searching for a solution that provides flexibility for the integrators and template creators, i.e. the people that are running an organization's instance of Backstage. We're not looking for flexibility for the end user, with the exception of some knobs that are deliberately put in place by the template creators.
@benjdlambert Regarding the schema, I'm kinda leaning towards having the yaml structure decide how to run different steps with different input/output. Probably a recursive schema with some limitations. What are your thoughts on something like this?
spec:
schema:
properties:
monitoring:
title: Monitoring Bundle
type: boolean
description: Checking this will also create a Grafana dashboard for your service
steps:
- name: React SSR Template Source
# The output of a template step is merged into the outer working directory
template:
- name: Fetch React SSR Template
fetch: github.com/spotify/backstage/path/to/template/source
- name: Run Cookiecutter
invoke: cookiecutter
schema:
properties:
name:
title: Name
type: string
required: true
description: Unique name of the component
description:
title: Description
type: string
required: true
description: Help others understand what this website is for.
# Adds TechDocs with a suggested documentation structure, but no additional templating
- name: TechDocs Template Source
template:
- name: Fetch TechDocs Template Source
fetch: github.com/spotify/techdocs-core-template
- name: Setup Monitoring
invoke: "monitoring" # Custom invoke added by the org
if: '.monitoring' # Some basic conditionals mapped from user input
- name: Publish to Github
invoke: "publish:github"
schema:
properties:
repoSlug:
title: Repo Slug
type: string
required: true
component: GithubRepoSlug # Mapped to a custom input component on the frontend
description: The GitHub <org>/<repo> for this component.
Top-level properties of each step in this format:
@Rugvip I like this.
From what I remember, the required part of json schema is a little different from what you have listed here, however. It might look something like the following:
spec:
schema:
properties:
monitoring:
title: Monitoring Bundle
type: boolean
description: Checking this will also create a Grafana dashboard for your service
steps:
- name: React SSR Template Source
# The output of a template step is merged into the outer working directory
template:
- name: Fetch React SSR Template
fetch: github.com/spotify/backstage/path/to/template/source
- name: Run Cookiecutter
invoke: cookiecutter
schema:
required: ['name', 'description']
properties:
name:
title: Name
type: string
description: Unique name of the component
description:
title: Description
type: string
description: Help others understand what this website is for.
# Adds TechDocs with a suggested documentation structure, but no additional templating
- name: TechDocs Template Source
template:
- name: Fetch TechDocs Template Source
fetch: github.com/spotify/techdocs-core-template
- name: Setup Monitoring
invoke: "monitoring" # Custom invoke added by the org
if: '.monitoring' # Some basic conditionals mapped from user input
- name: Publish to Github
invoke: "publish:github"
schema:
required: ['repoSlug']
properties:
repoSlug:
title: Repo Slug
type: string
component: GithubRepoSlug # Mapped to a custom input component
@Rugvip how would we pass variables into the different invoke parts?
Some comments/questions from me.

- We have a name property in many places, but for some reason they end up with different titles or descriptions.
- The repoSlug property is decided by the publish:github function and we can't really change that, right? What if another function decides to call it repo_slug? We can't change either of them, and we end up with both of them prompted to the user. I guess this ties back to @benjdlambert's question about passing variables into the invoke parts.
- Should the conditional be $.monitoring instead of just .monitoring (or perhaps just monitoring)?
- The component part of the publish to GitHub step made it clear to me where we would extend this to support our custom selections and pre-filled values. This is something I imagine won't change much between templates, so I thought maybe this could be configured when registering the publish:github function in the backend, but then I don't see a clear way to reference that in the templates. It would also be good if the component part gets handled up front, so that the user can actually see the value before proceeding.

Our Backstage app will be running using ECS Fargate on AWS. I'm now looking into running the templater step of the scaffolder using ECS Fargate as well, instead of a local Docker instance. To make it easy to share the prepared files (that are downloaded in the Backstage container) with the templater step that will run in a separate container, we intend to use EFS and mount a shared filesystem between the containers.
@mfrinnstrom it would be really interesting if you helped write down how to get Backstage up and running on Fargate, so that others can follow the same model. Maybe in the form of a tutorial? Example: https://backstage.io/docs/tutorials/quickstart-app-auth
First, @stefanalund I'm sorry for missing this. I will see if I can extract at least the CloudFormation template that we have for testing right now (it won't be production ready) and a short description of how to use it.
@benjdlambert & @Rugvip I guess you are discussing this internally but we had some discussions around this on our end yesterday and I thought I should at least try and document our thoughts.
I still like the idea of a clear "contract" for this specific template at the top, but then you will have to map those values to each function invocation. I think that's something that needs to be supported anyway.
In the example below everything is a function that you invoke, there are no built-ins. I, as an operator, would have to configure all the functions that I would like to have available in my instance. This is similar to how it is today where we have to wire up the scaffolder with the templaters, preparers and publishers that we want to have.
Each function would have the same interface: it takes a JSON object as input and returns a JSON object as output. There will of course be different required values for each function. To make it known at runtime what functions (and their inputs and outputs) are available, I'm thinking of something similar to what has been done with the config schemas. This could perhaps be shown on a Help page for the scaffolder. I guess there could be some sort of validation functionality for the templates against this as well. If possible, this is something we would want to run during our CI/CD pipeline for the templates.
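The CI/CD validation could be as simple as checking each step's invoke key and parameters against the declared functions. A hypothetical sketch, assuming a registry that declares each function's required parameters (none of these names are an existing Backstage API):

```typescript
type FunctionDecl = { requiredParameters: string[] };
type TemplateStepDef = {
  id: string;
  invoke: string;
  parameters?: Record<string, unknown>;
};

// Validate template steps against the functions registered in this instance,
// returning a list of human-readable errors (empty = valid).
function validateTemplate(
  steps: TemplateStepDef[],
  registry: Record<string, FunctionDecl>,
): string[] {
  const errors: string[] = [];
  for (const step of steps) {
    const decl = registry[step.invoke];
    if (!decl) {
      errors.push(`Step '${step.id}' invokes unknown function '${step.invoke}'`);
      continue;
    }
    for (const param of decl.requiredParameters) {
      if (!(param in (step.parameters ?? {}))) {
        errors.push(`Step '${step.id}' is missing required parameter '${param}'`);
      }
    }
  }
  return errors;
}
```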
We also introduced the id for each step to be able to reference the output of a previous step and use that as parameters for another step.
spec:
parameters:
dataSchema:
type: object
required: ['name', 'description', 'repoSlug']
properties:
name:
title: Name
description: Unique name of the component
type: string
description:
title: Description
description: Help others understand what this website is for.
type: string
repoSlug:
title: Repo Slug
type: string
component: GithubRepoSlug # Mapped to a custom input component
monitoring:
title: Monitoring Bundle
description: Checking this will also create a Grafana dashboard for your service
type: boolean
default: true
uiSchema: # If there is a need to do some custom config of the UI
steps:
- id: network
name: Lookup network config
invoke: custom:networkLookup # Should we namespace functions? Maybe not if there should be no built-ins
- id: template
name: Some infrastructure template
invoke: template:cookiecutter
parameters:
source: github.com/spotify/backstage/path/to/template/source
destinationPath: ./
variables:
name: $.parameters.name # Maybe we could use JMESPath or something else
description: $.parameters.description
cidrBlock: $.network.cidr # References the previous step using its id
# Adds TechDocs with a suggested documentation structure, but no additional templating
- id: fetch-docs
name: TechDocs Template Source
invoke: backstage:fetch
parameters:
url: github.com/spotify/techdocs-core-template
destinationPath: docs/
- id: monitoring
name: Setup Monitoring
invoke: custom:monitoring # Custom invoke added by the organization
if: '$.parameters.monitoring' # Some basic conditionals mapped from user input. Maybe name this `condition`?
- id: publish
name: Publish to Github
invoke: publish:github
parameters:
source: ./ # If you want to publish only a part of the working directory
repo_slug: $.parameters.repoSlug
So that more or less sums up our thoughts right now!
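The `$.parameters.name` / `$.network.cidr` references in the example above could be resolved with something like JMESPath, as the comment suggests; a minimal hypothetical stand-in that only handles dotted paths might look like:

```typescript
// Resolve a `$.<id>.<field>` reference against a scope that contains the user
// parameters plus each previous step's output, keyed by step id.
// Values that don't start with `$.` are treated as literals and passed through.
function resolveRef(ref: string, scope: Record<string, any>): unknown {
  if (!ref.startsWith('$.')) {
    return ref;
  }
  return ref
    .slice(2)
    .split('.')
    .reduce<any>((acc, key) => (acc == null ? undefined : acc[key]), scope);
}
```

A real implementation would want JMESPath's richer expressions (filters, defaults), but this shows the shape of the lookup.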
Observed issue: When we have two instances of the backend, frontend requests to check the status of a running scaffolder job (via the GET /v1/jobs/ endpoint) can be routed to an instance that doesn't know about the job, since job state is only held in memory.
Potential solution: This could be solved by persisting the state of a job in the DB.
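One possible shape for that fix: put job state behind a store interface so that any backend replica can answer status requests. The interface and in-memory stand-in below are assumptions for illustration, not the actual scaffolder-backend schema; a real store would be backed by the plugin database.

```typescript
type JobStatus = 'PENDING' | 'RUNNING' | 'COMPLETED' | 'FAILED';
type JobRecord = { id: string; status: JobStatus; log: string[] };

interface JobStore {
  create(id: string): Promise<void>;
  setStatus(id: string, status: JobStatus, logLine?: string): Promise<void>;
  get(id: string): Promise<JobRecord | undefined>;
}

// In-memory stand-in for the sketch; swap the Map for DB queries in practice.
class InMemoryJobStore implements JobStore {
  private jobs = new Map<string, JobRecord>();

  async create(id: string): Promise<void> {
    this.jobs.set(id, { id, status: 'PENDING', log: [] });
  }

  async setStatus(id: string, status: JobStatus, logLine?: string): Promise<void> {
    const job = this.jobs.get(id);
    if (!job) throw new Error(`Unknown job ${id}`);
    job.status = status;
    if (logLine) job.log.push(logLine);
  }

  async get(id: string): Promise<JobRecord | undefined> {
    return this.jobs.get(id);
  }
}
```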
@errolpais Yep, definitely needs a fix and is in scope!
@mfrinnstrom Interesting, thanks for writing it up so thoroughly. I wonder if template "functions" could be catalogued too, registered as a TemplateStep kind with an according schema for their inputs and outputs, as you mention.
One drawback with the explicit parameter passing approach, combined with detaching template "workflows" from cookiecutter template repos, is that there will be a ton of mechanical parameter passing. If the cookiecutter template needs 20 parameters, you'll have to declare and pass them all in every workflow that uses it.
I also wonder how to best solve development and versioning. If you want to add or remove an input parameter in your cookiecutter template repo, how do you test and deploy that change together with updating and re-registering the different workflows that make use of it? π€
@freben Interesting idea with the TemplateStep. I guess there will be some sort of source code connected to some of them that needs to be executed. Maybe you only mean to have them registered for the schema, though?
I see your point about the parameters and I agree that it could be lots of parameter passing. I can see the beauty in having the schema for a step defined with the cookiecutter template and then not having to specify them for every template.
My concern though is when you have multiple of these steps in one template, and one calls it component_id, another component-id, and the last one repo-name. In practice we want the same value for all of them, but due to their different names the user will be prompted for three different values. I would at least be a little confused/irritated by that. Not sure if JSONForms has a solution for this.
One possible solution for it could be aliasing of parameter names for a specific step, but then we are almost back to parameter passing. I'm not sure what would happen if different steps declare that a parameter should be handled by different components, either.
Good point about versioning. My immediate thought is that a template could point to a git tag (or branch?) to have a stable reference. Then you can continue to develop in your master branch and when a new version is ready you tag it and then the templates can be updated when ready. I'm not sure if I'm totally sold on that solution though.
No, you are right. I was thinking that the TemplateStep could define (relevant parts of / references to) the actual implementation of the step in addition to the schema, and could be of several types. If you look at how GitHub actions are defined, they have three main types of action: docker, javascript and composite. I think for our scaffolder, we may want to initially support just one type: one that calls a function which the backend itself has to supply in a map that's given to the engine.
// in the backend package
import { promises as fs } from 'fs';
import os from 'os';
import path from 'path';

const tmpdirV1: NativeFunction = async ({ outputs }) => {
  const dir = await fs.mkdtemp(path.join(os.tmpdir(), 'scaffolder-'));
  outputs.set('path', dir);
  return async () => { await fs.rmdir(dir); }; // support for cleanup?
};

const readTreeV1: NativeFunction = async ({ context, inputs, outputs }) => {
  const { workDir } = context;
  const { sourceUrl, targetPath = '' } = inputs;
  const targetDir = path.join(workDir, targetPath);
  // mkdirp, then use the existing UrlReader for readTree etc
};

const engine = new ScaffoldingEngine({ nativeFunctions: { tmpdirV1, readTreeV1 } });
const router = createRouter({ engine });
apiVersion: backstage.io/v1alpha1
kind: TemplateStep
metadata:
name: tmpdir-v1
spec:
type: native # Other types could be envisioned here - docker for example, or bash (run locally)
uses: tmpdirV1 # A function name, as given to the engine
# (or a docker image name, if this were a docker type step, etc)
outputs:
path:
description: The full path to a newly generated (unique, empty) temporary directory
apiVersion: backstage.io/v1alpha1
kind: TemplateStep
metadata:
name: readTree-v1
spec:
type: native
uses: readTreeV1
inputs:
sourceUrl:
required: true
description: The full URL of the root of the tree to read
targetPath:
required: true
description: The path to store the resulting tree in - either absolute (if a temp dir) or relative (to the workdir)
outputs:
path:
description: The full path that the resulting tree was written to
apiVersion: backstage.io/v1alpha1
kind: Template
metadata:
name: default-fetch-cookiecutter-repo-v1
spec:
inputs:
# ...
steps:
- name: temp
uses: tmpdir-v1 # This is actually an entity ref, amounts to templatestep:default/tmpdir-v1
- name: get
uses: readTree-v1
params:
sourceUrl: https://github.com/my/templates/the-template
targetPath: ${steps.temp.path}
# ... etc
This is not a fully formed example, but it's what I had time to type out now :) I kinda like that it ends up using entity references.
When will the work on this be started, or should we have another RFC where we collect the findings from this RFC into one (or multiple) proposed solutions?
Apologies for the delay for an update on this, but we've split out the work into a milestone here https://github.com/backstage/backstage/issues?q=is%3Aopen+is%3Aissue+milestone%3A%22Scaffolder+out+of+Alpha%22.
It's very high level implementation wise, and they've been broken out into some form of epic or focus area.
Evaluating this RFC and the ideas that come with it is one of our higher priority tasks for the new year, as it's something we're focusing on in Q1, so we will shortly move the discussion around the different areas from here into the tickets in the milestone.
I'm going to close this RFC for now, and thank you all for the feedback so far. We'll be updating the tickets in the milestone with a little more detail over the coming weeks with our suggested path, would love to hear feedback there too.
I think the workflow approach is pretty interesting and powerful. Just for your info, the JHipster studio is capable of doing a lot of what you are asking for (lifecycle, hooks, composable steps, extensions, git repo creation, etc), so maybe you can get a few tips from them.
On another note, I am one of the creators of the JHipster IDE plugin, so I would be willing to work on IDE tooling (editor, code highlighting, code completion) once the grammar is stable.
@jbadeau jhipster studio looks super interesting. Wondering if you've played around with creating a scaffolder backend module like the yeoman module? https://github.com/backstage/backstage/tree/master/plugins/scaffolder-backend-module-yeoman
I have not had the time to look into the new backstage generator, but the syntax looks pretty similar to workflows like Tekton, Argo, etc.: reusable steps composed into a DAG where each step provides a schema for UI generation and validation. Pretty cool.
@jbadeau Any more thoughts on integrating JHipster Lite and Backstage? The combination of selecting an architecture and visually selecting the deployment infra would be powerful.
In short, it seems like there's a ton of overlap in the problem to be solved across Backstage Templates and JHipster Lite (e.g., auto-generation and auto-provisioning). IMHO, JHL has a better UI because it presents the dependency chain visually. The ideal would be to create a JHL toolkit that allows us to define an architecture template and its dependencies, then auto-generate a JHL-like UI for the teams to use.
Status: Open for comments
Background
Hey :wave:
So as you might be aware, we released our MVP for the Scaffolder a few moons ago, and internally have been overwhelmed with the amount of contributions and interest it has received.
Naturally, we've seen a lot of requests for features, and have slowly been seeing an increase in use-cases, some that might not have been considered in the original architecture of the Scaffolder.
If you haven't already seen the existing RFC about Extending the Scaffolder I'd recommend you go and do that before commenting on this RFC.
Problem
Looking at the previous RFC, I read through and gathered some key features and behaviour that the current Scaffolder implementation might not be able to support.
Although not our original plan to gather feedback around the scaffolder, thank you so much to the contributors that reached out in this RFC to list their requirements, ideas, and what they thought was missing. Your feedback is always welcome and helps us all on the journey to make a better product!
After I grouped the feedback into the above list, I think there's a way forward.
Opening up the ability to add more steps to the jobProcessor (3) would enable users to write custom steps that run after repository creation and pushing. This solves, in turn, the last three points of adding integrations, webhooks and setting things up after the repo has been created (4, 5, 6).

Another idea for improvement to the scaffolder is the composition of code from one step to another. If we treat the jobProcessor as something similar to a Github Actions workflow, where it assumes nothing about the steps, just that you start with an empty work directory, and that same directory is shared throughout each step, we can in turn solve the other points (1, 2).

Imagine that Templates now become the definition of these steps, and that you can then describe the steps explicitly in the .yaml, similar again to Github Actions. These steps can decide to pull data from other sources, like existing repositories, or even run additional logic for templating cookiecutter templates as part of the workflow. Then you can start to compose things into the current directory, and what you're left with is the end result.
<tldr />
Here's what we want to fix with the scaffolder: open up the jobProcessor and remove the fixed steps (https://github.com/spotify/backstage/blob/55542797a9cbb156792774c502b44155c985fa2a/plugins/scaffolder-backend/src/service/router.ts#L97-L139).
Solution
I think that the solution has many steps. I'm cautious of proposing an API change too early that we haven't even proven yet, so this proposal is twofold.
First off, I think the first step is to decouple the template definition from the skeleton or source of the template. It's become clear that sometimes templates have more than one source, and that templates are not re-usable when the source or skeleton is treated as the same thing.
I'm thinking that the templates that you have available to pick from, could be different variations of workflows rather than being tied to one source, one transform, one publish.
Some will just be simple, taking a source and republishing; some will template some source with cookiecutter; some might take data from multiple sources and, using both cookiecutter and handlebars, create one end result. It doesn't make sense that the Template provides only one source of data. You should have a single source of truth for source files (skeleton) that you can re-use between templates.

Right now, the skeleton for something like a cookiecutter template must be co-located with the initial definition of the Template Definition using the path key.

So we could add a new field which would reference the source required for the cookiecutter template that we're currently going to define, something like skeleton: github:https://github.com/spotify/some-template-here. However, I also think that this is going to become an anti-pattern. I think that separating the source and the type from the initial definition is something that would become its own entity at a later stage.

I'm proposing that we change the Template kind to become something that can solve our original 2 problems:

- Define the steps for the jobProcessor in each Template definition.
- fetch becomes a wrapper around some source somewhere, with maybe the authorization built in so you don't have to deal with that.
- Templaters and Publishers just become executable functions which you can invoke using the invoke key, referencing something that has been defined with the scaffolder at startup.
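The last point above can be sketched as a registry that the scaffolder is wired up with at startup, where the invoke key in a template step looks up the function to run. The names (InvokeFn, InvokeRegistry) are hypothetical, not the final scaffolder API:

```typescript
type InvokeFn = (
  args: Record<string, unknown>,
) => Promise<Record<string, unknown> | void>;

// Templaters and Publishers are registered as plain functions at startup;
// a template step's `invoke` key selects which one to run.
class InvokeRegistry {
  private fns = new Map<string, InvokeFn>();

  register(key: string, fn: InvokeFn): void {
    this.fns.set(key, fn);
  }

  async invoke(key: string, args: Record<string, unknown>) {
    const fn = this.fns.get(key);
    if (!fn) {
      throw new Error(`No function registered for invoke: ${key}`);
    }
    return fn(args);
  }
}
```

An integrator would then register something like registry.register('publish:github', ...) when wiring up the backend, much like templaters, preparers and publishers are wired up today.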