backstage / backstage

Backstage is an open framework for building developer portals
https://backstage.io/
Apache License 2.0

[TechDocs] Event Based Generation #14380

Closed awanlin closed 1 year ago

awanlin commented 2 years ago

Feature Suggestion

Based on the amazing work being done in #13931, we should consider building a subscriber that generates TechDocs in response to events. This would be the sweet spot between on-demand generation and having a CI/CD pipeline!

Possible Implementation

Building off of #13931, we could implement a subscriber that listens for events showing that a repo has changed and then generates the TechDocs for it. To map an event to its entity, we could introduce a new annotation holding the URL of the repo, or we might be able to leverage the existing source location. Once the mapping has been determined, it's just a matter of pulling together existing code to generate the TechDocs.

It would be nice if this subscriber could get from the event what changed so that if nothing in the docs folder changed then the subscriber would just end as there would be no need to generate new TechDocs

Implementation

  1. Implement EventSubscriber - https://github.com/backstage/backstage/blob/master/plugins/events-node/src/api/EventSubscriber.ts
  2. In onEvent capture repo URL (GitHub, Azure DevOps) or key repo details (Bitbucket Server)
  3. Query catalog for all entities with:
  4. Trigger the TechDocs generation with all matching entities:
    • Use the /sync/:namespace/:kind/:name route?
    • Or use the DocsBuilder?
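
The steps above could be sketched roughly as follows. This is only an illustration, not the eventual implementation: the `EventParams` and `EventSubscriber` interfaces are local stand-ins that mirror the shape linked in step 1, and `TechDocsEventSubscriber`, `EntityName`, `findEntities`, and `triggerBuild` are hypothetical names. The `github.push` topic and the `repository.html_url` field assume a GitHub push webhook payload.

```typescript
// Local stand-ins for the shapes in @backstage/plugin-events-node, declared here
// so the sketch is self-contained; check the real types before relying on them.
interface EventParams {
  topic: string;
  eventPayload: unknown;
}

interface EventSubscriber {
  supportsEventTopics(): string[];
  onEvent(params: EventParams): Promise<void>;
}

// Hypothetical minimal entity shape and collaborators (not Backstage APIs).
interface EntityName {
  namespace: string;
  kind: string;
  name: string;
}
type FindEntitiesByRepoUrl = (repoUrl: string) => Promise<EntityName[]>;
type TriggerDocsBuild = (entity: EntityName) => Promise<void>;

// A GitHub push payload carries the repository URL under repository.html_url.
function extractRepoUrl(payload: unknown): string | undefined {
  const url = (payload as { repository?: { html_url?: unknown } } | null)
    ?.repository?.html_url;
  return typeof url === 'string' ? url : undefined;
}

class TechDocsEventSubscriber implements EventSubscriber {
  private readonly findEntities: FindEntitiesByRepoUrl;
  private readonly triggerBuild: TriggerDocsBuild;

  constructor(findEntities: FindEntitiesByRepoUrl, triggerBuild: TriggerDocsBuild) {
    this.findEntities = findEntities;
    this.triggerBuild = triggerBuild;
  }

  supportsEventTopics(): string[] {
    // Assumed topic name for GitHub push events.
    return ['github.push'];
  }

  async onEvent(params: EventParams): Promise<void> {
    // Step 2: capture the repo URL from the event.
    const repoUrl = extractRepoUrl(params.eventPayload);
    if (!repoUrl) return;
    // Step 3: look up every entity that maps to this repo.
    const entities = await this.findEntities(repoUrl);
    // Step 4: trigger generation for each match (sync route or DocsBuilder).
    for (const entity of entities) {
      await this.triggerBuild(entity);
    }
  }
}
```

Keeping the catalog lookup and the build trigger as injected functions leaves the open question in step 4 undecided: either option can be plugged in without changing the subscriber.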

Events

Here are details on the events that would be applicable for Azure DevOps, GitHub, and Bitbucket Server:

Context

We currently have a few hundred repos in each of the following: Azure DevOps Server 2020, Bitbucket Server, and GitHub. Managing the TechDocs pipelines for all of these has been challenging. We've been able to centralize the pipelines for Azure DevOps Server 2020 and Bitbucket Server, but GitHub has been harder. The pipelines also run on a schedule, so there are times when the TechDocs are not as up to date as they could be.

pjungermann commented 1 year ago

It would be nice if this subscriber could get from the event what changed so that if nothing in the docs folder changed then the subscriber would just end as there would be no need to generate new TechDocs

Unfortunately, this will not be possible for all providers. E.g., events from Bitbucket Cloud don't contain enough information. This might be the case for some others, too. However, it should be good enough to just trigger the pipeline on any change (additional filtering of the event might consume too much of the rate limits).
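
For providers whose events do list the touched files, the check could look something like the sketch below. It assumes a GitHub push payload, where each commit carries `added`, `modified`, and `removed` path arrays, and it falls back to rebuilding whenever that information is missing. The `shouldRebuildDocs` name and the `DOCS_PATHS` defaults are hypothetical.

```typescript
// Subset of a GitHub push webhook payload: each commit lists the paths it touched.
interface PushCommit {
  added?: string[];
  modified?: string[];
  removed?: string[];
}

interface PushPayload {
  commits?: PushCommit[];
}

// Hypothetical defaults; a real setup would likely read these from config.
const DOCS_PATHS = ['docs/', 'mkdocs.yml', 'mkdocs.yaml'];

function shouldRebuildDocs(
  payload: PushPayload,
  docsPaths: string[] = DOCS_PATHS,
): boolean {
  const commits = payload.commits;
  // No file information available: be conservative and rebuild on any change.
  if (!commits || commits.length === 0) return true;

  for (const commit of commits) {
    const touched = [
      ...(commit.added ?? []),
      ...(commit.modified ?? []),
      ...(commit.removed ?? []),
    ];
    for (const path of touched) {
      for (const docsPath of docsPaths) {
        // Entries ending in "/" match as directory prefixes, others match exactly.
        const isDir = docsPath.endsWith('/');
        if (path === docsPath || (isDir && path.startsWith(docsPath))) {
          return true;
        }
      }
    }
  }
  return false;
}
```

The check only reads the payload that was already delivered, so it makes no extra API calls and does not count against rate limits.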

awanlin commented 1 year ago

Now that https://github.com/backstage/backstage/pull/13931 has been merged, I've discussed this item at work and got approval to move forward. I've updated the description with a rough implementation and some links to the related events. Now I have some questions:

Finally, can someone assign this issue to me, please?

pjungermann commented 1 year ago

@awanlin I think it depends a bit on the implementation. E.g., maybe it can be abstracted in parts. However, SCM provider-specific implementation would be best to have in a separate plugin/package, yes.


(to help me a bit)

TechDocs Build Process

DefaultDocsBuildStrategy will build only if techdocs.builder is set to local in your config.

DocsBuilder receives DocsBuilderArguments which contains

  entity: Entity;
  scmIntegrations: ScmIntegrationRegistry;

among others.

DocsSynchronizer.doSync seems to be responsible for triggering the DocsBuilder for a given entity with a pLimit of 10.
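
To illustrate what a limit of 10 means in practice, here is a minimal, self-contained sketch of a concurrency limiter in the spirit of the p-limit package. It is not the DocsSynchronizer code, and `createLimiter` is a hypothetical stand-in written only for this example.

```typescript
// Minimal stand-in for a p-limit style limiter: at most `max` tasks run at once,
// the rest wait in a FIFO queue until a slot frees up.
function createLimiter(max: number) {
  let active = 0;
  const queue: Array<() => void> = [];

  // Called when a task settles: free the slot and start the next queued task.
  const next = (): void => {
    active--;
    const run = queue.shift();
    if (run) run();
  };

  return function limit<T>(task: () => Promise<T>): Promise<T> {
    return new Promise<T>((resolve, reject) => {
      const run = (): void => {
        active++;
        // Promise.resolve().then(task) also turns a synchronous throw into a rejection.
        Promise.resolve().then(task).then(resolve, reject).finally(next);
      };
      if (active < max) run();
      else queue.push(run);
    });
  };
}

// Usage: even if many events arrive together, only 10 builds run concurrently.
const limit = createLimiter(10);
const buildDocs = async (name: string): Promise<string> => `built ${name}`;
const pending = ['a', 'b', 'c'].map(name => limit(() => buildDocs(name)));
```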

There is a CachedEntityLoader which is something to consider (e.g., maybe you get stale data at some point; however, likely not super relevant for you).

DirectoryPreparer will read the tree via the SCM provider. TechdocsGenerator works on a local dir (local, docker, ...).


The SCM-related EntityProviders use Location entities for the ingestion. Each of these can contain 1+ entities. 0+ of these may have the required techdocs annotation.

Some webhook events provide details about the affected files (e.g., was there a change to docs?). With GitHub events, for example, you could tell whether the docs need to be regenerated or not. However, I'm not sure whether this conflicts with the current way of detecting that an update is needed and triggering a re-generation.

Techdocs could react to an event techdocs.entityChanged. However, calling /sync/:namespace/:kind/:name (executing DocsSynchronizer.doSync) seems reasonable, too.
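
As a small illustration of the second option, the route pattern can be filled in from an entity's namespace, kind, and name. The `techDocsSyncPath` helper and the `EntityName` shape are hypothetical, and the base URL or mount prefix in front of `/sync` depends on the deployment, so it is left out here.

```typescript
// Hypothetical minimal entity shape for this sketch.
interface EntityName {
  namespace: string;
  kind: string;
  name: string;
}

// Fills the /sync/:namespace/:kind/:name route pattern for one entity.
// Each segment is URL-encoded so unusual characters cannot break the path.
function techDocsSyncPath(entity: EntityName): string {
  const segments = [entity.namespace, entity.kind, entity.name].map(segment =>
    encodeURIComponent(segment),
  );
  return `/sync/${segments.join('/')}`;
}
```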

To find affected entities, you could use the catalog API and filter for the conditions you have mentioned. However, you also need to consider the event-based ingestion in the EntityProviders. If you react to the same event, such as github.push, the providers might not have finished yet, and the catalog API might not return the expected set of entities.

This could be relevant for newly added or removed entities; in case of changes to the techdocs annotation also for updated ones.

The EntityProviders themselves will only ask for a refresh or apply the delta modification. Even if they complete, the entities still need to be processed.

Having an event for new entities could help mitigate parts of that. Removed entities might not be super relevant here. Annotation changes seem to be the most tricky to handle though.
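
Until such an event exists, one possible workaround for the timing issue is to retry the catalog lookup a few times before giving up. The sketch below is only an assumption about how that could look: `lookupWithRetry`, its defaults, and the `Lookup` type are hypothetical. It also cannot tell "not ingested yet" apart from "no matching entities", and it does nothing for the annotation-change case.

```typescript
// Hypothetical lookup signature: resolves to the entities matching an event.
type Lookup<T> = () => Promise<T[]>;

// Retries the lookup while it returns nothing, waiting between attempts to give
// catalog ingestion and processing time to catch up with the same event.
async function lookupWithRetry<T>(
  lookup: Lookup<T>,
  attempts: number = 3,
  delayMs: number = 1000,
): Promise<T[]> {
  for (let attempt = 0; attempt < attempts; attempt++) {
    const found = await lookup();
    if (found.length > 0) return found;
    // Wait before the next attempt, but not after the final one.
    if (attempt < attempts - 1) {
      await new Promise<void>(resolve => setTimeout(resolve, delayMs));
    }
  }
  return [];
}
```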

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

awanlin commented 1 year ago

We have a working version of this internally, just working out a few bugs and doing some refactoring before we contribute it.

vanyakosmos commented 1 year ago

@awanlin hello, have you finished fixing those few bugs ( ͡° ͜ʖ ͡°)? I'm planning to build something similar, but for GitLab, and also to update entities via webhooks. If you have something to share, it would be a great help.

awanlin commented 1 year ago

Hi @vanyakosmos, yeah, it's been ready for a while now. I just had a few larger PRs I'd contributed that I felt needed to be completed first. I'll try to put a PR together on Friday, as that's when I have dedicated time for contributions like this.

For entities have you seen some of the work that's already in place for GitHub? Those might be good to look over: https://github.com/backstage/backstage/blob/2f91830761daf41af456a3f48052c755edffe68b/plugins/catalog-backend-module-github/src/providers/GithubEntityProvider.ts#L280 and https://github.com/backstage/backstage/blob/2f91830761daf41af456a3f48052c755edffe68b/plugins/catalog-backend-module-github/src/providers/GithubOrgEntityProvider.ts#L255

awanlin commented 1 year ago

I have a Draft PR created for this here: https://github.com/backstage/backstage/pull/18411. I did a bit of refactoring from our internal version and want to do a bit more testing before it's ready for review.

@vanyakosmos, sorry this took over a month to get to, feel free to look at the PR
