backstage / backstage

Backstage is an open framework for building developer portals
https://backstage.io/
Apache License 2.0

🐛 Bug Report: Techdocs returning 401 after enabling permissions flag #24791

Open darrenyung opened 1 month ago

darrenyung commented 1 month ago

📜 Description

When the permissions flag is enabled via config, the TechDocs pages return a 401 response when trying to retrieve docs from GCS.

👍 Expected behavior

Techdocs renders just fine.

👎 Actual Behavior with Screenshots

One of many web responses from the network tab

{
    "error": {
        "name": "AuthenticationError",
        "message": "Unable to call 'catalog' plugin on behalf of user, because the target plugin does not support on-behalf-of tokens or the plugin doesn't exist"
    },
    "request": {
        "method": "GET",
        "url": "/api/techdocs/static/docs/default/component/backstage/assets/img_1.png"
    },
    "response": {
        "statusCode": 401
    }
}

👟 Reproduction steps

  1. Enable permissions as described in step one of this guide
  2. Follow the library implementation of the new backend system via this guide
  3. Host the output of your MkDocs build in a GCS bucket.
  4. Configure TechDocs to read from the bucket.
  5. Run with any NODE_ENV other than your local one (either staging or production).

📃 Provide the context for the Bug.

  1. Open the network tab and go to the documentation of a catalog entity that has docs.
  2. Observe that the page does not load anything, and that the network tab shows a series of 401 responses from the TechDocs API

🖥️ Your Environment

OS: Darwin 23.4.0 - darwin/x64
node: v20.11.0
yarn: 1.22.19
cli: 0.26.4 (installed)
backstage: 1.26.0

Dependencies: @backstage/app-defaults 1.5.4 @backstage/backend-app-api 0.5.14, 0.7.2 @backstage/backend-common 0.19.10, 0.21.7 @backstage/backend-defaults 0.2.17 @backstage/backend-dev-utils 0.1.4 @backstage/backend-openapi-utils 0.1.10 @backstage/backend-plugin-api 0.6.17 @backstage/backend-tasks 0.5.22 @backstage/backend-test-utils 0.3.7 @backstage/catalog-client 1.6.4 @backstage/catalog-model 1.4.5 @backstage/cli-common 0.1.13 @backstage/cli-node 0.2.5 @backstage/cli 0.25.2, 0.26.4 @backstage/config-loader 1.8.0 @backstage/config 1.2.0 @backstage/core-app-api 1.12.4 @backstage/core-compat-api 0.2.4 @backstage/core-components 0.11.2, 0.13.10, 0.14.6 @backstage/core-plugin-api 1.9.2 @backstage/dev-utils 1.0.31 @backstage/e2e-test-utils 0.1.1 @backstage/errors 1.2.4 @backstage/eslint-plugin 0.1.7 @backstage/frontend-plugin-api 0.6.4 @backstage/integration-aws-node 0.1.12 @backstage/integration-react 1.1.26 @backstage/integration 1.10.0 @backstage/plugin-api-docs 0.11.4 @backstage/plugin-app-backend 0.3.65 @backstage/plugin-app-node 0.1.17 @backstage/plugin-auth-backend-module-atlassian-provider 0.1.9 @backstage/plugin-auth-backend-module-aws-alb-provider 0.1.9 @backstage/plugin-auth-backend-module-azure-easyauth-provider 0.1.0 @backstage/plugin-auth-backend-module-bitbucket-provider 0.1.0 @backstage/plugin-auth-backend-module-cloudflare-access-provider 0.1.0 @backstage/plugin-auth-backend-module-gcp-iap-provider 0.2.12 @backstage/plugin-auth-backend-module-github-provider 0.1.14 @backstage/plugin-auth-backend-module-gitlab-provider 0.1.14 @backstage/plugin-auth-backend-module-google-provider 0.1.14 @backstage/plugin-auth-backend-module-guest-provider 0.1.3 @backstage/plugin-auth-backend-module-microsoft-provider 0.1.12 @backstage/plugin-auth-backend-module-oauth2-provider 0.1.14 @backstage/plugin-auth-backend-module-oauth2-proxy-provider 0.1.10 @backstage/plugin-auth-backend-module-oidc-provider 0.1.8 @backstage/plugin-auth-backend-module-okta-provider 0.0.10 
@backstage/plugin-auth-backend 0.22.4 @backstage/plugin-auth-node 0.2.19, 0.4.12 @backstage/plugin-auth-react 0.1.1 @backstage/plugin-badges-backend 0.4.1 @backstage/plugin-badges 0.2.59 @backstage/plugin-catalog-backend-module-github-org 0.1.12 @backstage/plugin-catalog-backend-module-github 0.6.0 @backstage/plugin-catalog-backend-module-scaffolder-entity-model 0.1.15 @backstage/plugin-catalog-backend-module-unprocessed 0.4.4 @backstage/plugin-catalog-backend 1.21.1 @backstage/plugin-catalog-common 1.0.22 @backstage/plugin-catalog-graph 0.4.4 @backstage/plugin-catalog-import 0.10.10 @backstage/plugin-catalog-node 1.11.1 @backstage/plugin-catalog-react 1.11.3 @backstage/plugin-catalog-unprocessed-entities-common 0.0.1 @backstage/plugin-catalog-unprocessed-entities 0.2.3 @backstage/plugin-catalog 1.19.0 @backstage/plugin-cost-insights-common 0.1.3 @backstage/plugin-cost-insights 0.12.24 @backstage/plugin-devtools-backend 0.3.3 @backstage/plugin-devtools-common 0.1.9 @backstage/plugin-devtools 0.1.13 @backstage/plugin-entity-validation 0.1.20 @backstage/plugin-events-node 0.3.3 @backstage/plugin-explore-common 0.0.2, 0.0.3 @backstage/plugin-explore-react 0.0.38, 0.0.39 @backstage/plugin-explore 0.4.21 @backstage/plugin-github-actions 0.6.16 @backstage/plugin-github-issues 0.4.2 @backstage/plugin-github-pull-requests-board 0.2.1 @backstage/plugin-home-react 0.1.12 @backstage/plugin-home 0.7.3 @backstage/plugin-kubernetes-backend 0.17.0 @backstage/plugin-kubernetes-common 0.7.5 @backstage/plugin-kubernetes-node 0.1.11 @backstage/plugin-kubernetes-react 0.3.4 @backstage/plugin-kubernetes 0.11.9 @backstage/plugin-org 0.6.24 @backstage/plugin-permission-backend 0.5.41 @backstage/plugin-permission-common 0.7.13 @backstage/plugin-permission-node 0.7.28 @backstage/plugin-permission-react 0.4.22 @backstage/plugin-playlist-backend 0.3.22 @backstage/plugin-playlist-common 0.1.16 @backstage/plugin-playlist 0.2.9 @backstage/plugin-proxy-backend 0.4.15 
@backstage/plugin-scaffolder-backend-module-azure 0.1.9 @backstage/plugin-scaffolder-backend-module-bitbucket-cloud 0.1.7 @backstage/plugin-scaffolder-backend-module-bitbucket-server 0.1.7 @backstage/plugin-scaffolder-backend-module-bitbucket 0.2.7 @backstage/plugin-scaffolder-backend-module-gerrit 0.1.9 @backstage/plugin-scaffolder-backend-module-gitea 0.1.7 @backstage/plugin-scaffolder-backend-module-github 0.2.7 @backstage/plugin-scaffolder-backend-module-gitlab 0.3.3 @backstage/plugin-scaffolder-backend 1.22.5 @backstage/plugin-scaffolder-common 1.5.1 @backstage/plugin-scaffolder-node 0.4.3 @backstage/plugin-scaffolder-react 1.8.4 @backstage/plugin-scaffolder 1.19.3 @backstage/plugin-search-backend-module-catalog 0.1.23 @backstage/plugin-search-backend-module-elasticsearch 1.4.0 @backstage/plugin-search-backend-module-pg 0.5.26 @backstage/plugin-search-backend-module-techdocs 0.1.22 @backstage/plugin-search-backend-node 1.2.21 @backstage/plugin-search-backend 1.5.7 @backstage/plugin-search-common 1.2.11 @backstage/plugin-search-react 1.7.10 @backstage/plugin-search 1.4.10 @backstage/plugin-shortcuts 0.3.24 @backstage/plugin-tech-insights-backend-module-jsonfc 0.1.50 @backstage/plugin-tech-insights-backend 0.5.32 @backstage/plugin-tech-insights-common 0.2.13 @backstage/plugin-tech-insights-node 0.6.1 @backstage/plugin-tech-insights 0.3.27 @backstage/plugin-tech-radar 0.7.4 @backstage/plugin-techdocs-addons-test-utils 1.0.31 @backstage/plugin-techdocs-backend 1.10.4 @backstage/plugin-techdocs-module-addons-contrib 1.1.9 @backstage/plugin-techdocs-node 1.12.3 @backstage/plugin-techdocs-react 1.2.3 @backstage/plugin-techdocs 1.10.4 @backstage/plugin-todo-backend 0.3.17 @backstage/plugin-todo 0.2.39 @backstage/plugin-user-settings 0.8.5 @backstage/release-manifests 0.0.11 @backstage/repo-tools 0.8.0 @backstage/test-utils 1.5.4 @backstage/theme 0.2.19, 0.5.3 @backstage/types 1.1.1 @backstage/version-bridge 1.0.8 ✨ Done in 4.37s.

👀 Have you spent some time to check if this bug has been raised before?

🏢 Have you read the Code of Conduct?

Are you willing to submit PR?

None

darrenyung commented 1 month ago

My backend index.ts has the following

// App
backend.add(import('@backstage/plugin-app-backend/alpha'));

// Permissions
backend.add(import('@backstage/plugin-permission-backend/alpha'));
backend.add(customPermissionBackendModule);

// Catalog Entities and GitHub
backend.add(import('@backstage/plugin-catalog-backend/alpha'));
backend.add(import('@backstage/plugin-catalog-backend-module-github/alpha'));
backend.add(import('@backstage/plugin-catalog-backend-module-github-org'));
backend.add(
  import('@backstage/plugin-catalog-backend-module-scaffolder-entity-model'),
);
backend.add(import('@backstage/plugin-catalog-backend-module-unprocessed'));

// Techdocs
backend.add(import('@backstage/plugin-techdocs-backend/alpha'));

// Auth
backend.add(import('@backstage/plugin-auth-backend'));
backend.add(import('@backstage/plugin-auth-backend-module-guest-provider'));
backend.add(import('@backstage/plugin-auth-backend-module-github-provider'));
backend.add(authCustomModuleMicrosoftProvider);

// Scaffolder
backend.add(import('@backstage/plugin-scaffolder-backend/alpha'));
backend.add(import('@backstage/plugin-scaffolder-backend-module-github'));
backend.add(scaffolderModuleCustomExtensions);

Furthermore, TechDocs renders fine when logged in as a guest, but I had to add the following flag:

 guest:
   - dangerouslyAllowOutsideDevelopment: true

However, guest logins are disabled in my higher environments.

Rugvip commented 1 month ago

What's going on here is that TechDocs is using the new cookie auth and plugin service auth. When those two are used in combination it's important that all parts of the system are using the new plugin service auth, or this might happen. When the user makes a request to the TechDocs static assets they'll use a cookie with a limited user token. This limited user token in turn can only be converted into a valid service token to make a request to the catalog backend if both the TechDocs and Catalog plugins use the new plugin service auth. The logic for all of that is here. You're running everything in one backend though, so I don't expect things being out of sync to be the actual issue here.

Couple of things to check: do you have any backend.discovery configuration in app-config.yaml? That could mess with the ability for TechDocs to reach the catalog. And in general, do you have any service mesh/proxies that might sit in-between plugins?

Second thing, how's the authCustomModuleMicrosoftProvider implemented? Is it a plain sign-in resolver configuration or something more elaborate?

darrenyung commented 1 month ago

Hi @Rugvip, thanks for the reply. I'm trying to ascertain the situation.

authCustomModuleMicrosoftProvider is a simple resolver with an additional callout to a custom function that consolidates the profile from an external data source. Towards the end, if the external profiles are obtained successfully, it calls

ctx.signInWithCatalogUser({ kind: 'User', name: userId, namespace: DEFAULT_NAMESPACE });

Otherwise, if no other profiles are found, it resolves with

ctx.issueToken({ claims: { sub: userEntityRef, ent: [userEntityRef] } })

I'm trying to replicate the issue locally, but it seems to be working as expected there while the promoted environments are not.

darrenyung commented 1 month ago

I also noticed that this is showing up heavily in the logs from our higher environments.

2024-05-28T06:29:43.537Z backstage error Unexpected failure for target JWKS check fetch failed cause=Error: Request was cancelled. stack=TypeError: fetch failed
    at node:internal/deps/undici/undici:12345:11
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async doCheck (/app/node_modules/@backstage/backend-app-api/dist/index.cjs.js:2362:21)
    at async DefaultAuthService.getPluginRequestToken (/app/node_modules/@backstage/backend-app-api/dist/index.cjs.js:2141:35)
    at async /app/node_modules/@backstage/plugin-techdocs-backend/dist/index.cjs.js:722:27

Could this be to do with the undici proxy? We have had that since inception, along with NO_PROXY set to localhost as recommended in this issue.

I noticed that a patch has been made, but it is not due for release until the next version.

Also, here's my full file content for index.ts in our backend

import { bootstrap as globalAgentBootstrap } from 'global-agent';
import { ProxyAgent, setGlobalDispatcher } from 'undici';
import { createBackend } from '@backstage/backend-defaults';
import { legacyPlugin } from '@backstage/backend-common';
import {
  authCustomModuleMicrosoftProvider,
} from './plugins/auth';
import { scaffolderModuleCustomExtensions } from './plugins/scaffolder';
import { customPermissionBackendModule } from './plugins/permissions';
import { healthCheckPlugin } from './plugins/healthcheck';

if (process.env.HTTP_PROXY || process.env.HTTPS_PROXY) {
  globalAgentBootstrap();
  setGlobalDispatcher(new ProxyAgent(process.env.HTTP_PROXY as string));
}

const backend = createBackend();

// LEGACY MODULES (Not yet migrated to the new framework)
// TODO: Keep watch on latest releases and custom plugin should adopt to the new structure
backend.add(legacyPlugin('search', import('./plugins/search')));
backend.add(
  legacyPlugin('tech-insights', import('./plugins/techinsights/techInsights')),
);

// Custom Plugins that will need to be migrated
backend.add(legacyPlugin('cost-insight', import('./plugins/cost-insight')));

// Modules running under new Backend Infra
// App
backend.add(import('@backstage/plugin-app-backend/alpha'));

// Permissions
backend.add(import('@backstage/plugin-permission-backend/alpha'));
backend.add(customPermissionBackendModule);

// Catalog Entities and GitHub
backend.add(import('@backstage/plugin-catalog-backend/alpha'));
backend.add(import('@backstage/plugin-catalog-backend-module-github/alpha'));
backend.add(import('@backstage/plugin-catalog-backend-module-github-org'));
backend.add(
  import('@backstage/plugin-catalog-backend-module-scaffolder-entity-model'),
);
backend.add(import('@backstage/plugin-catalog-backend-module-unprocessed'));

// Techdocs
backend.add(import('@backstage/plugin-techdocs-backend/alpha'));

// Auth
backend.add(import('@backstage/plugin-auth-backend'));
backend.add(import('@backstage/plugin-auth-backend-module-guest-provider'));
backend.add(import('@backstage/plugin-auth-backend-module-github-provider'));
backend.add(authCustomModuleMicrosoftProvider);

// Scaffolder
backend.add(import('@backstage/plugin-scaffolder-backend/alpha'));
backend.add(import('@backstage/plugin-scaffolder-backend-module-github'));
backend.add(scaffolderModuleCustomExtensions);

// Other Plugins
backend.add(import('@backstage/plugin-devtools-backend'));
backend.add(import('@backstage/plugin-playlist-backend'));
backend.add(import('@backstage/plugin-badges-backend'));
backend.add(import('@backstage/plugin-todo-backend'));
backend.add(import('@backstage/plugin-proxy-backend/alpha'));
backend.add(import('@backstage/plugin-kubernetes-backend/alpha'));

backend.add(healthCheckPlugin);

backend.start();

benjdlambert commented 1 month ago

@darrenyung it's possible that the undici ProxyAgent you're using does not respect NO_PROXY env variables.

There's a workaround here: https://github.com/nodejs/undici/issues/1650#issuecomment-1341384515 but we were discussing potentially bringing some of this logic closer to the framework over in https://github.com/backstage/backstage/issues/24841 too.

EDIT: It also looks like undici just shipped native support for this, in the form of EnvHttpProxyAgent, which might be the route we want to go:

https://github.com/nodejs/undici/pull/2994
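
For context on what respecting NO_PROXY involves, here is a minimal TypeScript sketch of the host-matching decision a proxy-aware dispatcher has to make. It is not the code from the linked workaround or from undici itself: `shouldBypassProxy` is a hypothetical helper, and its matching rules (exact host, domain suffix, optional port, `*` wildcard) are a simplifying assumption.

```typescript
// Hypothetical helper (not part of undici or Backstage): decides whether a
// request should skip the proxy, given a NO_PROXY-style comma-separated list.
// Assumption: entries are hostnames or domain suffixes with an optional
// ":port"; IPv6 literals and CIDR ranges are not handled in this sketch.
function shouldBypassProxy(targetUrl: string, noProxy: string | undefined): boolean {
  if (!noProxy) return false;
  const { hostname, port } = new URL(targetUrl);
  const host = hostname.toLowerCase();

  return noProxy
    .split(',')
    .map(entry => entry.trim().toLowerCase())
    .filter(entry => entry.length > 0)
    .some(entry => {
      // A lone "*" disables the proxy for every host.
      if (entry === '*') return true;
      const parts = entry.split(':');
      const entryHost = parts[0] ?? '';
      const entryPort = parts[1];
      // If the entry names a port, it must match the port in the URL.
      if (entryPort && entryPort !== port) return false;
      // Treat ".example.com" and "*.example.com" like "example.com".
      const bare = entryHost.replace(/^\*?\./, '');
      return host === bare || host.endsWith(`.${bare}`);
    });
}
```

Under these assumptions, with NO_PROXY set to localhost, a call from TechDocs to the catalog on localhost would skip the proxy, while a call to an external host would still be routed through it.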

darrenyung commented 1 month ago

Hi @benjdlambert, thanks. The error no longer appears in our logs and the permissions are working again. This workaround unblocks our work for now.

darrenyung commented 1 month ago

Hi, reopening this issue as the workaround seems to have broken scaffolder github actions now. 😭

The template produces the following output

2024-06-04T05:17:17.964Z Beginning step Fetch Base
2024-06-04T05:17:17.988Z info: Fetching template content from remote URL
2024-06-04T05:17:18.010Z HttpError: fetch failed
    at /app/node_modules/@octokit/request/dist-node/index.js:146:11
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async hook (/app/node_modules/@octokit/auth-app/dist-node/index.js:352:18)
    at async Object.next (/app/node_modules/@octokit/rest/node_modules/@octokit/plugin-paginate-rest/dist-node/index.js:76:28)
    at async GithubAppManager.getInstallationData (/app/node_modules/@backstage/integration/dist/index.cjs.js:1783:30)
    at async /app/node_modules/@backstage/integration/dist/index.cjs.js:1750:45
    at async Cache.getOrCreateToken (/app/node_modules/@backstage/integration/dist/index.cjs.js:1696:34)
    at async Promise.all (index 0)
    at async GithubAppCredentialsMux.getAppToken (/app/node_modules/@backstage/integration/dist/index.cjs.js:1822:21)
    at async _SingleInstanceGithubCredentialsProvider.getCredentials (/app/node_modules/@backstage/integration/dist/index.cjs.js:1877:17)
    at async _GithubUrlReader.getRepoDetails (/app/node_modules/@backstage/backend-common/dist/index.cjs.js:2799:25)
    at async _GithubUrlReader.readTree (/app/node_modules/@backstage/backend-common/dist/index.cjs.js:2705:25)
    at async UrlReaderPredicateMux.readTree (/app/node_modules/@backstage/backend-common/dist/index.cjs.js:3650:16)
    at async Object.fetchContents (/app/node_modules/@backstage/plugin-scaffolder-node/dist/index.cjs.js:81:17)
    at async Object.handler (/app/node_modules/@backstage/plugin-scaffolder-backend/dist/cjs/router-CDFi_apW.cjs.js:1080:7)
    at async NunjucksWorkflowRunner.executeStep (/app/node_modules/@backstage/plugin-scaffolder-backend/dist/cjs/router-CDFi_apW.cjs.js:2681:9)

benjdlambert commented 3 weeks ago

@darrenyung which workaround did you use? Are you using public github or on prem?

darrenyung commented 3 weeks ago

Hi @benjdlambert, I've used this workaround. It worked for this case but not the other.

I'm using public GitHub, but I'm behind a corporate proxy.

benjdlambert commented 3 weeks ago

Hmm it's unfortunate that we can't see what the actual issue is behind the fetch failed. But I would assume that it's not sending requests through the proxy. I wonder, if you were to install the latest version of undici locally and use the EnvHttpProxyAgent, does that help?

I wonder if your version of octokit is using the global fetch or its own polyfill that's not respecting the proxy settings.

You should be able to reproduce the issue pretty easily and see what works by creating a test.ts file somewhere in your backstage app that creates an octokit client and tries to list some repos or whatever from behind the corporate proxy.

Then play around with the workarounds in that file to see which one allows you to talk through to public github.

It's also possible that if Octokit is not using undici or the native fetch, then you might need the global-agent workaround as well, right?
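
As a starting point for such a test.ts, here is a minimal sketch of a probe that makes one request and prints the underlying failure instead of just "fetch failed". It deliberately uses a plain fetch function rather than an Octokit client, so it only tells you whether that particular fetch gets through the proxy. `FetchLike`, `describeFetchError` and `probe` are hypothetical names for this sketch. It relies on Node's built-in fetch attaching the underlying error as `cause`, which matches the `cause=` field in the JWKS log line earlier in this thread.

```typescript
// Minimal shape of the fetch function the probe needs, so it can be run
// against the global fetch, undici's fetch, or a stub in tests.
type FetchLike = (
  url: string,
  init?: { headers?: Record<string, string> },
) => Promise<{ status: number }>;

// Node's fetch reports most network problems as "TypeError: fetch failed"
// and attaches the underlying error as `cause`; surface it when present.
function describeFetchError(err: unknown): string {
  if (!(err instanceof Error)) return String(err);
  const cause: unknown = (err as Error & { cause?: unknown }).cause;
  if (cause === undefined) return err.message;
  const causeText = cause instanceof Error ? cause.message : String(cause);
  return `${err.message} (cause: ${causeText})`;
}

// Hypothetical probe: one request, one line of output either way.
async function probe(url: string, fetchImpl: FetchLike): Promise<string> {
  try {
    const res = await fetchImpl(url, { headers: { 'User-Agent': 'proxy-probe' } });
    return `${url} -> HTTP ${res.status}`;
  } catch (err) {
    return `${url} -> ${describeFetchError(err)}`;
  }
}
```

Assuming Node 18 or later, calling `probe('https://api.github.com', fetch).then(console.log)` before and after each workaround would show whether the global fetch reaches public github. An Octokit client would still need to be checked separately, since it may not be using the global fetch.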

darrenyung commented 3 weeks ago

Thanks. I'll take a deeper look tomorrow. Strangely enough, the github discovery function is working as intended, which I believe also uses the octokit library, but the git related scaffolder action is not.

benjdlambert commented 3 weeks ago

Hmm it's possible that they could be using different versions of octokit in the yarn.lock though. They're different modules and could perhaps be different ranges. You can have a check with yarn why octokit and see if it's all the same version.

darrenyung commented 3 weeks ago

Plot is thickening. After rebuilding the packages and regenerating my lock file, I'm now facing the same issues as this 😿

benjdlambert commented 3 weeks ago

Hmm this looks like there's an incompatibility, with some frontend packages requiring backend code or vice versa. I wonder if you can run scripts/verify-local-dependencies.js in your repo; it might help discover these issues. Not sure how best to troubleshoot this further to be honest. It could also be some package that you've installed in the new upgrade that does this, though I'm not sure that script will pick those issues up.

darrenyung commented 3 weeks ago

It is a very interesting development. Pretty sure I did the same steps as before during the upgrade process, following the upgrade steps and regenerating the lock file like a ritual.

Everything checked out during our regression tests after major modifications. It's just that recently a lot of side issues have been popping up as we move to harden our instance and release new features, including the one in this OP. Looks like I've got a lot of work cut out for the next few days, trying to isolate one issue at a time. This issue also came up after running yarn

benjdlambert commented 3 weeks ago

Yeah I wonder if it's worth backtracking on regenerating the whole lock file for now, and trying to solve one problem at a time. You can just remove the octokit references from the yarn.lock and do a yarn install to rebuild those deps only.