cnoe-io / idpbuilder

Spin up a complete internal developer platform with only Docker required as a dependency.
https://cloud-native.slack.com/archives/C05TN9WFN5S
Apache License 2.0

[Bug]: Problems running examples/ref-implementation. How is backstage image determined? Plugins managed? #188

Closed Analect closed 4 months ago

Analect commented 5 months ago

What is your environment, configuration, and command?

Having watched the latest community meeting video, I have been experimenting with using devpod to get this ref-implementation running. I haven't gone as far as the experimentation by @aatchison at aatchison/cnoe-devcontainer-feature-demo. I merely spun up a remote environment on a Linux Ubuntu 22.04 box with 8GB of memory available. From a terminal in that remote devpod container, I ran idpbuilder create --use-path-routing --package-dir examples/ref-implementation --kind-config ./examples/local-backup/kind.yaml, adjusting the memory up to 6Gi in kind.yaml, since I thought the problems I was initially facing related to 4Gi not being sufficient.

idpbuilder create --use-path-routing --package-dir examples/ref-implementation --kind-config ./examples/local-backup/kind.yaml
time=2024-04-09T10:48:14.956Z level=INFO msg="Creating kind cluster" logger=setup
########################### Our kind config ############################
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  image: "kindest/node:v1.27.3"
  kubeadmConfigPatches:
  - |
    kind: InitConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        system-reserved: memory=6Gi
        node-labels: "ingress-ready=true"
  extraMounts:
    - hostPath: /home/ubuntu/backup # replace with your own path
      containerPath: /backup
  extraPortMappings:
  - containerPort: 443
    hostPort: 8443
    protocol: TCP
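
A side note on the config above (my reading of the Kubernetes docs, not something verified against this setup): `system-reserved` carves memory out for system daemons, so raising it actually reduces what the kubelet advertises as allocatable to pods; a kind node's total memory is bounded by the Docker daemon, not by this kubeadm patch. One way to see the effect, assuming kubectl points at the kind cluster:

```shell
# Print the node's allocatable memory, i.e. capacity minus system/kube
# reservations and eviction thresholds -- what pods can actually request.
kubectl get node -o jsonpath='{.items[0].status.allocatable.memory}{"\n"}'
```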

What did you do and what did you see instead?

That didn't resolve matters, so I decided to change the backstage image reference in examples/ref-implementation/backstage/manifests/install.yaml from public.ecr.aws/cnoe-io/backstage:rc1 to ghcr.io/cnoe-io/backstage-app:plugin-scaffolder-actions, which seems to be a more recently generated version that I thought might work. I can also see from https://gallery.ecr.aws/cnoe-io/backstage that public.ecr.aws/cnoe-io/backstage:rc2 has existed for about a month. I haven't tried that one, but it seems the problems I'm facing may not be solved by it either. Could anyone suggest what might be going wrong here?

[Screenshot: K9s view of the cluster pods]

On a broader related matter, I haven't been able to find documentation around how either CNOE or idpbuilder is intended to help with modifying how backstage gets set up. Let's say your desire is to have a list of plugins that work in conjunction with other tooling spawned by idpbuilder, let's say some of the plugins from https://github.com/janus-idp/backstage-showcase/blob/main/README.md ... is there a pattern to how these get specified?

For plugins specified at https://github.com/cnoe-io/backstage-app/tree/main/plugins ... can these include those such as the janus-idp ones? Do they require any special set-up or treatment for working in a CNOE context? Does CNOE/idpbuilder work with the new dynamic backend plugin system?

Thanks for your guidance.

Additional Information. Logs.

These are the logs from the failing backstage pod:

 Loaded config from app-config.yaml
{"level":"info","message":"Found 9 new secrets in config that will be redacted","service":"backstage"}
{"level":"info","message":"Created UrlReader predicateMux{readers=azure{host=dev.azure.com,authed=false},bitbucketCloud{host=bitbucket.org,authed=false},github{host=github.com,authed=false},gitea{host=cnoe.localtest.me:8443,authed=true},gitea{host=cnoe.localtest.me,authed=true},gitlab{host=gitlab.com,authed=false},awsS3{host=amazonaws.com,authed=false},fetch{}","service":"backstage"}
{"level":"info","message":"Performing database migration","plugin":"catalog","service":"backstage","type":"plugin"}
{"level":"debug","message":"Processing location:default/generated-77dd5624d38456d8313958c3dc3993820d8aecce","plugin":"catalog","service":"backstage","type":"plugin"}
(node:1) Warning: Setting the NODE_TLS_REJECT_UNAUTHORIZED environment variable to '0' makes TLS connections and HTTPS requests insecure by disabling certificate verification.
(Use `node --trace-warnings ...` to show where the warning was created)
{"level":"info","message":"Configuring \"database\" as KeyStore provider","plugin":"auth","service":"backstage","type":"plugin"}
{"level":"info","message":"Configuring auth provider: keycloak-oidc","plugin":"auth","service":"backstage","type":"plugin"}
{"level":"info","message":"Creating Local publisher for TechDocs","plugin":"techdocs","service":"backstage","type":"plugin"}
{"level":"info","message":"Added DefaultCatalogCollatorFactory collator factory for type software-catalog","plugin":"search","service":"backstage","type":"plugin"}
{"level":"info","message":"Added DefaultTechDocsCollatorFactory collator factory for type techdocs","plugin":"search","service":"backstage","type":"plugin"}
{"level":"info","message":"Starting all scheduled search tasks.","plugin":"search","service":"backstage","type":"plugin"}
{"level":"info","message":"Initializing Kubernetes backend","plugin":"kubernetes","service":"backstage","type":"plugin"}
{"level":"info","message":"action=LoadingCustomResources numOfCustomResources=0","plugin":"kubernetes","service":"backstage","type":"plugin"}
{"level":"info","message":"Serving static app content from /app/packages/app/dist","plugin":"app","service":"backstage","type":"plugin"}
/app/node_modules/openid-client/lib/helpers/request.js:140
      throw new RPError(`outgoing request timed out after ${opts.timeout}ms`);
            ^

RPError: outgoing request timed out after 3500ms
    at /app/node_modules/openid-client/lib/helpers/request.js:140:13
    at async Issuer.discover (/app/node_modules/openid-client/lib/issuer.js:143:22)
    at async OidcAuthProvider.setupStrategy (/app/node_modules/@backstage/plugin-auth-backend/dist/index.cjs.js:1581:20)

Node.js v18.19.1 

These are the logs from the failing argo-server pod:

time="2024-04-09T09:30:12.971Z" level=info msg="not enabling pprof debug endpoints"
time="2024-04-09T09:30:12.974Z" level=info authModes="[client sso]" baseHRef=/argo-workflows/ managedNamespace= namespace=argo secure=false ssoNamespace=argo
time="2024-04-09T09:30:12.975Z" level=warning msg="You are running in insecure mode. Learn how to enable transport layer security: https://argo-workflows.readthedocs.io/en/release-3
time="2024-04-09T09:35:12.973Z" level=info msg="Alloc=5806 TotalAlloc=9418 Sys=18789 NumGC=5 Goroutines=6"
time="2024-04-09T09:40:12.975Z" level=info msg="Alloc=5806 TotalAlloc=9420 Sys=18789 NumGC=7 Goroutines=6"
time="2024-04-09T09:45:13.014Z" level=info msg="Alloc=5806 TotalAlloc=9423 Sys=18789 NumGC=9 Goroutines=6"

nimakaviani commented 5 months ago

Hi @Analect and thanks for trying out idpBuilder.

A few thoughts from glancing over your setup.

The reference implementation is pretty heavy in terms of the number of tools it deploys, and 8GB is generally not enough memory to get them all up and running. The minimum recommended memory is around 12GB.
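
As a quick sanity check (a sketch; the only assumption is `docker info`'s Go-template `MemTotal` field), you can confirm how much memory the Docker daemon, and therefore the kind node, actually has available:

```shell
# Print total memory visible to the Docker daemon; kind nodes share this pool.
docker info --format 'Total memory: {{.MemTotal}} bytes'
```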

The K9s screenshot you shared lacks data on some key components. In particular, is Keycloak up and running, or is it failing? The timeout from the backstage pod appears to indicate that the openid client is timing out, which makes me wonder whether Keycloak is up and responding to requests.
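
One way to test that theory (a sketch; the realm name `cnoe` and the path-routed Keycloak URL are assumptions based on ref-implementation defaults, so adjust them to match your install):

```shell
# If this times out or returns 404, backstage's openid-client will fail the same way.
# -k skips TLS verification for idpbuilder's self-signed certificate.
curl -sk --max-time 5 \
  https://cnoe.localtest.me:8443/keycloak/realms/cnoe/.well-known/openid-configuration
```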

Your Argo CD container doesn't have much in the logs, and the delay in its starting up could be related to memory limitations on the host machine.

Certainly happy to help you debug if and when you provide more details on the setup.

csantanapr commented 5 months ago

@Analect Backstage and Argo Workflows will keep failing until Keycloak is up and stable. The errors you see mean that Backstage and Argo Workflows can't reach Keycloak. It usually takes 8 to 10 minutes for everything to be ready and all pods to reach Running.
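
Rather than watching the clock, you can wait for the rollout explicitly. A sketch, assuming kubectl access to the kind cluster (pods belonging to completed Jobs may need to be excluded):

```shell
# Block until every pod in every namespace reports Ready, up to 15 minutes,
# then show the final state.
kubectl wait --for=condition=Ready pods --all -A --timeout=900s
kubectl get pods -A
```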

nabuskey commented 4 months ago

Please feel free to re-open this issue if you are still running into the same problem.