chainloop-dev / chainloop

Chainloop is an Open Source evidence store for your Software Supply Chain attestations, SBOMs, VEX, SARIF, CSAF files, QA reports, and more.
https://docs.chainloop.dev
Apache License 2.0

Error during helm installation with current chart version : 1.101.0 #1353

Closed hanygirgis closed 1 week ago

hanygirgis commented 1 week ago

Hello, I'm trying to install Chainloop, but I'm getting this error:

Error: INSTALLATION FAILED: 1 error occurred:
    * Deployment.apps "chainloop-cas" is invalid: [spec.template.spec.volumes[2].secret.secretName: Required value, spec.template.spec.containers[0].volumeMounts[2].name: Not found: "server-certs"]

I passed a values file that contains a few customCAs:

controlplane:
  customCAs:
    - |-
      -----BEGIN CERTIFICATE-----
      -----END CERTIFICATE-----
cas:
  customCAs:
    - |-
      -----BEGIN CERTIFICATE-----
      -----END CERTIFICATE-----

This is the full command line I used:

$ helm install chainloop oci://ghcr.io/chainloop-dev/charts/chainloop --values ./values-noingress.yaml \
>     --set global.openshift=true \
>     --set development=true \
>     --set controlplane.auth.oidc.url=https://hostname/realms/Chainloop \
>     --set controlplane.auth.oidc.clientID=chainloop \
>     --set controlplane.auth.oidc.clientSecret=REDACTED \
>     --set controlplane.auth.oidc.externalURL=https://hostname \
>     --set controlplane.image.repository=repo/chainloop/control-plane \
>     --set controlplane.image.tag=v0.96.13 \
>     --set controlplane.migration.image.repository=repo/chainloop/control-plane-migrations \
>     --set controlplane.migration.image.tag=v0.96.13 \
>     --set cas.image.repository=repo/chainloop/artifact-cas \
>     --set cas.image.tag=v0.96.13 \
>     --set controlplane.tlsConfig.secret.name=grpc-tls-secret \
>     --set cas.tlsConfig.secret.name=grpc-tls-secret
Pulled: ghcr.io/chainloop-dev/charts/chainloop:1.101.0
Digest: sha256:0d3e7eabbd79dddb3712943c2ec0209f97594429d0c114409fb750451747ed9f
Error: INSTALLATION FAILED: 1 error occurred:
    * Deployment.apps "chainloop-cas" is invalid: [spec.template.spec.volumes[2].secret.secretName: Required value, spec.template.spec.containers[0].volumeMounts[2].name: Not found: "server-certs"]

migmartri commented 1 week ago

Thanks @hanygirgis for reporting the issue. We likely introduced a regression or broke compatibility when we moved our chart to Bitnami practices.

From a quick scan, it is probably related to mounting the CA certs into the pods. We'll take a look, sorry about that.

javirln commented 1 week ago

Hello @hanygirgis! Thanks for reporting the issue. It was indeed a bug in the chart, related to adopting the Bitnami chart conventions. The patch is already under review here: https://github.com/chainloop-dev/chainloop/pull/1354

However, I see you are using cas.tlsConfig.secret.name and controlplane.tlsConfig.secret.name to set up the secrets for the gRPC services. Please note that this configuration is deprecated. Although it still works, the recommended options are controlplane.tls.existingSecret and cas.tls.existingSecret. In your case:

>     --set controlplane.tls.existingSecret=grpc-tls-secret \
>     --set cas.tls.existingSecret=grpc-tls-secret

The secret must contain two keys, tls.crt and tls.key, holding the certificate and the private key respectively.
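
For reference, a Secret with those two keys can be declared with the built-in kubernetes.io/tls type. The sketch below only writes such a manifest to a local file. The file name grpc-tls-secret.yaml is an assumption for illustration, the secret name is the one used in the command above, and the empty PEM bodies are placeholders to be filled with your own certificate and key.

```shell
# Write a TLS Secret manifest to a local file (hypothetical file name).
# The two data keys match what the chart expects: tls.crt and tls.key.
cat > grpc-tls-secret.yaml <<'EOF'
apiVersion: v1
kind: Secret
metadata:
  name: grpc-tls-secret
type: kubernetes.io/tls
stringData:
  tls.crt: |
    -----BEGIN CERTIFICATE-----
    -----END CERTIFICATE-----
  tls.key: |
    -----BEGIN PRIVATE KEY-----
    -----END PRIVATE KEY-----
EOF

# Applying it requires cluster access, so it is left commented out here:
# kubectl apply -f grpc-tls-secret.yaml
```

The same secret name can then be passed to both controlplane.tls.existingSecret and cas.tls.existingSecret, as in the flags above.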

javirln commented 1 week ago

The new version of the chart has been pushed (ghcr.io/chainloop-dev/charts/chainloop:1.101.1). Would you mind giving it a try and confirming that everything works as expected?

hanygirgis commented 1 week ago

@javirln Thanks a lot. I just gave it a shot, and it no longer gives that error. However, I noticed that the 3 Chainloop images weren't deployed, apparently because of the overridden image repository on the command line. I see this at the end of the helm install output:

Substituted images detected:
  - ghcr.io/repo/chainloop/control-plane:v0.96.13
  - ghcr.io/repo/chainloop/artifact-cas:v0.96.13
  - ghcr.io/repo/chainloop/control-plane-migrations:v0.96.13

I see that "ghcr.io/" has been added at the beginning (see my cmd above) - maybe that's the reason why the images aren't installed.

javirln commented 1 week ago

Could you give it a try with the following configuration:

>     --set controlplane.image.registry=repo \
>     --set controlplane.image.repository=chainloop/control-plane \
>     --set controlplane.image.tag=v0.96.13 \

Here, controlplane.image.registry is your registry. What's happening is that, when the registry is not set, the chart takes the default from values.yaml, which is ghcr.io:

controlplane|cas|migrations
  image:
    registry: ghcr.io
    repository: chainloop-dev/chainloop/{control-plane,cas,migrations}
    tag: "v0.96.13"
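
To make the effect concrete, here is a minimal sketch of how the final image reference ends up with a ghcr.io prefix. It assumes the chart joins the three values as registry/repository:tag, which is consistent with the "Substituted images detected" output above but is not taken from the chart templates themselves.

```shell
# Default registry from the chart's values.yaml, as quoted above.
registry="ghcr.io"
# Repository value passed on the command line in the original report.
repository="repo/chainloop/control-plane"
tag="v0.96.13"

# Assumed composition rule: <registry>/<repository>:<tag>
image="${registry}/${repository}:${tag}"
echo "${image}"
```

Under that assumption the result is the first entry of the substituted images list, which is why the local registry has to go into image.registry rather than image.repository.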

migmartri commented 1 week ago

This is another change we made in our chart. Before, image.repository contained both the registry and the image repository.

Now, as @javirln said, we also have registry, so you need to split your old image.repository into image.registry and image.repository.

Also note that instead of overriding each controlplane|cas|migration.image.registry, you can now use global.imageRegistry to set it only once.
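
As an illustration of that split, the sketch below derives both parts from the old single-value form with plain shell parameter expansion and prints the corresponding flags. It assumes the registry is everything before the first slash, which holds for the value in the original command but not for registries that include a path component.

```shell
# Legacy single-value form used in the original command.
legacy="repo/chainloop/control-plane"

# Everything before the first "/" is treated as the registry...
registry="${legacy%%/*}"
# ...and the remainder as the repository.
repository="${legacy#*/}"

# Print the flags to pass to helm install; the registry is set once globally.
echo "--set global.imageRegistry=${registry}"
echo "--set controlplane.image.repository=${repository}"
```

The same repository override would be repeated for the cas and migration images, while global.imageRegistry covers the registry for all of them.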

hanygirgis commented 1 week ago

I tried again, but it still doesn't deploy the 3 main images. I also tried another environment, which I believe can pull the images directly (so I don't need to override the registry with the local one), but it still doesn't install them. This is what I'm now using:

helm install chainloop oci://ghcr.io/chainloop-dev/charts/chainloop --values ./values-noingress.yaml \
    --set global.openshift=true \
    --set development=true \
    --set controlplane.auth.oidc.url=https://hostname/realms/Chainloop \
    --set controlplane.auth.oidc.clientID=chainloop \
    --set controlplane.auth.oidc.clientSecret=REDACTED \
    --set controlplane.auth.oidc.externalURL=https://hostname \
    --set controlplane.tls.existingSecret=grpc-tls-secret \
    --set cas.tls.existingSecret=grpc-tls-secret

javirln commented 1 week ago

Could you please paste the error that you get, or describe a failing pod and share the events recorded for it? That would help us troubleshoot the problem.

What's probably missing is the pull secret to pull the images from the registry.

hanygirgis commented 1 week ago

There are no errors, but when I look at the installed pods after the helm install, I only see the following:

chainloop-dex-6d768487b8
chainloop-postgresql-0
chainloop-vault-injector-76c4dcfc6c-bc44d
chainloop-vault-server-0

hanygirgis commented 1 week ago

OK, I found errors that occurred while the controlplane and cas pods were being created. They apparently have to do with OpenShift compatibility:

Error creating: pods "chainloop-controlplane-6d58db464b-" is forbidden: unable to validate against any security context constraint: [provider "anyuid": Forbidden: not usable by user or serviceaccount, provider "pipelines-scc": Forbidden: not usable by user or serviceaccount, provider restricted-v2: .spec.securityContext.fsGroup: Invalid value: []int64{1001}: 1001 is not an allowed group, provider restricted: .spec.securityContext.fsGroup: Invalid value: []int64{1001}: 1001 is not an allowed group, pod.metadata.annotations[container.seccomp.security.alpha.kubernetes.io/controlplane]: Forbidden: seccomp may not be set, provider "nonroot-v2": Forbidden: not usable by user or serviceaccount, provider "nonroot": Forbidden: not usable by user or serviceaccount, provider "noobaa": Forbidden: not usable by user or serviceaccount, provider "noobaa-endpoint": Forbidden: not usable by user or serviceaccount, provider "noobaa-db": Forbidden: not usable by user or serviceaccount, provider "hostmount-anyuid": Forbidden: not usable by user or serviceaccount, provider "elasticsearch-scc": Forbidden: not usable by user or serviceaccount, provider "logging-scc": Forbidden: not usable by user or serviceaccount, provider "machine-api-termination-handler": Forbidden: not usable by user or serviceaccount, provider "hostnetwork-v2": Forbidden: not usable by user or serviceaccount, provider "hostnetwork": Forbidden: not usable by user or serviceaccount, provider "hostaccess": Forbidden: not usable by user or serviceaccount, provider "ocs-metrics-exporter": Forbidden: not usable by user or serviceaccount, provider "rook-ceph": Forbidden: not usable by user or serviceaccount, provider "node-exporter": Forbidden: not usable by user or serviceaccount, provider "rook-ceph-csi": Forbidden: not usable by user or serviceaccount, provider "privileged": Forbidden: not usable by user or serviceaccount]

javirln commented 1 week ago

OK, that's completely unexpected. Let me look into this and get back to you. Sorry for the inconvenience!

javirln commented 1 week ago

@hanygirgis just pinging you to let you know that I've found the issue and I'm applying a fix. I'm currently testing that everything works on a local OpenShift environment.

javirln commented 1 week ago

A new version of the chart is out and the issue with OpenShift should be fixed.

hanygirgis commented 1 week ago

Latest version works now, thanks a lot.