Closed timbrd closed 3 years ago
I would also suggest reviewing the Namespaces section of the documentation, which details the various namespaces and required privileges.
- Have you or someone on your system previously installed the Postgres Operator with a different method?
I have not used a different installation method to install the operator before, but I had to delete the OLM install plan and CRDs because I had installed them into the wrong namespace.
- Which namespace mode did you select?
I didn't select any namespace mode. I only installed the operator using the manifests described in the mentioned documentation:
kubectl -n "$PGO_OPERATOR_NAMESPACE" create -f- <<YAML
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: postgresql
spec:
  targetNamespaces: ["$PGO_OPERATOR_NAMESPACE"]
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: postgresql
spec:
  name: postgresql
  channel: stable
  source: operatorhubio-catalog
  sourceNamespace: olm
  startingCSV: postgresoperator.v4.6.1
YAML
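Before looking at the operator logs, it can help to confirm that OLM actually finished installing the CSV named in the Subscription. The following is a minimal sketch, not part of the documented install: the function names `csv_phase` and `csv_succeeded` are made up for illustration, and it assumes `kubectl` is pointed at the cluster and that the CSV lives in the operator namespace.

```shell
# Print the phase of the operator's ClusterServiceVersion (CSV).
# Arguments: namespace, then CSV name (defaults to the startingCSV used above).
# Hypothetical helper, not part of the operator or OLM tooling.
csv_phase() {
  ns="$1"
  csv="${2:-postgresoperator.v4.6.1}"
  kubectl -n "$ns" get csv "$csv" -o jsonpath='{.status.phase}'
}

# Succeeds only when OLM reports the CSV phase as "Succeeded".
csv_succeeded() {
  [ "$(csv_phase "$@")" = "Succeeded" ]
}
```

For example, `csv_succeeded "$PGO_OPERATOR_NAMESPACE" && echo "operator installed"` prints the message only once the install has completed. A successful CSV says nothing about the RBAC in the target namespace, so this only rules out a failed OLM install.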
I would also suggest reviewing the Namespaces section of the documentation, which details the various namespaces and required privileges.
Thanks, I will check that. But isn't the operator supposed to prepare the namespace and create the basic roles?
Yes, it does -- in fact, that is what that step is attempting to do. However, if your OpenShift cluster has certain permissions locked down, it may take a little extra effort.
Perhaps another question I should have asked first -- have you attempted to create a Postgres cluster after installing? That error may be the Operator just reporting that it does not have those permissions yet, and they are subsequently created after the reconciliation loop finishes.
Hi @jkatz @timbrd I think that's the solution.
On Linux:
mkdir -p $HOME/odev/src/github.com/crunchydata $HOME/odev/bin $HOME/odev/pkg
cd $HOME/odev/src/github.com/crunchydata
git clone https://github.com/CrunchyData/postgres-operator.git
cd postgres-operator
git checkout v4.6.1
/odev/src/github.com/crunchydata/postgres-operator/deploy/add-targeted-namespace.sh
Among other things, this file contains:
# create RBAC
$PGO_CMD -n $1 delete --ignore-not-found sa pgo-backrest pgo-default pgo-target
$PGO_CMD -n $1 delete --ignore-not-found role pgo-backrest-role pgo-target-role
$PGO_CMD -n $1 delete --ignore-not-found rolebinding pgo-backrest-role-binding pgo-target-role-binding
cat $PGO_CONF_DIR/pgo-configs/pgo-default-sa.json | sed 's/{{.TargetNamespace}}/'"$1"'/' | $PGO_CMD -n $1 create -f -
cat $PGO_CONF_DIR/pgo-configs/pgo-target-sa.json | sed 's/{{.TargetNamespace}}/'"$1"'/' | $PGO_CMD -n $1 create -f -
cat $PGO_CONF_DIR/pgo-configs/pgo-target-role.json | sed 's/{{.TargetNamespace}}/'"$1"'/' | $PGO_CMD -n $1 create -f -
cat $PGO_CONF_DIR/pgo-configs/pgo-target-role-binding.json | sed 's/{{.TargetNamespace}}/'"$1"'/' | sed 's/{{.OperatorNamespace}}/'"$PGO_OPERATOR_NAMESPACE"'/' | $PGO_CMD -n $1 create -f -
cat $PGO_CONF_DIR/pgo-configs/pgo-backrest-sa.json | sed 's/{{.TargetNamespace}}/'"$1"'/' | $PGO_CMD -n $1 create -f -
cat $PGO_CONF_DIR/pgo-configs/pgo-backrest-role.json | sed 's/{{.TargetNamespace}}/'"$1"'/' | $PGO_CMD -n $1 create -f -
cat $PGO_CONF_DIR/pgo-configs/pgo-backrest-role-binding.json | sed 's/{{.TargetNamespace}}/'"$1"'/' | $PGO_CMD -n $1 create -f -
I am also experiencing this, in my case with a fully declarative deployment: 1) subscribe to the 4.6.1 operator from OLM; 2) create a 4.6.1 pgcluster custom resource. The exact same error regarding pgo-target-role appears in the logs. @jkatz the answer above would seem to indicate a requirement to run a shell script as part of the install. Our install needs to be fully declarative. Do you think the fully declarative solution as I've described should be creating this pgo-target-role? It appears not to be...
I should add that we have been successfully using 4.5.1 up until now. The only changes in our configuration are the version changes from 4.5.1 to 4.6.1 in the OLM subscription manifest and the pgcluster manifest...
@jkatz as a follow-up, I picked out and ran exactly two statements from @AleksanderRoszig's comment - after our fully declarative deployment:
cat $PGO_CONF_DIR/pgo-configs/pgo-target-role.json | sed 's/{{.TargetNamespace}}/'"$1"'/' | $PGO_CMD -n $1 create -f -
cat $PGO_CONF_DIR/pgo-configs/pgo-target-role-binding.json | sed 's/{{.TargetNamespace}}/'"$1"'/' | sed 's/{{.OperatorNamespace}}/'"$PGO_OPERATOR_NAMESPACE"'/' | $PGO_CMD -n $1 create -f -
... and the postgres operator errors immediately stopped. This would seem to indicate an error in the OLM subscription - the OLM CSV does not appear to define all roles and bindings needed by the operator.
@aceeric Can you please provide a bit more information:
Our install needs to be fully declarative.
I'd also be curious if you could elaborate on the reason why it needs to be fully declarative. What is the use case you are trying to solve?
I don't see anything in the diff between v4.5.1 and v4.6.1 that immediately strikes me as a bug, though I also don't see anything that convinces me it is not one. I'll treat it as a bug for now and see if there is anything either programmatic or OLM-based that can fix this.
@jkatz - we run Kubernetes in an air-gapped environment, so we have our own OLM registry serving the operator, cloned from https://github.com/operator-framework/community-operators/tree/master/upstream-community-operators/postgresql. We use the ubi8-4.6.1 operator image with these manifests.
This approach has been solid since 4.4.1. The issue appears on the move from 4.5.1 to 4.6.1 (we skipped 4.6.0 for no particular reason). However, I believe it could be reproduced in a non-air-gapped environment using these same manifests. Regarding the role in question - we always install into an empty namespace.
And finally - our install needs to be fully declarative because it is deployed directly from source control using GitOps tooling.
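For a GitOps setup, one possible interim approach is to render the two templates once and commit the resulting JSON next to the other manifests, so nothing has to be run by hand at deploy time. The sketch below is an assumption-laden illustration: `render_pgo_template` and `render_pgo_target_rbac` are hypothetical names, the substitution mirrors the `sed` calls quoted earlier in the thread (with a `g` flag added so repeated placeholders are also replaced), and the template contents themselves have not been checked here.

```shell
# Fill in the namespace placeholders of one PGO RBAC template.
# Reads the template on stdin, writes the rendered manifest to stdout.
# Arguments: target namespace, operator namespace.
render_pgo_template() {
  sed -e "s/{{.TargetNamespace}}/$1/g" -e "s/{{.OperatorNamespace}}/$2/g"
}

# Render the two manifests reported missing in this thread into an output
# directory, ready to be committed to source control.
# Arguments: template dir, output dir, target namespace, operator namespace.
render_pgo_target_rbac() {
  src="$1"
  out="$2"
  mkdir -p "$out" || return 1
  for name in pgo-target-role pgo-target-role-binding; do
    render_pgo_template "$3" "$4" < "$src/$name.json" > "$out/$name.json" || return 1
  done
}
```

With a checkout of v4.6.1, a call such as `render_pgo_target_rbac "$PGO_CONF_DIR/pgo-configs" ./rbac my-namespace "$PGO_OPERATOR_NAMESPACE"` would write `./rbac/pgo-target-role.json` and `./rbac/pgo-target-role-binding.json`; `my-namespace` is a placeholder for the namespace the pgcluster is deployed into.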
I can see that the pgo-backrest-role and pgo-pg-role roles (and bindings) are created by the 4.6.1 operator, but not the pgo-target-role and associated binding...
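A check like the one described above can be scripted. This is a rough sketch under assumptions: `missing_pgo_rbac` is a hypothetical helper, and the object names are the ones that appear in the `add-targeted-namespace.sh` excerpt quoted earlier (pgo-pg-role is left out because its binding name is not shown in the thread).

```shell
# Print each expected Role or RoleBinding that is absent from a namespace.
# Argument: the target namespace. Prints nothing when all four objects exist.
missing_pgo_rbac() {
  ns="$1"
  for item in role/pgo-target-role rolebinding/pgo-target-role-binding \
              role/pgo-backrest-role rolebinding/pgo-backrest-role-binding; do
    # "kubectl get TYPE/NAME" exits non-zero when the object is not found.
    kubectl -n "$ns" get "$item" >/dev/null 2>&1 || echo "$item"
  done
}
```

Running `missing_pgo_rbac my-namespace` against an affected 4.6.1 install would be expected to list the pgo-target-role entries, given what is reported above. Note that any `kubectl` failure, including a permissions error, is counted as "missing" in this sketch.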
@aceeric Thanks for the additional info. That should help us drill into what's going on.
@timbrd @AleksanderRoszig @aceeric This is indeed a bug and the fix will be applied to 4.6.2 in the coming days.
For now, the workaround is the one suggested above, which is to manually run the add-targeted-namespace.sh script.
Thanks for reporting!
Describe the bug
After successfully installing pgo 4.6.1 using the Operator Lifecycle Manager as described here, I found the following error messages in the operator logs:
The deployment is running:
Do I have to create the role manually?
To Reproduce
Steps to reproduce the behavior:
kubectl logs postgres-operator-6df4f5746c-jq8ss operator -n pgo