Open ghost opened 1 year ago
After creating a new ConsensusStore resource, the controller does not appear to use any information about the StatefulSet to determine the ConsensusStore's status.
If we create this ConsensusStore as an example:
cat <<EOF | kubectl apply -f -
apiVersion: consensus.atomix.io/v1beta1
kind: ConsensusStore
metadata:
  name: my-consensus-store
spec:
  replicas: 3
  groups: 30
  volumeClaimTemplate:
    spec:
      accessModes:
      - ReadWriteOnce
      storageClass: "standard"
      resources:
        requests:
          storage: 2Gi
EOF
The beginning of its life looks something like this.
% kubectl get ConsensusStores,StatefulSets
NAME                                                    STATUS
consensusstore.consensus.atomix.io/my-consensus-store

NAME                                  READY   AGE
statefulset.apps/my-consensus-store   0/3     1s

% kubectl get ConsensusStores,StatefulSets
NAME                                                    STATUS
consensusstore.consensus.atomix.io/my-consensus-store

NAME                                  READY   AGE
statefulset.apps/my-consensus-store   2/3     15s

% kubectl get ConsensusStores,StatefulSets
NAME                                                    STATUS
consensusstore.consensus.atomix.io/my-consensus-store   NotReady

NAME                                  READY   AGE
statefulset.apps/my-consensus-store   3/3     21s

% kubectl get ConsensusStores,StatefulSets
NAME                                                    STATUS
consensusstore.consensus.atomix.io/my-consensus-store   Ready

NAME                                  READY   AGE
statefulset.apps/my-consensus-store   3/3     2m58s
When I submitted https://github.com/atomix/atomix.github.io/pull/26 I also saw what appear to be unhandled conditions during startup:
2023-01-24T14:47:17.815Z INFO github.com/atomix/consensus-storage/controller/pkg/controller/consensus/v1beta1 v1beta1/cluster.go:440 Reconcile raft protocol service
2023-01-24T14:47:17.815Z INFO github.com/atomix/consensus-storage/controller/pkg/controller/consensus/v1beta1 v1beta1/cluster.go:485 Reconcile raft protocol headless service
2023-01-24T14:47:17.815Z ERROR github.com/atomix/consensus-storage/controller/pkg/controller/consensus/v1beta1 v1beta1/cluster.go:184 Pod "my-consensus-store-0" not foundReconcile MultiRaftCluster
github.com/atomix/consensus-storage/controller/pkg/controller/consensus/v1beta1.(*MultiRaftClusterReconciler).Reconcile
github.com/atomix/consensus-storage/controller/pkg/controller/consensus/v1beta1/cluster.go:184
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
sigs.k8s.io/controller-runtime@v0.12.1/pkg/internal/controller/controller.go:121
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
sigs.k8s.io/controller-runtime@v0.12.1/pkg/internal/controller/controller.go:320
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
sigs.k8s.io/controller-runtime@v0.12.1/pkg/internal/controller/controller.go:273
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
sigs.k8s.io/controller-runtime@v0.12.1/pkg/internal/controller/controller.go:234
Also, the controller does not seem to update its status when it can't apply changes to the StatefulSet, like this:
pilot@cove reflow % kubectl get ConsensusStores,StatefulSets
NAME                                                    STATUS
consensusstore.consensus.atomix.io/my-consensus-store   Ready

NAME                                  READY   AGE
statefulset.apps/my-consensus-store   3/3     13m

% cat <<EOF | kubectl apply -f -
apiVersion: consensus.atomix.io/v1beta1
kind: ConsensusStore
metadata:
  name: my-consensus-store
spec:
  replicas: 3
  groups: 30
  volumeClaimTemplate:
    spec:
      accessModes:
      - ReadWriteOnce
      storageClass: "standard"
      resources:
        requests:
          storage: 2Gi
EOF
consensusstore.consensus.atomix.io/my-consensus-store configured

% kubectl get ConsensusStores
NAME                 STATUS
my-consensus-store   Ready
The controller actually can't update the field on the StatefulSet, but I don't see any errors in the controller logs from it attempting to apply an update. Only this is produced:
2023-01-24T17:45:50.169Z INFO github.com/atomix/consensus-storage/controller/pkg/controller/consensus/v1beta1 v1beta1/store.go:88 Reconcile ConsensusStore
2023-01-24T17:45:50.169Z INFO github.com/atomix/consensus-storage/controller/pkg/controller/consensus/v1beta1 v1beta1/store.go:99 Reconcile raft protocol stateful set
Trying to change the volumeClaimTemplate manually with kubectl produces this error:
The StatefulSet "web" is invalid: spec: Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', 'updateStrategy', 'persistentVolumeClaimRetentionPolicy' and 'minReadySeconds' are forbidden
1. Set a status while desired_replicas != available_replicas. The ConsensusStore could possibly be defined as at least Pending.
% kubectl get ConsensusStore,StatefulSet
NAME                                                    STATUS
consensusstore.consensus.atomix.io/my-consensus-store

NAME                                  READY   AGE
statefulset.apps/my-consensus-store   0/3     83m
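As a rough illustration of deriving the status from the StatefulSet's replica counts, here is a minimal sketch. Everything in it is an assumption made for the example: `StatefulSetCounts`, `Phase`, `phaseFor` and the `raftReady` flag are hypothetical names, and the mapping to NotReady and Ready simply mirrors the values seen in the kubectl output above.

```go
package main

import "fmt"

// StatefulSetCounts is a hypothetical stand-in for the replica counts a
// controller could read from the StatefulSet it manages.
type StatefulSetCounts struct {
	Desired   int32 // replicas requested in the spec
	Available int32 // replicas currently available
}

// Phase is a hypothetical status value for the ConsensusStore.
type Phase string

const (
	PhasePending  Phase = "Pending"
	PhaseNotReady Phase = "NotReady"
	PhaseReady    Phase = "Ready"
)

// phaseFor reports Pending while desired != available, so the STATUS
// column is never empty. raftReady is an assumed signal that the
// consensus groups themselves have finished starting.
func phaseFor(c StatefulSetCounts, raftReady bool) Phase {
	switch {
	case c.Desired != c.Available:
		return PhasePending
	case !raftReady:
		return PhaseNotReady
	default:
		return PhaseReady
	}
}

func main() {
	fmt.Println(phaseFor(StatefulSetCounts{Desired: 3, Available: 0}, false))
	fmt.Println(phaseFor(StatefulSetCounts{Desired: 3, Available: 3}, false))
	fmt.Println(phaseFor(StatefulSetCounts{Desired: 3, Available: 3}, true))
}
```

With a check like this, the store that sat at 0/3 for 83m would show Pending rather than an empty status.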
2. Produce information somewhere when failing to update the StatefulSet.
Updating the status, creating an event, and having an error log entry for this is probably enough to help users out.
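The three signals mentioned here (status, event, error log) could be produced from a single place in the reconcile path. The sketch below is not the controller's code: `Store`, `EventRecorder`, `memoryRecorder`, `applyStatefulSet`, the "UpdateFailed" phase and the placeholder error are all hypothetical, and a real controller would presumably use client-go's event recorder and its own logger.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// Store is a hypothetical, trimmed-down view of a ConsensusStore status.
type Store struct {
	Name    string
	Phase   string
	Message string
}

// EventRecorder is a hypothetical interface for emitting warning events.
type EventRecorder interface {
	Warn(object, reason, message string)
}

// memoryRecorder keeps events in memory so the example is self-contained.
type memoryRecorder struct{ events []string }

func (r *memoryRecorder) Warn(object, reason, message string) {
	r.events = append(r.events, fmt.Sprintf("%s %s: %s", object, reason, message))
}

// errForbidden is a placeholder for the API server rejecting the update.
var errForbidden = errors.New("spec: Forbidden (placeholder message)")

// applyStatefulSet runs update and, if it fails, surfaces the failure in
// three places: the log, an event, and the store's status.
func applyStatefulSet(store *Store, rec EventRecorder, update func() error) error {
	if err := update(); err != nil {
		log.Printf("ERROR failed to update StatefulSet %q: %v", store.Name, err)
		rec.Warn(store.Name, "StatefulSetUpdateFailed", err.Error())
		store.Phase = "UpdateFailed" // hypothetical phase name
		store.Message = err.Error()
		return err
	}
	return nil
}

func main() {
	store := &Store{Name: "my-consensus-store", Phase: "Ready"}
	rec := &memoryRecorder{}
	_ = applyStatefulSet(store, rec, func() error { return errForbidden })
	fmt.Println(store.Phase, len(rec.events))
}
```

Returning the error as well lets the caller decide whether to retry, while the status and event keep the failure visible to users who only look at kubectl output.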
My initial comment needs improvement; I'll post an update with better details shortly.