berops / claudie

Cloud-agnostic managed Kubernetes
https://docs.claudie.io/
Apache License 2.0

Bug: builder restarted while constructing the clusters #1512

Open JKBGIT1 opened 2 months ago

Current Behaviour

The builder pod restarted while the clusters were being built.

NAME                                READY   STATUS    RESTARTS        AGE
ansibler-599cb5b7b7-25hzm           1/1     Running   0               6h14m
builder-657499cc75-zsqw2            1/1     Running   1 (5h49m ago)   6h14m
claudie-operator-7b88589ff9-lhwlf   1/1     Running   0               6h14m

As the logs below show, the builder finished building the GCP cluster of test-set-no1 at 10:10:17. It was supposed to start building the OCI cluster of test-set-no1 right afterwards, but it did not pick up that task until 11:54:20, almost two hours later.

...
2024-09-19T10:10:17Z INF ../go/services/builder/domain/usecases/config_processor_v2.go:94 > Finished processing task "1b77fa23-c7cd-48b3-ac0a-802de7ce57ff" for cluster "ts1-gcp-cluster-test-set-no1" config "claudie-9cb8ac3-2971-test-set1" module=builder
2024-09-19T10:45:07Z DBG ../go/services/builder/domain/usecases/workflow_helpers.go:164 > updating task "d7079d7b-83dd-4207-980c-71b33b2d2b7c" for cluster "hybrid-cluster-test-set-no-5" for config "claudie-9cb8ac3-2971-test-set5" with state: stage:ANSIBLER status:IN_PROGRESS module=builder
2024-09-19T10:45:07Z DBG ../go/services/builder/domain/usecases/workflow_helpers.go:164 > updating task "d7079d7b-83dd-4207-980c-71b33b2d2b7c" for cluster "hybrid-cluster-test-set-no-5" for config "claudie-9cb8ac3-2971-test-set5" with state: stage:DESTROY_TERRAFORMER status:IN_PROGRESS description:"destroying infrastructure" module=builder
2024-09-19T10:45:07Z INF ../go/services/builder/domain/usecases/terraformer_caller.go:62 > Calling DestroyInfrastructure on Terraformer cluster=hybrid-cluster-test-set-no-5-zkqpq84 module=builder project=claudie-9cb8ac3-2971-test-set5
2024-09-19T10:46:33Z INF ../go/services/builder/domain/usecases/terraformer_caller.go:66 > DestroyInfrastructure on Terraformer finished successfully cluster=hybrid-cluster-test-set-no-5-zkqpq84 module=builder project=claudie-9cb8ac3-2971-test-set5
2024-09-19T10:46:33Z DBG ../go/services/builder/domain/usecases/workflow_helpers.go:164 > updating task "d7079d7b-83dd-4207-980c-71b33b2d2b7c" for cluster "hybrid-cluster-test-set-no-5" for config "claudie-9cb8ac3-2971-test-set5" with state: stage:DESTROY_TERRAFORMER status:IN_PROGRESS module=builder
2024-09-19T10:46:33Z DBG ../go/services/builder/domain/usecases/workflow_helpers.go:164 > updating task "d7079d7b-83dd-4207-980c-71b33b2d2b7c" for cluster "hybrid-cluster-test-set-no-5" for config "claudie-9cb8ac3-2971-test-set5" with state: stage:DESTROY_KUBER status:IN_PROGRESS description:"deleting kubeconfig secret" module=builder
2024-09-19T10:46:33Z INF ../go/services/builder/domain/usecases/kuber_caller.go:137 > Calling DeleteKubeconfig on Kuber cluster=hybrid-cluster-test-set-no-5-zkqpq84 module=builder project=claudie-9cb8ac3-2971-test-set5
2024-09-19T10:46:33Z DBG ../go/services/builder/domain/usecases/workflow_helpers.go:164 > updating task "d7079d7b-83dd-4207-980c-71b33b2d2b7c" for cluster "hybrid-cluster-test-set-no-5" for config "claudie-9cb8ac3-2971-test-set5" with state: stage:DESTROY_KUBER status:IN_PROGRESS description:"deleting cluster metadata secret" module=builder
2024-09-19T10:46:33Z INF ../go/services/builder/domain/usecases/kuber_caller.go:144 > Calling DeleteClusterMetadata on kuber cluster=hybrid-cluster-test-set-no-5-zkqpq84 module=builder project=claudie-9cb8ac3-2971-test-set5
2024-09-19T10:46:33Z INF ../go/services/builder/domain/usecases/kuber_caller.go:148 > DeleteKubeconfig on Kuber finished successfully cluster=hybrid-cluster-test-set-no-5-zkqpq84 module=builder project=claudie-9cb8ac3-2971-test-set5
2024-09-19T10:46:33Z DBG ../go/services/builder/domain/usecases/workflow_helpers.go:164 > updating task "d7079d7b-83dd-4207-980c-71b33b2d2b7c" for cluster "hybrid-cluster-test-set-no-5" for config "claudie-9cb8ac3-2971-test-set5" with state: stage:KUBER status:IN_PROGRESS module=builder
2024-09-19T10:46:33Z INF ../go/services/builder/domain/usecases/config_processor_v2.go:52 > successfully processed task "d7079d7b-83dd-4207-980c-71b33b2d2b7c" for cluster "hybrid-cluster-test-set-no-5" for config "claudie-9cb8ac3-2971-test-set5" module=builder
2024-09-19T10:46:33Z DBG ../go/services/builder/domain/usecases/config_processor_v2.go:60 > updating current state for cluster "hybrid-cluster-test-set-no-5" for config "claudie-9cb8ac3-2971-test-set5" task "d7079d7b-83dd-4207-980c-71b33b2d2b7c" module=builder
2024-09-19T10:46:33Z DBG ../go/services/builder/domain/usecases/config_processor_v2.go:77 > updating task "d7079d7b-83dd-4207-980c-71b33b2d2b7c" for cluster "hybrid-cluster-test-set-no-5" for config "claudie-9cb8ac3-2971-test-set5" with status: DONE module=builder
2024-09-19T10:46:33Z INF ../go/services/builder/domain/usecases/config_processor_v2.go:94 > Finished processing task "d7079d7b-83dd-4207-980c-71b33b2d2b7c" for cluster "hybrid-cluster-test-set-no-5" config "claudie-9cb8ac3-2971-test-set5" module=builder
2024-09-19T11:54:20Z DBG ../go/services/builder/domain/usecases/config_processor_v2.go:133 > [task "371c8509-6fb1-459f-b700-cc485da1a4a8"] Update operation "ts1-oci-cluster-test-set-no1" from config "claudie-9cb8ac3-2971-test-set1" module=builder
2024-09-19T11:54:20Z DBG ../go/services/builder/domain/usecases/workflow_helpers.go:164 > updating task "371c8509-6fb1-459f-b700-cc485da1a4a8" for cluster "ts1-oci-cluster-test-set-no1" for config "claudie-9cb8ac3-2971-test-set1" with state: stage:TERRAFORMER status:IN_PROGRESS description:"building infrastructure" module=builder
2024-09-19T11:54:20Z INF ../go/services/builder/domain/usecases/terraformer_caller.go:27 > Calling BuildInfrastructure on Terraformer cluster=ts1-oci-cluster-test-set-no1-050cz7r module=builder project=claudie-9cb8ac3-2971-test-set1
2024-09-19T11:55:39Z INF ../go/services/builder/domain/usecases/terraformer_caller.go:32 > BuildInfrastructure on Terraformer finished successfully cluster=ts1-oci-cluster-test-set-no1-050cz7r module=builder project=claudie-9cb8ac3-2971-test-set1
...

As a result, the OCI cluster of test-set-no1 got stuck in the "building infrastructure" stage, as the status of the InputManifest shows:

Status:
  Clusters:
    ts1-aws-cluster-test-set-no1:
      Message:  Finished successfully
      Phase:  NONE
      State:  DONE
    ts1-azr-cluster-test-set-no1:
      Message:  Finished successfully
      Phase:  NONE
      State:  DONE
    ts1-gcp-cluster-test-set-no1:
      Message:  Finished successfully
      Phase:  NONE
      State:  DONE
    ts1-htz-cluster-test-set-no1:
      Message:  Finished successfully
      Phase:  NONE
      State:  DONE
    ts1-oci-cluster-test-set-no1:
      Message:  building infrastructure
      Phase:  TERRAFORMER
      State:  IN_PROGRESS
  State:        IN_PROGRESS
Events:         <none>

In addition, the e2e pipeline failed because the test sets took too long to finish:

2024-09-19T09:06:27Z ERR claudie_test.go:125 > Error in test sets test-set3  error="error while monitoring manifest 1.yaml from test set test-set3 : test took too long... Aborting after 8000 seconds" module=testing-framework
2024-09-19T09:11:19Z ERR claudie_test.go:125 > Error in test sets test-set2  error="error while monitoring manifest 1.yaml from test set test-set2 : test took too long... Aborting after 8000 seconds" module=testing-framework
2024-09-19T09:13:10Z ERR claudie_test.go:147 > Error in test sets autoscaling-1 error="error while performing additional test for manifest 1.yaml from autoscaling-1 : test took too long... Aborting after 8000 seconds" module=testing-framework
panic: test timed out after 3h0m0s
    running tests:
        TestClaudie (3h0m0s)

Expected Behaviour

The builder should first finish building the cluster it is working on, and only then restart.

Steps To Reproduce

I have no idea.

Anything else to note

Nothing.