Pega88 / chainlink-gcp

Hosting a Chainlink node on Kubernetes using Terraform

gke-cluster TF task creates a cluster with an empty node pool #8

Open nikkatalnikov opened 2 years ago

nikkatalnikov commented 2 years ago

Hi there,

When creating the GKE cluster, I see the following log line at the end:

google_container_cluster.gke-cluster: Creation complete after 6m10s [id=projects/***/locations/europe-west1-b/clusters/chainlink]

However, the cluster nodes are automatically deleted after this seemingly successful creation. The final result looks like this:

[Screenshot 2021-09-19 at 17:13:52]

After that, I got:

│ Error: namespaces "chainlink" not found
│ 
│   with kubernetes_secret.password-credentials,
│   on chainlink-node.tf line 85, in resource "kubernetes_secret" "password-credentials":
│   85: resource "kubernetes_secret" "password-credentials" {
│ 
╵
╷
│ Error: Failed to create deployment: namespaces "chainlink" not found
│ 
│   with kubernetes_deployment.chainlink-node,
│   on chainlink-node.tf line 99, in resource "kubernetes_deployment" "chainlink-node":
│   99: resource "kubernetes_deployment" "chainlink-node" {
│ 

I tried both the master and feature/tf-upgrade branches.

Could it be an IAM issue?

Thank you!
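The nodes disappearing right after creation may be expected behavior rather than an IAM problem. A common Terraform layout for GKE, assumed here and not confirmed against this repo, sets `remove_default_node_pool = true`: Terraform creates the cluster with a throwaway default pool, deletes it, and relies on a separately managed `google_container_node_pool`. The cluster name and zone below come from the log above, the pool name `main-nodes` is inferred from a node name later in the thread, and the machine type is an arbitrary assumption.

```hcl
# Sketch of a common GKE layout (assumed, not taken from this repo).
resource "google_container_cluster" "gke-cluster" {
  name     = "chainlink"
  location = "europe-west1-b"

  # GKE needs a node pool at creation time, so Terraform creates a one-node
  # default pool and deletes it immediately; those nodes vanish by design.
  remove_default_node_pool = true
  initial_node_count       = 1
}

# Hypothetical separately managed pool that provides the real worker nodes.
resource "google_container_node_pool" "main-nodes" {
  name       = "main-nodes"
  location   = google_container_cluster.gke-cluster.location
  cluster    = google_container_cluster.gke-cluster.name
  node_count = 3

  node_config {
    machine_type = "e2-standard-2" # assumption; size to your workload
    oauth_scopes = ["https://www.googleapis.com/auth/cloud-platform"]
  }
}
```

With this layout, anything created through the kubernetes provider should wait for the node pool, not only for the cluster, otherwise it can race the pool's creation.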

nikkatalnikov commented 2 years ago

The tooltip in the GKE console says:

The number of nodes is estimated by the number of Compute VM instances because the Kubernetes control plane did not respond, possibly due to a pending upgrade or missing IAM permissions.

The number of nodes in a cluster should match the number of Compute VM instances, except for:
- A temporary skew during resize or upgrade
- Uncommon configurations in which nodes or instances were manipulated directly with Kubernetes and/or Compute APIs

nikkatalnikov commented 2 years ago

OK, it seems that adding a few appropriate depends_on arguments fixes the problem.
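A minimal sketch of that kind of ordering fix, assuming the kubernetes provider is already configured against the new cluster. The `namespaces "chainlink" not found` errors appear when the secret and deployment are created before the namespace exists; referencing the namespace resource gives Terraform an implicit dependency, and explicit `depends_on` covers the ordering it cannot infer. `kubernetes_namespace.chainlink`, `var.chainlink_password`, and the secret key are hypothetical; the deployment and container names are inferred from the pod events further down.

```hcl
# Sketch only: kubernetes_namespace.chainlink and var.chainlink_password are
# hypothetical; the other resource names come from the error output above.

variable "chainlink_password" {
  type      = string
  sensitive = true
}

resource "kubernetes_namespace" "chainlink" {
  metadata {
    name = "chainlink"
  }

  # Nothing can be created in the cluster until the cluster exists. If the
  # nodes come from a separate google_container_node_pool, depend on that instead.
  depends_on = [google_container_cluster.gke-cluster]
}

resource "kubernetes_secret" "password-credentials" {
  metadata {
    name = "password-credentials"
    # Referencing the namespace resource, rather than the literal "chainlink",
    # creates an implicit dependency on it.
    namespace = kubernetes_namespace.chainlink.metadata[0].name
  }

  data = {
    # Hypothetical key; use whatever key the node's volume mount expects.
    password = var.chainlink_password
  }
}

resource "kubernetes_deployment" "chainlink-node" {
  metadata {
    name      = "chainlink"
    namespace = kubernetes_namespace.chainlink.metadata[0].name
  }

  # The pod in this thread mounts a secret volume, so create the secret first.
  depends_on = [kubernetes_secret.password-credentials]

  spec {
    replicas = 1

    selector {
      match_labels = {
        app = "chainlink-node"
      }
    }

    template {
      metadata {
        labels = {
          app = "chainlink-node"
        }
      }

      spec {
        container {
          name  = "chainlink-node"
          image = "smartcontract/chainlink:0.9.10"
        }
      }
    }
  }
}
```

Preferring references over literal names keeps most ordering implicit; `depends_on` is then only needed where no attribute links two resources.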

With Chainlink image v0.9.10, the pods fail to start:

Events:
  Type     Reason            Age                 From               Message
  ----     ------            ----                ----               -------
  Warning  FailedScheduling  20m                 default-scheduler  0/3 nodes are available: 3 node(s) had taint {node.kubernetes.io/network-unavailable: }, that the pod didn't tolerate.
  Normal   Scheduled         20m                 default-scheduler  Successfully assigned chainlink/chainlink-75dd5b6bdf-g8l87 to gke-chainlink-main-nodes-964c9c9f-smjb
  Warning  FailedMount       20m                 kubelet            MountVolume.SetUp failed for volume "api-volume" : failed to sync secret cache: timed out waiting for the condition
  Normal   Pulling           20m                 kubelet            Pulling image "smartcontract/chainlink:0.9.10"
  Normal   Pulled            20m                 kubelet            Successfully pulled image "smartcontract/chainlink:0.9.10" in 17.493417828s
  Normal   Created           18m (x5 over 20m)   kubelet            Created container chainlink-node
  Normal   Started           18m (x5 over 20m)   kubelet            Started container chainlink-node

nikkatalnikov commented 2 years ago

I also see these errors in the pod logs:

2021-09-20T00:07:16Z [FATAL] Unable to initialize ORM: pq: syntax error at or near "INCLUDE"
error running migrations
github.com/smartcontractkit/chainlink/core/store/migrations.MigrateTo
    /chainlink/core/store/migrations/migrate.go:549
github.com/smartcontractkit/chainlink/core/store/migrations.Migrate
    /chainlink/core/store/migrations/migrate.go:523
github.com/smartcontractkit/chainlink/core/store.initializeORM.func1
    /chainlink/core/store/store.go:223
github.com/smartcontractkit/chainlink/core/store/orm.(*ORM).RawDB
    /chainlink/core/store/orm/orm.go:1443

nikkatalnikov commented 2 years ago

OK, with Postgres 13.3, Chainlink 0.9.10 works.
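The `syntax error at or near "INCLUDE"` during migrations points at the database version: PostgreSQL only accepts the `INCLUDE` clause (covering indexes and constraints) from version 11 onward, so an older server fails at exactly that keyword. A minimal sketch for pinning both images in Terraform, assuming Postgres runs as a container; the variable names are hypothetical, not this repo's.

```hcl
# Hypothetical variable names; wire them into the image fields of the
# Chainlink and Postgres containers wherever the repo defines them.

variable "chainlink_image" {
  type        = string
  description = "Chainlink node image, pinned to a tag reported working in this thread"
  default     = "smartcontract/chainlink:0.9.10"
}

variable "postgres_image" {
  type        = string
  description = "Postgres image; must be 11 or newer because the migrations use INCLUDE"
  default     = "postgres:13.3"
}
```

Inside a container block this becomes `image = var.postgres_image`. If the database runs on Cloud SQL rather than in the cluster, the corresponding setting is `database_version` on `google_sql_database_instance` (for example `"POSTGRES_13"`).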

I won't open a PR, as it would be redundant with https://github.com/Pega88/chainlink-gcp/pull/7; most of the changes are the same.

@Pega88 feel free to close this issue once that PR is merged.

blackramit commented 2 years ago

I ran into exactly the same issues, Nik. I have submitted a commit to Niels with an update. Also, I wasn't able to get any version above 0.9.10 to work, so that appears to be the newest Chainlink release that can be used with this method.

If you and Niels are interested, I came across this really well-done Helm-based deployment by Leo Vigna at Vulcan. I'm just learning to work with K8s, and I'm using Chainlink as my app platform on GKE. Really cool stuff that folks are doing, and I can't thank people like Niels and Leo enough for sharing their knowledge!