Pega88 / chainlink-gcp

Hosting a Chainlink node on Kubernetes using Terraform

Namespace issue in K8 node deployment #4

Open blackramit opened 2 years ago

blackramit commented 2 years ago

Hey Pega88 (Niels), thanks much for all the work you did on this deployment manifest. Awesome work! I ran into an issue with what I believe is a race condition with the chainlink namespace getting started on the K8 cluster. Did you run into this issue and if so, did you ever get a workaround?

google_container_cluster.gke-cluster: Creation complete after 5m27s [id=projects/chainlink-test-324713/locations/us-central1-c/clusters/chainlink-cluster]
kubernetes_namespace.chainlink: Creating...
kubernetes_secret.password-credentials: Creating...
kubernetes_service.chainlink_service: Creating...
kubernetes_config_map.chainlink-env: Creating...
kubernetes_secret.api-credentials: Creating...
kubernetes_config_map.postgres: Creating...
kubernetes_service.postgres: Creating...
kubernetes_deployment.chainlink-node: Creating...
kubernetes_stateful_set.postgres: Creating...
kubernetes_namespace.chainlink: Creation complete after 0s [id=chainlink]
╷
│ Error: namespaces "chainlink" not found
│
│   with kubernetes_config_map.chainlink-env,
│   on chainlink-node.tf line 28, in resource "kubernetes_config_map" "chainlink-env":
│   28: resource "kubernetes_config_map" "chainlink-env" {
│
╵

blackramit commented 2 years ago

For Pega88 (Niels) and any others who venture here. I was able to get things working with a couple tweaks;

blackramit commented 2 years ago

I noticed something about GCP/GKE that could be what was going on above. Even after deleting a project, the platform seems to hold onto the namespace, but what it holds won't be pointing to the new project you are working with. I believe deleting the namespace, as shown here, may be the way to fix that. Notice it stays in a Terminating state for quite a while:

devadmin@ThunderCloud:/mnt/e/Development/chainlink-gcp$ kubectl get namespace
NAME              STATUS   AGE
chainlink         Active   3h12m
default           Active   3h16m
kube-node-lease   Active   3h16m
kube-public       Active   3h16m
kube-system       Active   3h16m
devadmin@ThunderCloud:/mnt/e/Development/chainlink-gcp$ kubectl delete namespace chainlink
namespace "chainlink" deleted 
devadmin@ThunderCloud:/mnt/e/Development/chainlink-gcp$ kubectl get namespace
NAME              STATUS        AGE
chainlink         Terminating   3h31m
default           Active        3h35m
kube-node-lease   Active        3h35m
kube-public       Active        3h35m
kube-system       Active        3h35m

Pega88 commented 2 years ago

Thanks for flagging, I'll take some time in the week to update the entire setup

blackramit commented 2 years ago

Hey Niels, I had to add a bunch of dependencies (depends_on=) to get the various build segments to run in the right order. I'll submit the code after I run it a few times to verify it. I now have three solid nodes up on GKE running v0.9.10 of the chainlink code.
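For anyone reading along, the kind of ordering fix described here can be sketched roughly as below. This is only an illustration, not the code from the repository or the eventual PR: the two resource names come from the log output earlier in this issue, while the metadata names and the `EXAMPLE_KEY` entry are placeholders I made up. It is a fragment that assumes the Kubernetes provider is already configured elsewhere.

```hcl
# Illustrative fragment only; values marked "placeholder" are assumptions.
resource "kubernetes_namespace" "chainlink" {
  metadata {
    name = "chainlink"
  }
}

resource "kubernetes_config_map" "chainlink-env" {
  metadata {
    name = "chainlink-env" # placeholder

    # Reading the namespace name from the namespace resource gives Terraform
    # an implicit dependency, so the config map waits for the namespace.
    namespace = kubernetes_namespace.chainlink.metadata[0].name
  }

  data = {
    EXAMPLE_KEY = "example-value" # placeholder
  }

  # Explicit ordering, equivalent in effect to the implicit reference above.
  depends_on = [kubernetes_namespace.chainlink]
}
```

Either the attribute reference or `depends_on` on its own should be enough to stop Terraform from creating the config map in parallel with the namespace. The same pattern would apply to the secrets, services, deployment, and stateful set listed in the log.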

Pega88 commented 2 years ago

Can you have a look at #6 to see if this helps? I still need to update the CL image and add the timeouts; I haven't tried that yet. Feel free to PR though!

Pega88 commented 2 years ago

The namespace you still see comes from your local ~/.kube/config, which is not deleted when you delete the Google Cloud environment, so your local tooling still thinks the cluster is there (link here). That said, it's weird that it successfully deletes a namespace on a cluster that should not be reachable anymore.

Pega88 commented 2 years ago

I updated the CL version as well with your snippet, but haven't had time to fully run it yet. LMK if it works for you.

blackramit commented 2 years ago

Hi Niels... I'll put in a PR. I just ran this code and it works well. Like the idea of setting up the Eth Requirements via variables.

blackramit commented 2 years ago

And I couldn't get Chainlink v0.10.14 to run via TF/kubectl because it installs v0.9.10 and then tries to do an upgrade. I'm going to open a ticket with them to see what the plan is. Maybe when they get v0.10 to an RC, they will create a dedicated package.

blackramit commented 2 years ago

Sorry, I didn't realize you had committed changes already. I've reverted my changes and will work with your current code to validate.
