devsisters / shardcake

Sharding and location transparency for Scala
https://devsisters.github.io/shardcake/
Apache License 2.0

AWS Discovery #9

Open guersam opened 1 year ago

guersam commented 1 year ago

Thanks for open-sourcing shardcake, @ghostdogpr and Devsisters!

I'd like to port a running Akka cluster to shardcake, and the largest blocker is service discovery.

Our Akka cluster runs on ECS Fargate instead of Kubernetes, so we're using the ECS discovery module provided by Akka Management:

https://doc.akka.io/docs/akka-management/current/discovery/aws.html

ghostdogpr commented 1 year ago

This can be done easily by providing an implementation of PodsHealth that uses the ECS API, see https://devsisters.github.io/shardcake/docs/customization.html#health

The Kubernetes one is only a few lines long: https://github.com/devsisters/shardcake/blob/series/2.x/health-k8s/src/main/scala/com/devsisters/shardcake/K8sPodsHealth.scala
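As a rough illustration of the suggestion above, here is a minimal sketch of what an ECS-backed health check could look like. It is not shardcake's actual code: `PodAddress` and `PodsHealth` are simplified local stand-ins (the real trait is ZIO-based and effectful, while this one returns a plain `Boolean`), and `EcsTasks` is a hypothetical interface that would have to be implemented on top of the AWS SDK.

```scala
object EcsHealthSketch {
  // Simplified stand-ins for shardcake's types, redefined so the sketch compiles on its own.
  final case class PodAddress(host: String, port: Int)

  trait PodsHealth {
    def isAlive(pod: PodAddress): Boolean
  }

  // Hypothetical port over the ECS API: the private IPs of the tasks
  // that ECS currently reports as running for this service.
  trait EcsTasks {
    def runningTaskIps(): Set[String]
  }

  // A pod counts as alive when ECS still reports a running task with its IP.
  final class EcsPodsHealth(ecs: EcsTasks) extends PodsHealth {
    def isAlive(pod: PodAddress): Boolean =
      ecs.runningTaskIps().contains(pod.host)
  }
}
```

Because the ECS access sits behind a small interface, the lookup logic can be exercised with an in-memory stub of `EcsTasks`, without calling AWS.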

ghostdogpr commented 1 year ago

Btw if someone implements it, we'll happily accept the contribution!

thiloplanz commented 1 year ago

Conceptual question about this: Are these infra-specific health checks in any way superior to the built-in "ping" health check? That one is also very reliable (right?), works out of the box, and does not require setting up access permissions to the infra API for your application.

ghostdogpr commented 1 year ago

In case of a network issue, the ping might fail even though the pod is actually alive and processing messages. However, the infra (like Kubernetes) knows whether the pod is alive or not because it's in charge of its lifecycle. Basically we rely on the built-in logic of the infrastructure to handle things like cluster split, etc.

thiloplanz commented 1 year ago

Hmm. I guess that can cut both ways. If the ping fails because of network issues, payload messages might fail for the same reason, even though Kubernetes knows that the pod is alive. In that scenario the ping healthcheck is closer to "proof in the pudding". 🤔

ghostdogpr commented 1 year ago

In that situation we don't want to rebalance, because we can't be sure the pod is gone. Otherwise you might end up with the same shard running on 2 different pods.
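The reasoning in this exchange can be sketched as a small decision function. This is an illustration only, not shardcake's implementation: the names `decide`, `KeepAssignments` and `Rebalance` are hypothetical, and treating a failed infra lookup as "still alive" is an assumption made here to match the goal of never running one shard on two pods.

```scala
import scala.util.{Success, Try}

object RebalanceSketch {
  sealed trait Decision
  case object KeepAssignments extends Decision
  case object Rebalance extends Decision

  // pingOk: result of the direct ping to the pod.
  // infraSaysAlive: asks the infrastructure (Kubernetes, ECS, ...) whether the pod still exists.
  def decide(pingOk: Boolean, infraSaysAlive: () => Boolean): Decision =
    if (pingOk) KeepAssignments
    else
      Try(infraSaysAlive()) match {
        // Only a definite "gone" from the infrastructure triggers a rebalance.
        case Success(false) => Rebalance
        // Alive, or the lookup itself failed: keep shards where they are.
        case _              => KeepAssignments
      }
}
```

A failed ping alone never moves shards in this sketch; the trade-off is that messages to an unreachable but living pod keep failing until the network recovers or the infrastructure declares the pod dead.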

grouzen commented 7 months ago

Hello! I'm wondering how to test it during development, considering that localstack's EKS module is available in the Pro version only.

ghostdogpr commented 7 months ago

> Hello! I'm wondering how to test it during development, considering that localstack's EKS module is available in the Pro version only.

I don't have a great solution for that; we tested the k8s one in a real environment...