apache / pekko-samples

Apache Pekko Sample Projects
https://pekko.apache.org/
Apache License 2.0

Issue 38: Migrate akka cluster k8s scala #39

Closed sam-byng closed 1 year ago

sam-byng commented 1 year ago

See description for related PR: https://github.com/apache/incubator-pekko-samples/pull/38

FIXED: The same failure was happening here:

[2023-04-20 14:01:57,443] [INFO] [org.apache.pekko.cluster.Cluster] [] [Appka-pekko.actor.default-dispatcher-3] - Cluster Node [pekko://Appka@10.244.0.27:17355] - Started up successfully MDC: {pekkoUid=-5630153020963612054, sourceThread=main, sourceActorSystem=Appka, pekkoAddress=pekko://Appka@10.244.0.27:17355, pekkoSource=Cluster(pekko://Appka), pekkoTimestamp=14:01:57.442UTC}
[2023-04-20 14:01:59,326] [ERROR] [org.apache.pekko.cluster.Cluster] [] [Appka-pekko.actor.default-dispatcher-5] - Cluster Node [pekko://Appka@10.244.0.27:17355] - crashed, [pekko://Appka/system/cluster/core/daemon: exception during creation, root cause message: [pekko.cluster.sbr.SplitBrainResolverProvider]] - shutting down... MDC: {pekkoUid=-5630153020963612054, sourceThread=Appka-pekko.actor.internal-dispatcher-7, sourceActorSystem=Appka, pekkoAddress=pekko://Appka@10.244.0.27:17355, pekkoSource=Cluster(pekko://Appka), pekkoTimestamp=14:01:58.738UTC}
org.apache.pekko.actor.ActorInitializationException: pekko://Appka/system/cluster/core/daemon: exception during creation, root cause message: [pekko.cluster.sbr.SplitBrainResolverProvider]
        at org.apache.pekko.actor.ActorInitializationException$.apply(Actor.scala:206)
        at org.apache.pekko.actor.ActorCell.create(ActorCell.scala:679)
        at org.apache.pekko.actor.ActorCell.invokeAll$1(ActorCell.scala:523)
        at org.apache.pekko.actor.ActorCell.systemInvoke(ActorCell.scala:545)
        at org.apache.pekko.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:305)
        at org.apache.pekko.dispatch.Mailbox.run(Mailbox.scala:240)
        at org.apache.pekko.dispatch.Mailbox.exec(Mailbox.scala:253)
        at java.base/java.util.concurrent.ForkJoinTask.doExec(Unknown Source)
        at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(Unknown Source)
        at java.base/java.util.concurrent.ForkJoinPool.scan(Unknown Source)
        at java.base/java.util.concurrent.ForkJoinPool.runWorker(Unknown Source)
        at java.base/java.util.concurrent.ForkJoinWorkerThread.run(Unknown Source)
Caused by: org.apache.pekko.ConfigurationException: Could not create cluster downing provider [pekko.cluster.sbr.SplitBrainResolverProvider]
        at org.apache.pekko.cluster.DowningProvider$$anonfun$load$1.applyOrElse(DowningProvider.scala:37)
        at org.apache.pekko.cluster.DowningProvider$$anonfun$load$1.applyOrElse(DowningProvider.scala:36)
        at scala.util.Failure.recover(Try.scala:233)
        at org.apache.pekko.cluster.DowningProvider$.load(DowningProvider.scala:36)
        at org.apache.pekko.cluster.Cluster.downingProvider$lzycompute(Cluster.scala:144)
        at org.apache.pekko.cluster.Cluster.downingProvider(Cluster.scala:142)
        at org.apache.pekko.cluster.ClusterCoreDaemon.preStart(ClusterDaemon.scala:433)
        at org.apache.pekko.actor.Actor.aroundPreStart(Actor.scala:558)
        at org.apache.pekko.actor.Actor.aroundPreStart$(Actor.scala:558)
        at org.apache.pekko.cluster.ClusterCoreDaemon.aroundPreStart(ClusterDaemon.scala:328)
        at org.apache.pekko.actor.ActorCell.create(ActorCell.scala:654)
        ... 10 common frames omitted
Caused by: java.lang.ClassNotFoundException: pekko.cluster.sbr.SplitBrainResolverProvider
        at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(Unknown Source)
        at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(Unknown Source)
        at java.base/java.lang.ClassLoader.loadClass(Unknown Source)
        at java.base/java.lang.Class.forName0(Native Method)
        at java.base/java.lang.Class.forName(Unknown Source)
        at org.apache.pekko.actor.ReflectiveDynamicAccess.$anonfun$getClassFor$1(ReflectiveDynamicAccess.scala:39)
        at scala.util.Try$.apply(Try.scala:210)
        at org.apache.pekko.actor.ReflectiveDynamicAccess.getClassFor(ReflectiveDynamicAccess.scala:38)
        at org.apache.pekko.actor.ReflectiveDynamicAccess.createInstanceFor(ReflectiveDynamicAccess.scala:57)
        at org.apache.pekko.cluster.DowningProvider$.load(DowningProvider.scala:35)
        ... 17 common frames omitted
pjfanning commented 1 year ago

The class package seems wrong: the org.apache prefix is missing.

sam-byng commented 1 year ago

Issue fixed: downing-provider-class updated
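
For reference, the stack trace above reports a ClassNotFoundException for pekko.cluster.sbr.SplitBrainResolverProvider, so the corrected setting presumably looks like the fragment below. This is a sketch only: the file it lives in (assumed here to be application.conf) and the surrounding keys are not shown in this thread.

```hocon
# Assumed location: the sample's application.conf (not shown in this thread)
pekko.cluster {
  # Fully qualified class name, including the org.apache prefix
  downing-provider-class = "org.apache.pekko.cluster.sbr.SplitBrainResolverProvider"
}
```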

Successful local test:

$ kubectl logs appka-85f68c9bf4-g64zc | grep MemberUp
[2023-04-25 15:18:39,874] [INFO] [org.apache.pekko.sample.cluster.kubernetes.DemoApp$] [] [appka-pekko.actor.default-dispatcher-11] - MemberEvent: MemberUp(Member(pekko://appka@10.244.0.109:17355, Up)) MDC: {sourceActorSystem=appka, pekkoAddress=pekko://appka@10.244.0.109:17355, pekkoSource=pekko://appka/user/listener}
[2023-04-25 15:18:43,974] [INFO] [org.apache.pekko.sample.cluster.kubernetes.DemoApp$] [] [appka-pekko.actor.default-dispatcher-14] - MemberEvent: MemberUp(Member(pekko://appka@10.244.0.110:17355, Up)) MDC: {sourceActorSystem=appka, pekkoAddress=pekko://appka@10.244.0.109:17355, pekkoSource=pekko://appka/user/listener}
[2023-04-25 15:18:43,975] [INFO] [org.apache.pekko.sample.cluster.kubernetes.DemoApp$] [] [appka-pekko.actor.default-dispatcher-14] - MemberEvent: MemberUp(Member(pekko://appka@10.244.0.111:17355, Up)) MDC: {sourceActorSystem=appka, pekkoAddress=pekko://appka@10.244.0.109:17355, pekkoSource=pekko://appka/user/listener}
sam-byng commented 1 year ago

Issue: Timeout error in a CI test for a directory this PR does not change: pekko-sample-distributed-data-scala

FIXED: I've made the test more resilient to this by increasing the timeouts.

((error:

[JVM-3] [WARN] [04/25/2023 15:53:24.407] [ReplicatedMetricsSpec-pekko.actor.default-dispatcher-23] [pekko://ReplicatedMetricsSpec/system/ddataReplicator] received dead letter from Actor[pekko://ReplicatedMetricsSpec@localhost:34399/system/ddataReplicator#94638895]: DeltaPropagation(UniqueAddress(pekko://ReplicatedMetricsSpec@localhost:34399,8611969283667454892),false,Map(usedHeap -> Delta(DataEnvelope(PutDeltaOp(AddDeltaOp(ORSet(localhost:34399)),(localhost:34399,LWWRegister(27707792)),LWWMapTag),Map(),VersionVector(UniqueAddress(pekko://ReplicatedMetricsSpec@localhost:34399,8611969283667454892) -> 6, UniqueAddress(pekko://ReplicatedMetricsSpec@localhost:42099,-3404745728445011216) -> 2, UniqueAddress(pekko://ReplicatedMetricsSpec@localhost:42271,-473137233004359005) -> 6)),6,6)))
[JVM-3] [INFO] [pekkoDeadLetter][04/25/2023 15:53:24.405] [ReplicatedMetricsSpec-pekko.actor.default-dispatcher-11] [pekko://ReplicatedMetricsSpec/system/ddataReplicator] Message [org.apache.pekko.cluster.ddata.Replicator$Internal$DeltaPropagation] from Actor[pekko://ReplicatedMetricsSpec@localhost:34399/system/ddataReplicator#94638895] to Actor[pekko://ReplicatedMetricsSpec/system/ddataReplicator] was not delivered. [2] dead letters encountered. If this is not an expected behavior then Actor[pekko://ReplicatedMetricsSpec/system/ddataReplicator] may have terminated unexpectedly. This logging can be turned off or adjusted with configuration settings 'pekko.log-dead-letters' and 'pekko.log-dead-letters-during-shutdown'.
[JVM-2] [2023-04-25 15:53:24,584] [INFO] [org.apache.pekko.cluster.Cluster] [pekkoMemberChanged] [ReplicatedMetricsSpec-pekko.actor.default-dispatcher-12] - Cluster Node [pekko://ReplicatedMetricsSpec@localhost:34399] - Leader is removing confirmed Exiting node [pekko://ReplicatedMetricsSpec@localhost:42099]
[JVM-2] [INFO] [pekkoMemberChanged][04/25/2023 15:53:24.583] [ReplicatedMetricsSpec-pekko.actor.internal-dispatcher-3] [Cluster(pekko://ReplicatedMetricsSpec)] Cluster Node [pekko://ReplicatedMetricsSpec@localhost:34399] - Leader is removing confirmed Exiting node [pekko://ReplicatedMetricsSpec@localhost:42099]
[JVM-3] - must replicate metrics *** FAILED ***
[JVM-3]   java.lang.AssertionError: Timeout (3 seconds) during expectMessageClass waiting for class sample.distributeddata.ReplicatedMetrics$UsedHeap
[JVM-3]   at org.apache.pekko.actor.testkit.typed.internal.TestProbeImpl.assertFail(TestProbeImpl.scala:410)
[JVM-3]   at org.apache.pekko.actor.testkit.typed.internal.TestProbeImpl.expectMessageClass_internal(TestProbeImpl.scala:250)
[JVM-3]   at org.apache.pekko.actor.testkit.typed.internal.TestProbeImpl.expectMessageType(TestProbeImpl.scala:229)
[JVM-3]   at sample.distributeddata.ReplicatedMetricsSpec.$anonfun$new$6(ReplicatedMetricsSpec.scala:72)
[JVM-3]   at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
[JVM-3]   at org.apache.pekko.testkit.TestKitBase.within(TestKit.scala:429)
[JVM-3]   at org.apache.pekko.testkit.TestKitBase.within$(TestKit.scala:416)
[JVM-3]   at org.apache.pekko.testkit.TestKit.within(TestKit.scala:984)
[JVM-3]   at org.apache.pekko.testkit.TestKitBase.within(TestKit.scala:444)
[JVM-3]   at org.apache.pekko.testkit.TestKitBase.within$(TestKit.scala:444)
[JVM-3]   ...

))
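
The 3-second limit in the assertion above matches the default of the typed testkit's single-expect-default setting. The actual change made in this PR is not shown in this thread. One way to give such expectations more headroom on slow CI machines is to raise that setting in the test configuration. The fragment below is a sketch with illustrative values, assuming the spec picks up this configuration:

```hocon
# Illustrative values, not taken from the PR
pekko.actor.testkit.typed {
  # Default wait for expectMessage and friends (default: 3s)
  single-expect-default = 10s
  # Alternatively, scale all testkit timeouts by a factor (default: 1.0)
  # timefactor = 3.0
}
```

A per-call timeout passed directly to the probe's expectation would be a narrower alternative if only this one assertion is flaky.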