devsisters / shardcake

Sharding and location transparency for Scala
https://devsisters.github.io/shardcake/
Apache License 2.0

Upgrade zio, catch and log errors in Sharding.unregister #74

mleclercq closed this 1 year ago

mleclercq commented 1 year ago

While running a shardcake app locally, I noticed that when I stopped an app pod, it sometimes failed to properly unregister itself from the shard manager. I added some catchAllCause calls in Sharding.unregister and got errors like this one:

Error during stop of entity cause=Exception in thread "zio-fiber-1" java.lang.InterruptedException: Interrupted by thread "zio-fiber-1"

This really seems to be a race condition: I was getting this error roughly once every four stops.

It looks like we are being interrupted during finalization, which should not happen. Looking at the ZIO release notes, I noticed this bug fix in 2.0.14: https://github.com/zio/zio/pull/8086. It seems relevant because terminateAllEntities contains a timeout.
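To make the shape of the problem concrete, here is a minimal sketch, assuming ZIO 2.0.15 or later, of a finalizer (which ZIO runs uninterruptibly) that itself uses timeout. The names FinalizerWithTimeoutSketch and boundedCleanup are hypothetical; this is not shardcake's actual terminateAllEntities, and it does not reproduce the race. It only shows a timeout nested inside a finalizer that is expected to complete normally on a fixed ZIO version.

```scala
import zio._

object FinalizerWithTimeoutSketch {
  // Hypothetical cleanup step standing in for a shutdown routine bounded by
  // a timeout (the PR notes that terminateAllEntities contains one).
  def boundedCleanup(done: Ref[Boolean]): UIO[Unit] =
    ZIO.sleep(10.millis).timeout(1.second) *> done.set(true)

  // Attaches the cleanup as a finalizer via `ensuring`, then interrupts the
  // main effect with an outer timeout. ZIO runs finalizers uninterruptibly,
  // so the inner timeout executes inside an uninterruptible region.
  val cleanupCompleted: UIO[Boolean] =
    for {
      done   <- Ref.make(false)
      _      <- ZIO.never.ensuring(boundedCleanup(done)).timeout(50.millis)
      result <- done.get
    } yield result
}
```

Because the outer timeout waits for the interrupted effect to finish, including its finalizer, cleanupCompleted should yield true once the finalizer's own timeout is no longer disturbed by interruption.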

I upgraded to 2.0.15 and it seemed to fix the issue.

In this PR, I upgraded to ZIO 2.0.15 and added some "catch and log" handling in Sharding.unregister to make sure the pod cleanly unregisters itself even if an earlier step fails.
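The "catch and log" idea can be sketched as follows, assuming ZIO 2. The helper catchAndLog and the two steps passed to unregister are hypothetical placeholders, not the actual code in Sharding.unregister; the point is only that each cleanup step is isolated, so a failure in one does not skip the next.

```scala
import zio._

object UnregisterSketch {
  // Runs a cleanup step; if it fails or dies, the cause is logged and
  // discarded so the caller can continue with the next step.
  def catchAndLog(label: String)(step: Task[Unit]): UIO[Unit] =
    step.catchAllCause(cause => ZIO.logErrorCause(s"Error during $label", cause))

  // Hypothetical unregister flow: stopping entities and notifying the shard
  // manager are wrapped independently, so a failing entity shutdown no
  // longer prevents the shard manager from being notified.
  def unregister(stopEntities: Task[Unit], notifyShardManager: Task[Unit]): UIO[Unit] =
    catchAndLog("stop of entities")(stopEntities) *>
      catchAndLog("unregister from shard manager")(notifyShardManager)
}
```

catchAllCause is used rather than catchAll so that causes other than typed failures, such as defects, are logged as well instead of aborting the remaining steps.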