While running a shardcake app locally, I noticed that sometimes when I stopped a pod app, it failed to properly unregister itself from the shard manager. I added a `catchAllCause` in `Sharding.unregister` and got errors like this one:
Error during stop of entity cause=Exception in thread "zio-fiber-1" java.lang.InterruptedException: Interrupted by thread "zio-fiber-1"
This really looks like a race condition, as I was getting this error roughly one time in four.
It looks like we are getting interrupted during finalization, which should not happen. Looking at the ZIO release notes, I noticed this bug, fixed in 2.0.14: https://github.com/zio/zio/pull/8086. This seems relevant because `terminateAllEntities` contains a timeout.
I upgraded to 2.0.15 and it seemed to fix the issue.
In this PR, I upgraded to ZIO 2.0.15 and added some "catch and log" handling in `Sharding.unregister` to make sure the pod cleanly unregisters itself even if an earlier step fails.
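For illustration, the "catch and log" idea can be sketched roughly as below. This is a minimal sketch, not the actual diff: the two effects `terminateAllEntities` and `notifyShardManager` are hypothetical stand-ins defined here for the example (the first one fails on purpose), and only `catchAllCause` and `ZIO.logErrorCause` are real ZIO 2 calls.

```scala
import zio._

object UnregisterSketch extends ZIOAppDefault {

  // Hypothetical stand-in for stopping local entities; it fails on purpose
  // to simulate the "Error during stop of entity" case.
  val terminateAllEntities: Task[Unit] =
    ZIO.fail(new RuntimeException("entity failed to stop"))

  // Hypothetical stand-in for the call that tells the shard manager
  // this pod is going away.
  val notifyShardManager: Task[Unit] =
    ZIO.logInfo("Pod unregistered from the shard manager")

  // Each step logs its failure cause instead of propagating it,
  // so a failure in the first step does not skip the second one.
  val unregister: UIO[Unit] =
    terminateAllEntities
      .catchAllCause(cause => ZIO.logErrorCause("Error during stop of entity", cause)) *>
      notifyShardManager
        .catchAllCause(cause => ZIO.logErrorCause("Error during unregister", cause))

  val run = unregister
}
```

With this shape, the failure of the first step is logged with its full cause and the shard manager is still notified afterwards.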