ICOS-Carbon-Portal / stiltweb

Web facade for STILT modelling tool

Communication between stiltweb and stiltcluster breaks down #25

Open mirzov opened 1 year ago

mirzov commented 1 year ago

From the user's perspective, a newly submitted job makes no progress and no further jobs can be added. The relevant log entry from the stiltweb service:

Apr 10 23:54:58 fsicos2.lunarc.lu.se java[3672449]: [WARN] [04/10/2023 23:54:58.743] [StiltBoss-akka.remote.default-remote-dispatcher-6] [akka.stream.Log(akka://StiltBoss/system/Materializers/StreamSupervisor-1)] [outbound connection to [akka://WorkMaster@icos1.wg-fsicos2:2561], control stream] Upstream failed, cause: StreamTcpException: The connection has been aborted

And the relevant log entry from the stiltcluster service:

Apr 11 10:56:56 icos1.gis.lu.se java[2315137]: [WARN] [04/11/2023 10:56:56.011] [WorkMaster-akka.remote.default-remote-dispatcher-5] [akka.stream.Log(akka://WorkMaster/system/Materializers/StreamSupervisor-1)] [outbound connection to [akka://StiltBoss@fsicos2.wg-fsicos2:2550], message stream] Upstream failed, cause: StreamTcpException: The connection has been aborted

The connection is most likely being dropped because of an idle timeout. If so, adding a periodic "keepalive" message exchange between WorkMaster and WorkReceptionist should solve the problem.
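To illustrate the idea (not the actual stiltweb/stiltcluster code, and not Akka API), here is a minimal simulation of a link that is aborted after an idle period, with and without periodic keepalive traffic. The `Link` class, the `KEEPALIVE` message, the `submit_job_after_idle` helper, and all timeout values are hypothetical and chosen only for the example; the real idle timeout on the connection is not known.

```python
from dataclasses import dataclass
from typing import Optional

KEEPALIVE = "keepalive"  # hypothetical message name, not taken from the code base


@dataclass
class Link:
    """Simulated connection that is aborted once it has been idle too long."""
    idle_timeout: int          # seconds of silence after which the link is dropped
    last_traffic: int = 0      # simulated time of the last message
    alive: bool = True

    def send(self, now: int, message: str) -> None:
        # The link dies if nothing was sent for longer than idle_timeout.
        if self.alive and now - self.last_traffic > self.idle_timeout:
            self.alive = False
        if not self.alive:
            raise ConnectionAbortedError("The connection has been aborted")
        self.last_traffic = now


def submit_job_after_idle(idle_seconds: int,
                          idle_timeout: int,
                          keepalive_interval: Optional[int] = None) -> bool:
    """Return True if a job sent after `idle_seconds` of inactivity gets through."""
    link = Link(idle_timeout=idle_timeout)
    try:
        if keepalive_interval is not None:
            # Send a keepalive every `keepalive_interval` seconds while idle.
            t = keepalive_interval
            while t < idle_seconds:
                link.send(t, KEEPALIVE)
                t += keepalive_interval
        link.send(idle_seconds, "job")
        return True
    except ConnectionAbortedError:
        return False


if __name__ == "__main__":
    print(submit_job_after_idle(600, idle_timeout=120))                         # no keepalive
    print(submit_job_after_idle(600, idle_timeout=120, keepalive_interval=30))  # with keepalive
```

In this sketch the keepalive only helps when its interval is shorter than the idle timeout, so the interval would need to be chosen conservatively if the actual timeout is unknown.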

mirzov commented 7 months ago

Another attempt to resolve the issue: https://github.com/ICOS-Carbon-Portal/stiltweb/commit/56f935e8ed94012af4cc82392f0239392dd5a59a