Dirk reports: Esteban ran a test on mFLES and got "Worker protocol violation: connection heartbeat expired" errors, after which the worker disconnects and reconnects. He says he was doing something with the GBT links, but to me this looks like a different problem:
/home/flesctl/run/2323/slurm.out
Background: The worker (in this case probably the tsclient) prints this message when it is idle and has not received a heartbeat request from the distributor (flesnet) for 2 seconds. It then closes the connection and reconnects, because it assumes that flesnet has been restarted. However, the distributor sends a heartbeat request every 0.5 s to all currently idle workers, so the heartbeat messages should keep arriving even when no timeslices are being built.
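To make the timing concrete, here is a minimal Python sketch of the worker-side check described above. It only models the two numbers from this report (0.5 s heartbeat interval, 2 s expiry); the names `HeartbeatWatchdog`, `heartbeat_schedule` and `first_expiry` are hypothetical and are not flesnet's actual classes or API.

```python
# Hypothetical model of the idle-worker heartbeat check (not flesnet code).
# Assumed values from the report: the distributor sends a heartbeat request
# every 0.5 s to idle workers; an idle worker gives up after 2 s without one.

HEARTBEAT_INTERVAL_S = 0.5  # distributor -> idle worker
HEARTBEAT_TIMEOUT_S = 2.0   # worker-side expiry


class HeartbeatWatchdog:
    def __init__(self, now: float, timeout: float = HEARTBEAT_TIMEOUT_S):
        self.timeout = timeout
        self.last_seen = now

    def on_message(self, now: float) -> None:
        # Any heartbeat request from the distributor resets the timer.
        self.last_seen = now

    def deadline(self) -> float:
        return self.last_seen + self.timeout

    def expired(self, now: float) -> bool:
        return now - self.last_seen > self.timeout


def heartbeat_schedule(stop: float, interval: float = HEARTBEAT_INTERVAL_S) -> list:
    """Times at which an idle worker receives heartbeat requests, up to `stop`."""
    n = int(stop / interval)
    return [interval * k for k in range(1, n + 1)]


def first_expiry(message_times, end: float):
    """Replay message arrival times; return when the watchdog fires, or None."""
    wd = HeartbeatWatchdog(now=0.0)
    for t in sorted(message_times):
        if wd.expired(t):
            # The deadline passed before this message arrived.
            return wd.deadline()
        wd.on_message(t)
    return wd.deadline() if wd.expired(end) else None
```

With heartbeats every 0.5 s the 2 s deadline is never reached, so `first_expiry(heartbeat_schedule(10.0), end=10.0)` returns `None`. If the heartbeat requests stopped after t = 3.0 s, `first_expiry(heartbeat_schedule(3.0), end=10.0)` returns `5.0`: that is what the reported reconnects would look like if the distributor stopped sending heartbeat requests during a pause.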
Maybe the interface was not tested with interruptions in the timeslice data stream? The workers attach to the shared memory only when they receive the first timeslice, not immediately at "login", so it may well be that this scenario was not covered by the tests.
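The point about late attachment means an idle worker can be in two different states: logged in but never attached, or attached and waiting during a pause. The following hypothetical Python sketch models only that life cycle so the cases a test would need to cover are explicit. `Worker`, `WorkerState` and the method names are illustrative assumptions, and the reconnect behaviour in `on_heartbeat_expired` (dropping the attachment and logging in again) is also assumed, not taken from flesnet.

```python
# Hypothetical worker life cycle (not flesnet code): login happens at once,
# the shared-memory attach is deferred until the first timeslice arrives.
from enum import Enum, auto


class WorkerState(Enum):
    DISCONNECTED = auto()
    LOGGED_IN = auto()  # connected and idle, no shared memory attached yet
    ATTACHED = auto()   # shared memory attached after the first timeslice


class Worker:
    def __init__(self):
        self.state = WorkerState.DISCONNECTED
        self.shm_name = None
        self.shm_attach_count = 0
        self.timeslices = []

    def login(self):
        self.state = WorkerState.LOGGED_IN

    def on_timeslice(self, index, shm_name):
        # Lazy attach: only the first timeslice after login triggers it.
        if self.state is WorkerState.LOGGED_IN:
            self.shm_name = shm_name
            self.shm_attach_count += 1
            self.state = WorkerState.ATTACHED
        self.timeslices.append(index)

    def on_heartbeat_expired(self):
        # Assumed reconnect: drop the attachment and log in again.
        self.shm_name = None
        self.state = WorkerState.DISCONNECTED
        self.login()
```

A test for the interrupted-stream case would then check both idle states: a worker that has logged in and never seen data (`LOGGED_IN`, no attach), and one that has received timeslices and then sees a pause (`ATTACHED`). In both, it should keep receiving heartbeat requests and never reach `on_heartbeat_expired`.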