OpenFn / kit

The bits & pieces that make OpenFn work. (diagrammer, cli, compiler, runtime, runtime manager, logger, etc.)

Worker: do something better when the JWT expires #688

Open josephjclark opened 4 months ago

josephjclark commented 4 months ago

We've had some problems lately with the JWT on the run channel expiring, causing messages to fail.

It looks a bit like this:

image

The worker should do better in these cases, throwing a clear error and exiting the channel and maybe the socket.
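
A minimal sketch of what a "clear error" could look like, assuming the run token is a standard JWT whose payload carries an `exp` claim (seconds since the epoch). The names `JwtExpiredError`, `getJwtExpiry` and `assertJwtFresh` are hypothetical and do not exist in the worker today. The check only reads the claim locally and does not verify the signature, so Lightning stays the authority on whether a token is valid.

```typescript
// Hypothetical helpers for illustration; not part of the existing worker code.

class JwtExpiredError extends Error {
  readonly runId: string;
  readonly expiredAt: Date;

  constructor(runId: string, expiredAt: Date) {
    super(
      `expired JWT detected for run ${runId} (expired at ${expiredAt.toISOString()})`
    );
    this.name = 'JwtExpiredError';
    this.runId = runId;
    this.expiredAt = expiredAt;
  }
}

// Read the `exp` claim from the token payload without verifying the signature.
// Returns null if the token is malformed or has no numeric `exp`.
function getJwtExpiry(token: string): Date | null {
  const payload = token.split('.')[1];
  if (!payload) return null;
  try {
    // JWT segments are base64url encoded; convert to plain base64 for atob
    const b64 = payload.replace(/-/g, '+').replace(/_/g, '/');
    const padded = b64 + '='.repeat((4 - (b64.length % 4)) % 4);
    const claims = JSON.parse(atob(padded));
    return typeof claims.exp === 'number' ? new Date(claims.exp * 1000) : null;
  } catch {
    return null;
  }
}

// Throw a clear, named error if the token has already expired.
// Tokens without a readable `exp` are left for the server to judge.
function assertJwtFresh(
  runId: string,
  token: string,
  now: Date = new Date()
): void {
  const expiry = getJwtExpiry(token);
  if (expiry !== null && expiry.getTime() <= now.getTime()) {
    throw new JwtExpiredError(runId, expiry);
  }
}
```

Something like `assertJwtFresh` could be called before each push to the run channel, with the caller catching `JwtExpiredError`, logging it, and leaving the channel. Because these are pure functions, they are also easy to cover with unit tests.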

The question is WHERE we report. We can't tell lightning because, well, the JWT expired. Any attempts to send a message back will be rejected.

This comes into monitoring - we don't have a monitoring solution yet, other than lightning and GCP.

Perhaps a good approach is to shut the whole server down (probably gracefully) with a clear error like "expired JWT detected" and stop requesting traffic. That depends a bit on whether it's one run that's expired or whether all JWTs are wrong.
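
One way to tell "one run expired" apart from "all JWTs are wrong" is to count how many distinct runs report an expired token within a short window. The sketch below is only an illustration: `JwtExpiryTracker`, the threshold of 3 runs and the 60 second window are all assumptions, not existing worker behaviour or agreed values.

```typescript
// Hypothetical policy object; names and default values are assumptions.

type ExpiryAction = 'drop-run' | 'shutdown';

class JwtExpiryTracker {
  // runId -> timestamp (ms) of the most recent expired-JWT event for that run
  private readonly seen = new Map<string, number>();
  private readonly threshold: number;
  private readonly windowMs: number;

  constructor(threshold = 3, windowMs = 60_000) {
    this.threshold = threshold;
    this.windowMs = windowMs;
  }

  // Record an expired-JWT event and decide what the worker should do next.
  record(runId: string, nowMs: number = Date.now()): ExpiryAction {
    this.seen.set(runId, nowMs);

    // Forget events that have fallen outside the window
    this.seen.forEach((at, id) => {
      if (nowMs - at > this.windowMs) this.seen.delete(id);
    });

    // Many distinct runs failing at once suggests a systemic problem,
    // so stop claiming work; otherwise only abandon the affected run.
    return this.seen.size >= this.threshold ? 'shutdown' : 'drop-run';
  }
}
```

With this shape, a single expired run only leaves its own channel, while repeated expiries across different runs would trigger the graceful shutdown described above.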

We probably don't unit test for any of this very well at the moment. I think unit testing and clear logs to GCP are the first step. That also makes it easier for us to trace where to add more monitoring later.