We should track in the db (a persistent, durable place to keep state) whether a rule is currently in flight and how long it has been running. This should be safer than file locks for protecting against concurrent executions of the same job, and by keeping track of when the job started, we can use heuristics to say something like "this rule was started >12 hours ago, using this pid... we probably don't need to wait for it anymore, so let's kill that pid if it's still around and move on".
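A minimal sketch of how this could look, assuming SQLite as the db and Python as the runner. Nothing here comes from the existing code: the `rule_runs` table, the `open_db`, `try_start`, `finish`, and `terminate` helpers, and the exact 12-hour cutoff are all hypothetical names and values chosen for illustration.

```python
import os
import signal
import sqlite3
import time

STALE_AFTER_SECONDS = 12 * 60 * 60  # the ">12 hours" heuristic (assumed cutoff)

# Hypothetical table: at most one in-flight row per rule.
SCHEMA = """
CREATE TABLE IF NOT EXISTS rule_runs (
    rule       TEXT PRIMARY KEY,
    pid        INTEGER NOT NULL,
    started_at REAL NOT NULL
)
"""


def open_db(path):
    """Open the db in autocommit mode so transactions are managed explicitly."""
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute(SCHEMA)
    return conn


def terminate(pid):
    """Send SIGTERM to pid; do nothing if the process has already exited."""
    try:
        os.kill(pid, signal.SIGTERM)
    except ProcessLookupError:
        pass


def try_start(conn, rule, pid, now=None, kill=terminate):
    """Return True if this process may run `rule`, False if a run is in flight."""
    now = time.time() if now is None else now
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        row = conn.execute(
            "SELECT pid, started_at FROM rule_runs WHERE rule = ?", (rule,)
        ).fetchone()
        # Free if nobody holds the rule, or if the holder started too long ago.
        acquired = row is None or now - row[1] > STALE_AFTER_SECONDS
        if acquired:
            if row is not None:
                kill(row[0])  # stale run: kill that pid if it's still around
            conn.execute(
                "INSERT OR REPLACE INTO rule_runs (rule, pid, started_at) "
                "VALUES (?, ?, ?)",
                (rule, pid, now),
            )
        conn.execute("COMMIT")
        return acquired
    except Exception:
        conn.execute("ROLLBACK")
        raise


def finish(conn, rule, pid):
    """Clear the in-flight marker, but only if this pid still owns it."""
    conn.execute("DELETE FROM rule_runs WHERE rule = ? AND pid = ?", (rule, pid))
```

A runner would call `try_start(conn, rule, os.getpid())` before executing a rule, skip the rule when it returns `False`, and call `finish` in a `finally` block afterwards. One caveat with the pid heuristic: after 12 hours the OS may have reused the pid for an unrelated process, so a real implementation would probably want to check that the pid still belongs to the runner before killing it.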