Open brightball opened 10 years ago
At first glance, this looks like an issue with using a trap context in a way that can deadlock (see this). In theory, flow workers are supposed to have a graceful shutdown mechanism, but I can see how it would be problematic in 2.0 if they have deadlock detection on traps. Can you confirm that you are getting this error while running Ruby >= 2.0?
It is running 2.1.0 actually.
Still working on this, sorry for the slow progress.
FWIW, the flow framework does rescue Exception everywhere, which blocks signals regardless of the deadlock issue you describe. You should consider replacing all of these with less aggressive rescues.
https://github.com/aws/aws-flow-ruby/blob/master/aws-flow/lib/aws/decider/task_poller.rb#L259 is a big culprit.
This is something we flagged pretty early on, but we unfortunately can't change the behavior without doing a major version bump. We are definitely planning to replace all the rescue Exception clauses with something less aggressive, like rescue StandardError. This may take a bit of time, as we will want to package the backwards-incompatible changes together. I'll put together a list of the changes we are planning that may be backwards-incompatible and add it to the repository.
IMO, rescue Exception is a bug and not a backwards-incompatible change. There should not be code depending on this behavior.
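For anyone following along, here is a minimal sketch of why the broad rescue swallows Ctrl+C while the narrower one does not. It assumes nothing about the framework's internals: run_task_with is a hypothetical stand-in for a poller loop body, and the Interrupt is raised by hand to simulate the SIGINT that Ruby raises in the main thread.

```ruby
# Hypothetical stand-in for one iteration of a polling loop.
# This is NOT the framework's code, only an illustration.
def run_task_with(rescued_class)
  yield
  :completed
rescue rescued_class
  # A poller that logs the error and keeps looping effectively does this.
  :swallowed
end

# Interrupt is the exception Ruby raises in the main thread on SIGINT.
# It descends from SignalException and Exception, not from StandardError.
broad = run_task_with(Exception) { raise Interrupt, "simulated Ctrl+C" }

narrow = begin
  run_task_with(StandardError) { raise Interrupt, "simulated Ctrl+C" }
rescue Interrupt
  # The interrupt escaped the rescue, so the process can shut down.
  :propagated
end

# Ordinary application errors are still caught by the narrower rescue.
ordinary = run_task_with(StandardError) { raise ArgumentError, "bad input" }

puts "rescue Exception with Interrupt:         #{broad}"
puts "rescue StandardError with Interrupt:     #{narrow}"
puts "rescue StandardError with ArgumentError: #{ordinary}"
```

With rescue StandardError the interrupt reaches the caller, so a supervisor such as Foreman can stop the process, while ordinary task errors are still handled.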
Any ETA on this being resolved?
Changing the title to better reflect the current issue.
1) The original issue was that Ruby 2.0 doesn't allow any locking (rightly so) in trap contexts. While we don't explicitly acquire any locks, we do initiate shutdown of workers when we receive signals. This causes Flow to write log messages, and Logger internally tries to lock a mutex, which fails in a trap context.
2) Workers do indeed shut down when they receive signals, just not in the most graceful way, i.e. they don't handle the long-polling behavior of SWF well. Refer to #31 for more context.
3) We need to rescue StandardError instead of Exception.
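One common way around point 1 is to keep the trap handler free of locking and do the logging afterwards in normal context. The following is only a sketch of that general pattern (a flag plus a self-pipe), not what Flow does today. The names shutdown_requested, reader, and writer are made up for the example, and the process signals itself purely so the snippet can run standalone.

```ruby
require "logger"
require "stringio"

# Log to an in-memory buffer so the example is self-contained.
log_output = StringIO.new
logger = Logger.new(log_output)

# Self-pipe: the trap handler writes a byte and the main flow waits on it.
reader, writer = IO.pipe
shutdown_requested = false

previous_handler = Signal.trap("INT") do
  # Trap context: no Logger and no Mutex here, only a flag and a pipe write.
  shutdown_requested = true
  writer.write_nonblock(".")
end

# Simulate Ctrl+C (or Foreman) by sending SIGINT to this process.
Process.kill("INT", Process.pid)

# Normal context: wait up to 5 seconds for the handler to wake us.
if IO.select([reader], nil, nil, 5)
  reader.read_nonblock(1)
  # Logging is safe here because we are no longer inside the trap handler.
  logger.info("Received INT, shutting down workers") if shutdown_requested
end

# Restore whatever handler was installed before the example ran.
Signal.trap("INT", previous_handler || "DEFAULT")

puts log_output.string
```

The handler only records that a signal arrived. Everything that might take a lock, such as logging or stopping workers, runs after the main flow notices the flag.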
@kinsersh can you tell me what issue you were referring to?
I was referring to number 3 in your list.
I'm trying to use the Flow framework with Foreman, which just sends a SIGINT to the running process.
I'd like to be able to do something like this:
Right now, whenever I press CTRL+C with Foreman it's sending a SIGINT which results in this error: