I have a weird situation and just want to know whether this is expected or whether I need to do something else to handle it.
I start 10 workers on server boot, which works well. Some of my jobs can and do get stuck, and I have checks in place to detect this.
When a job is stuck, I send the SIGUSR1 signal to the worker to tell it to stop processing the current job and take on a new one.
I have noticed that when I check for php-resque processes AFTER a few workers have gotten stuck and I have dealt with them as described above, I have a new PID for a new worker as well as the original.
As you can see below, every stuck job I unstick seems to create a new process. My jobs are a little sensitive, in that if too many run at once the whole system snowballs and falls over.
Are these just breadcrumbs, or am I better off manually killing the process by its PID instead of sending a "stop current job" signal and having another process created?
3232 resque-1.2: Waiting for *
3233 resque-1.2: Waiting for *
3234 resque-1.2: Waiting for *
3238 resque-1.2: Waiting for *
3241 resque-1.2: Forked 9661 at 2015-05-20 14:57:54
3245 resque-1.2: Waiting for *
3250 resque-1.2: Forked 12915 at 2015-05-20 14:59:14
3251 resque-1.2: Waiting for *
3254 resque-1.2: Waiting for *
3260 resque-1.2: Forked 12706 at 2015-05-20 14:59:13
9661 resque-1.2: Processing default since 2015-05-20 14:57:54
12706 resque-1.2: Processing default since 2015-05-20 14:59:13
12915 resque-1.2: Processing default since 2015-05-20 14:59:14
My expected result of sending SIGUSR1 is that it doesn't create any new worker processes and, as the signal's description suggests, only tells the worker to cease work on the current job and take another if one is available.
EDIT
I think maybe I am sending the SIGUSR1 to the child PID. I'm a little confused: I store the PID that the job is running under (and therefore, I assumed, the worker for that specific job) by calling posix_getpid() in the job's setUp().
I then store that PID against the job in my database, and if I detect the job is stuck, I send the SIGUSR1 signal to the saved PID.
I think I am sending the signal to the child PID. If so, what can I use in my setUp() to get the PID of the parent process, so that when the job gets stuck the signal acts as intended?
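A minimal sketch of sending that signal from PHP, assuming the posix and pcntl extensions are loaded. The function name sendStopCurrentJob is hypothetical and not part of php-resque; posix_kill() and the SIGUSR1 constant are the only real pieces.

```php
<?php
/**
 * Send SIGUSR1 to a stored PID.
 * Hypothetical helper for illustration, not part of php-resque.
 * Returns true only if the signal was sent successfully.
 */
function sendStopCurrentJob($pid)
{
    $pid = (int) $pid;

    // Refuse 0 and negative values (kill() treats those as process
    // groups rather than a single process) and PID 1 (init).
    if ($pid <= 1) {
        return false;
    }

    // posix_kill() comes from the posix extension and the SIGUSR1
    // constant from pcntl; bail out if either is missing.
    if (!function_exists('posix_kill') || !defined('SIGUSR1')) {
        return false;
    }

    // Signal 0 delivers nothing. It only checks that the process
    // exists and that we are allowed to signal it.
    if (!posix_kill($pid, 0)) {
        return false;
    }

    return posix_kill($pid, SIGUSR1);
}
```

Whichever PID is passed in is the process that receives the signal, so the result depends entirely on which PID was stored against the job.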
EDIT #2
A user on here helped me before with a slightly different issue and told me to use posix_getppid(). I assumed the "ppid" was a typo, but looking into it, it returns the parent process's PID, which means the signal will work.
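For anyone landing here, a minimal sketch of recording both PIDs in a job class. The class name StuckAwareJob and its two PID properties are made up for illustration, and saving the PID to the database is left as a comment. It assumes setUp() runs inside the forked child process, which is what the "Forked" and "Processing" lines in the process list suggest.

```php
<?php
// Hypothetical job class for illustration only.
class StuckAwareJob
{
    /** @var array Job arguments. */
    public $args = array();

    /** @var int|null PID of the forked child running this job. */
    public $childPid;

    /** @var int|null PID of the parent of that child (the worker). */
    public $workerPid;

    public function setUp()
    {
        // PID of the process this code is running in.
        $this->childPid = posix_getpid();

        // PID of that process's parent. This is the one to store
        // if the signal is meant for the worker.
        $this->workerPid = posix_getppid();

        // Assumption: persist $this->workerPid against the job
        // record here, using whatever database layer is in place.
    }

    public function perform()
    {
        // Job body goes here.
    }
}
```

Keeping both values around makes it easy to compare them against the process list when checking which process a stored PID refers to.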
Hope this helps someone else
Thanks heaps!