cmbi / mrs

Maarten's Retrieval Service
Boost Software License 1.0

SIGHUP being received regularly causing service restart #19

Open jonblack opened 10 years ago

jonblack commented 10 years ago

I noticed the following whilst looking through the mrs error logs:

[16/Jun/2014:21:05:03 UTC] listening at 0.0.0.0:18090
[17/Jun/2014:06:51:01 UTC] RunMainLoop recieved signal: SIGHUP
[17/Jun/2014:06:51:01 UTC] Restarting services... done
[17/Jun/2014:06:51:03 UTC] listening at 0.0.0.0:18090
[17/Jun/2014:21:00:39 UTC] RunMainLoop recieved signal: SIGHUP
[17/Jun/2014:21:00:40 UTC] Restarting services... done
[17/Jun/2014:21:00:51 UTC] listening at 0.0.0.0:18090
[18/Jun/2014:21:59:22 UTC] RunMainLoop recieved signal: SIGHUP
[18/Jun/2014:21:59:23 UTC] Restarting services... done
[18/Jun/2014:21:59:32 UTC] listening at 0.0.0.0:18090
[19/Jun/2014:21:16:15 UTC] RunMainLoop recieved signal: SIGHUP
[19/Jun/2014:21:16:16 UTC] Restarting services... done
[19/Jun/2014:21:16:24 UTC] listening at 0.0.0.0:18090

This doesn't seem like normal behaviour to me. The service should run continuously. We should investigate why this is happening.

cbaakman commented 10 years ago

error.log:

[24/Jun/2014:21:02:30 UTC] RunMainLoop recieved signal: SIGHUP
[24/Jun/2014:21:02:30 UTC] Restarting services... done
[24/Jun/2014:21:02:40 UTC] listening at 0.0.0.0:18090
[25/Jun/2014:06:27:49 UTC] RunMainLoop recieved signal: SIGHUP
[25/Jun/2014:06:27:55 UTC] Restarting services...

access.log:

131.174.161.11 - - [18/Jun/2014:21:57:38 UTC] "POST /mrsws/search HTTP/1.0" 200 18281 "-" "mkhssp" "GetLinkedEx"
131.174.161.11 - - [18/Jun/2014:21:58:02 UTC] "POST /mrsws/search HTTP/1.0" 200 106101 "-" "mkhssp" "GetLinkedEx"
131.174.161.11 - - [18/Jun/2014:21:58:31 UTC] "POST /mrsws/search HTTP/1.0" 200 25960 "-" "mkhssp" "GetLinkedEx"
131.174.161.11 - - [18/Jun/2014:21:58:59 UTC] "POST /mrsws/search HTTP/1.0" 200 8545 "-" "mkhssp" "GetLinkedEx"

...
77.175.108.62 - - [19/Jun/2014:20:59:35 UTC] "GET /entry?db=embl&nr=156220001&q=duchenne%20so%3ahuman HTTP/1.0" 200 7433 "http://mrs.cmbi.ru.nl/m6/search?db=all&q=duchenne+so%3Ahuman&count=3" "Mozilla/5.0 (Windows NT 6.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.153 Safari/537.36" -
77.175.108.62 - - [19/Jun/2014:21:24:42 UTC] "GET /search?db=all&q=duchenne+dystrophin&count=3 HTTP/1.0" 200 29270 "http://mrs.cmbi.ru.nl/m6/search?db=all&q=duchenne+so%3Ahuman&count=3" "Mozilla/5.0 (Windows NT 6.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.153 Safari/537.36" -
77.175.108.62 - - [19/Jun/2014:21:24:57 UTC] "GET /entry?db=genbank&nr=160701110&q=duchenne%20dystrophin HTTP/1.0" 200 7716 "http://mrs.cmbi.ru.nl/m6/search?db=all&q=duchenne+dystrophin&count=3" "Mozilla/5.0 (Windows NT 6.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.153 Safari/537.36" 
...
131.174.161.11 - - [24/Jun/2014:20:59:45 UTC] "POST /mrsws/search HTTP/1.0" 200 8281 "-" "mkhssp" "GetLinkedEx"
131.174.161.11 - - [24/Jun/2014:21:00:15 UTC] "POST /mrsws/search HTTP/1.0" 200 106042 "-" "mkhssp" "GetLinkedEx"
131.174.161.11 - - [24/Jun/2014:21:01:41 UTC] "POST /mrsws/search HTTP/1.0" 200 106473 "-" "mkhssp" "GetLinkedEx"
131.174.161.11 - - [24/Jun/2014:21:03:32 UTC] "POST /mrsws/search HTTP/1.0" 200 105940 "-" "mkhssp" "GetLinkedEx"
...
131.174.161.11 - - [25/Jun/2014:06:26:25 UTC] "POST /mrsws/search HTTP/1.0" 200 13745 "-" "mkhssp" "GetLinkedEx"
131.174.161.11 - - [25/Jun/2014:06:27:02 UTC] "POST /mrsws/search HTTP/1.0" 200 17207 "-" "mkhssp" "GetLinkedEx"
131.174.161.11 - - [25/Jun/2014:06:27:05 UTC] "POST /mrsws/search HTTP/1.0" 200 105846 "-" "mkhssp" "GetLinkedEx"
131.174.161.11 - - [25/Jun/2014:06:27:28 UTC] "POST /mrsws/search HTTP/1.0" 200 37259 "-" "mkhssp" "GetLinkedEx"
131.174.161.11 - - [25/Jun/2014:06:27:31 UTC] "POST /mrsws/search HTTP/1.0" 200 107768 "-" "mkhssp" "GetLinkedEx"
131.174.161.11 - - [25/Jun/2014:06:27:38 UTC] "POST /mrsws/search HTTP/1.0" 200 33005 "-" "mkhssp" "GetLinkedEx"
131.174.244.73 - - [25/Jun/2014:09:08:12 UTC] "GET / HTTP/1.0" 200 6870 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ch

There doesn't seem to be any connection between the SIGHUP events and the HTTP requests. SIGHUP is sometimes received during a web service request from mkhssp, but that application runs almost continuously on cmbi4 and has never caused trouble.
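
One way to check this more systematically is to line up each SIGHUP timestamp from error.log against the requests logged around the same moment in access.log. The following is a minimal sketch, assuming both logs use the `[dd/Mon/yyyy:hh:mm:ss UTC]` timestamp format shown above and that month abbreviations are parsed under an English locale. The function names are hypothetical and not part of MRS.

```python
import re
from datetime import datetime, timedelta

# Timestamp format used in both logs, e.g. [24/Jun/2014:21:02:30 UTC]
STAMP = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2}) UTC\]")


def parse_stamp(line):
    """Return the datetime found in a log line, or None if there is none."""
    match = STAMP.search(line)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S")


def sighup_times(error_lines):
    """Timestamps of the error-log lines that report a SIGHUP."""
    times = []
    for line in error_lines:
        stamp = parse_stamp(line)
        if "SIGHUP" in line and stamp is not None:
            times.append(stamp)
    return times


def requests_near(access_lines, when, window=timedelta(seconds=60)):
    """Access-log lines logged within `window` before or after `when`."""
    nearby = []
    for line in access_lines:
        stamp = parse_stamp(line)
        if stamp is not None and abs(stamp - when) <= window:
            nearby.append(line)
    return nearby
```

Calling `requests_near(access_lines, t)` for every `t` in `sighup_times(error_lines)` shows which clients, if any, were active around each restart. Listing the hour of each SIGHUP in the same way would show whether the signals cluster at fixed times of day.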

jonblack commented 10 years ago

It could be coming from another process on the same machine, or even a person sending the signal manually so that another process gets more processor time. In any case, I don't think this is normal behaviour.

cbaakman commented 10 years ago

A large databank update was running, which might have pushed the machine to its limits.

I don't think other people are active on the MRS machine, and if they were, wouldn't they need root permissions to send m6 a SIGHUP signal?
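
On the permissions question: on Linux a signal can be sent by root or by any process running under the same user ID as the target, and the kernel records the sender's PID and UID with the signal. The sender can therefore be identified rather than guessed. Below is a minimal sketch of the mechanism in Python, using `signal.sigwaitinfo` (Unix only, not available on every platform). m6 itself is not written in Python, so this only illustrates the idea; `sender_of_next_sighup` is a hypothetical name, and the script sends the signal to itself purely as a demonstration.

```python
import os
import signal


def sender_of_next_sighup():
    """Wait for a SIGHUP and return (sender PID, sender UID)."""
    info = signal.sigwaitinfo({signal.SIGHUP})
    return info.si_pid, info.si_uid


# SIGHUP must be blocked first, so that it stays pending for sigwaitinfo
# instead of triggering the default action (terminating the process).
signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGHUP})

# Demonstration only: send ourselves a SIGHUP, then read back who sent it.
os.kill(os.getpid(), signal.SIGHUP)
pid, uid = sender_of_next_sighup()
print(f"SIGHUP sent by PID {pid} running as UID {uid}")
```

Logging the sender's PID next to the existing "recieved signal" message would show whether the signal comes from a scheduled job, the update script, or an interactive user.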

cbaakman commented 9 years ago

It happened again: after an update, MRS tried to restart and became unresponsive.

This time, I found an error message in the error log:

[10/Dec/2014:22:53:30 UTC] RunMainLoop recieved signal: SIGHUP
*** glibc detected *** m6: corrupted double-linked list: 0x0000000006965490 ***