SeattleTestbed / monitor_script

Monitoring scripts for SeattleTestbed processes and services
MIT License

geoipserver monitoring not functional #11

Open aaaaalbert opened 9 years ago

aaaaalbert commented 9 years ago

geoipserver.poly.edu hung for a few days, and the monitoring scripts did not report it, even though the monitoring script processes on blackbox.poly.edu were running. From the script's log:

Fri Dec  5 12:46:38 2014 : notifying jcappos@poly.edu
Fri Dec  5 12:46:38 2014 : notifying albert.rafetseder@univie.ac.at
Fri Dec  5 12:46:38 2014 : notifying monzum@u.washington.edu
Fri Dec  5 12:46:38 2014 : notifying leon.wlaw@gmail.com
Fri Dec  5 12:46:38 2014 : notifying hermanchchen@gmail.com
Fri Dec  5 12:46:38 2014 : notifying ak4282@students.poly.edu
a mail was sent

However, no mail was actually sent.

asm582 commented 9 years ago

The monitoring process that was set up for email notifications only checked the seattle and seattleclearinghouse services; see:

https://github.com/asm582/monitor_script/blob/master/monitor_processes.py

Maybe I am mistaken, but is there a separate script for geoipserver?

aaaaalbert commented 9 years ago

(Note: I found this when discovering SeattleTestbed/geoip_server#6).

aaaaalbert commented 9 years ago

The log snippet is from geoip_response.py, if that's what you are referring to.

asm582 commented 9 years ago

I found that the username and password for the mail account that relays the notifications had not been set properly, which causes the mail program to skip sending. Somehow, no exception is raised in this case either. For now, I have made the program run by hardcoding the values.
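The failure mode described above (missing relay credentials silently skip the send, yet the log still says "a mail was sent") can be avoided by failing loudly. The following is a minimal Python sketch, not the project's actual code: `send_notification` and `NotificationConfigError` are hypothetical names, and the use of STARTTLS plus login on the relay is an assumption about how the mail account is configured.

```python
import smtplib
from email.message import EmailMessage


class NotificationConfigError(Exception):
    """Hypothetical error raised when relay credentials are missing."""


def send_notification(smtp_host, smtp_port, username, password,
                      sender, recipients, subject, body,
                      smtp_factory=smtplib.SMTP):
    # Missing credentials are a configuration error: raise instead of
    # quietly skipping the send.
    if not username or not password:
        raise NotificationConfigError(
            "mail relay username/password not configured")

    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = subject
    msg.set_content(body)

    # Assumption: the relay requires STARTTLS and authentication.
    with smtp_factory(smtp_host, smtp_port, timeout=30) as server:
        server.starttls()
        server.login(username, password)
        # send_message returns a dict of refused recipients;
        # an empty dict means the relay accepted every recipient.
        refused = server.send_message(msg)
    return refused
```

With this shape, the caller would only log "a mail was sent" after `send_notification` returns without raising, so the log can no longer claim a delivery that never happened.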

asm582 commented 9 years ago

Digging further with the IP addresses '128.238.63.15' and '128.238.63.50', I found that the script hangs waiting for a return value from the geoip_record_by_addr function, which at the lowest level calls into librepysocket.r2py. I suspect that call never times out, so no exception is raised and no email notification is triggered. When I tested the code with no IP address, it did alert the user by mail about the monitoring status. We currently monitor "128.238.66.181", but nslookup reports that its domain name does not exist, so I changed the IP address to the ones above for testing.
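One way to keep a hung lookup from also hanging the monitor is to bound the call with a timeout and treat a timeout as a failure that triggers the alert. The sketch below is plain Python rather than Repy, so it may not map directly onto geoip_response.py; `call_with_timeout` and `check_geoip` are hypothetical helpers, and `lookup` stands in for geoip_record_by_addr.

```python
import concurrent.futures
import time

# Shared worker pool. A hung lookup keeps occupying one worker thread;
# Python threads cannot be killed, only abandoned.
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def call_with_timeout(func, timeout, *args, **kwargs):
    """Run func(*args, **kwargs) in a worker; raise TimeoutError after `timeout` seconds."""
    future = _POOL.submit(func, *args, **kwargs)
    try:
        return future.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        raise TimeoutError(f"no result within {timeout} seconds") from None


def check_geoip(lookup, address, timeout=10.0):
    """Return (ok, message). `lookup` stands in for geoip_record_by_addr."""
    start = time.monotonic()
    try:
        record = call_with_timeout(lookup, timeout, address)
    except Exception as exc:  # a timeout and a lookup error are both failures
        elapsed = time.monotonic() - start
        return False, f"geoip lookup for {address} failed after {elapsed:.1f}s: {exc}"
    if not record:
        return False, f"geoip lookup for {address} returned no record"
    return True, "ok"
```

The monitor would then send its notification whenever `ok` is False. A thread-based timeout only abandons the stuck call. Setting a timeout on the underlying socket operations would release the resources as well, if the lower layers support it.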

us341 commented 9 years ago

I am curious whether you are referring to the recv method of librepysocket. If so, we can discuss it in more detail.

(Replying by email to asm582's comment of Wed, Dec 10, 2014, 3:51 PM.)

asm582 commented 9 years ago

I used openconnection to poll the servers 128.238.63.50 and 128.238.63.15. Server 128.238.63.50 intermittently fails to respond.

The fixed geoip_response.py file is here:

https://github.com/asm582/seattleissues/blob/master/geoip_response.py

and the polling logic is implemented here:

https://github.com/asm582/seattleissues/blob/master/poll_geoserver.r2py

Please let me know if the above fix works.
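The polling approach above, where a connection is opened to each server and success is recorded, can be sketched in plain Python with `socket.create_connection`, which plays roughly the role that openconnection plays in Repy. This is an illustration only and not the code in poll_geoserver.r2py. The helper names are hypothetical, and the port the geoip servers listen on is not stated in this thread, so the caller must supply it. Polling several times makes an intermittently failing server, such as 128.238.63.50, show up as partial availability rather than as a single pass or fail.

```python
import socket
import time


def poll_server(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) opens within `timeout` seconds."""
    try:
        # create_connection applies the timeout to the connect attempt itself.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, unreachable hosts and timeouts
        # (socket.timeout is a subclass of OSError) all count as "down".
        return False


def availability(host, port, attempts=5, timeout=5.0, delay=1.0):
    """Poll `attempts` times and return the fraction of successful connections."""
    successes = 0
    for i in range(attempts):
        if poll_server(host, port, timeout):
            successes += 1
        if i < attempts - 1:
            time.sleep(delay)  # space out the polls
    return successes / attempts
```

A monitor could alert when `availability(...)` drops below a chosen threshold. Any such threshold would be a policy choice, not something established in this thread.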