ganglia / monitor-core

Ganglia Monitoring core
BSD 3-Clause "New" or "Revised" License

gmetad poll() timeout #46

Open apancom opened 12 years ago

apancom commented 12 years ago

We set up a cluster in Amazon EC2.

Sometimes gmetad fails to receive data from the gmond master for 15 minutes. The error log is:

Aug 15 08:30:08 /usr/sbin/gmetad[3253]: poll() timeout from source 0 for [RS EU] data source after 2531 bytes read
Aug 15 08:30:22 /usr/sbin/gmetad[3253]: poll() timeout from source 0 for [RS EU] data source after 0 bytes read
Aug 15 08:30:37 /usr/sbin/gmetad[3253]: poll() timeout from source 0 for [RS EU] data source after 0 bytes read
Aug 15 08:30:51 /usr/sbin/gmetad[3253]: poll() timeout from source 0 for [RS EU] data source after 0 bytes read
...

Please let me know if there is more info I need to provide. Thanks a lot.

vvuksan commented 12 years ago

This would indicate that you have connectivity issues to the gmond hosting RS EU. The only way to prove it is to set up a process that connects to the RS EU master every 15 seconds to download the XML, and check whether it hits problems at the same times gmetad does.

I have seen EC2 network flake out on occasion.
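A probe like the one described above could look roughly like the following. This is only a sketch, not part of Ganglia: the function names `fetch_gmond_xml` and `probe` are made up for illustration, port 8649 is assumed to be the gmond TCP port in use (adjust it to match your `tcp_accept_channel`), and the 10 second socket timeout is an arbitrary choice. It connects, reads the XML dump until gmond closes the connection, and logs the byte count and elapsed time so the timestamps can be compared against the gmetad log.

```python
import socket
import sys
import time


def fetch_gmond_xml(host, port=8649, timeout=10.0):
    """Connect to gmond and read its XML dump until EOF.

    Returns (payload_bytes, elapsed_seconds). The timeout applies to the
    connect and to each individual recv(), not to the whole download.
    Port 8649 is an assumption; use whatever your gmond listens on.
    """
    start = time.monotonic()
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            chunk = sock.recv(65536)
            if not chunk:  # gmond closes the socket after sending the XML
                break
            chunks.append(chunk)
    return b"".join(chunks), time.monotonic() - start


def probe(host, port=8649, interval=15.0, timeout=10.0, rounds=None):
    """Fetch the XML every `interval` seconds and print one line per attempt.

    rounds=None means run until interrupted.
    """
    done = 0
    while rounds is None or done < rounds:
        stamp = time.strftime("%b %d %H:%M:%S")
        try:
            data, elapsed = fetch_gmond_xml(host, port, timeout)
            print(f"{stamp} ok: {len(data)} bytes in {elapsed:.2f}s")
        except OSError as exc:  # covers timeouts, refused and reset connections
            print(f"{stamp} FAILED: {exc!r}")
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python probe.py <gmond-host>
    probe(sys.argv[1])
```

Running it from the gmetad host against the RS EU master is the useful case, since that exercises the same network path. If the probe logs failures at the same times as the poll() timeouts, the network is the likely cause.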

apancom commented 12 years ago

Thanks vvuksan. Would it be better to set 'timeout' on the tcp accept channel? I saw the default setting is 'no timeout'.

cburroughs commented 12 years ago

Would the issue of a timeout (or not having one) be related to #47?