connected-web / product-monitor

A HTML/JavaScript template for monitoring a product by encouraging product developers to gather all the information about the status of a product, including live monitoring, statistics, endpoints, and test results into one place.

statusOf Timeout is too aggressive #26

Closed subsidel closed 9 years ago

subsidel commented 9 years ago

It often reports sites as down when the network connection isn't consistently fast and reliable.

johnbeech commented 9 years ago

What do you suggest as the fix? I reduced it to 5 seconds to prevent the server from getting clogged up with bad DNS host requests. I could monitor the past 5 responses and base the reported state on that?
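A minimal sketch of the "past 5 responses" idea, assuming each check can be reduced to a pass/fail boolean. `ResponseHistory` and `reportedState` are hypothetical names, not part of product-monitor, and the majority-vote rule is just one possible choice:

```javascript
// Hypothetical helper: keeps the last N check results for one URL and
// derives a reported state from them instead of trusting a single check.
class ResponseHistory {
  constructor(size = 5) {
    this.size = size;  // how many recent results to remember
    this.results = []; // true = responded in time, false = failed/timed out
  }

  record(ok) {
    this.results.push(Boolean(ok));
    // Drop the oldest result once the window is full.
    if (this.results.length > this.size) {
      this.results.shift();
    }
  }

  // Report "down" only when more than half of the recent checks failed.
  reportedState() {
    if (this.results.length === 0) return "unknown";
    const failures = this.results.filter((ok) => !ok).length;
    return failures > this.results.length / 2 ? "down" : "up";
  }
}

const history = new ResponseHistory(5);
[true, true, false, true, true].forEach((ok) => history.record(ok));
console.log(history.reportedState()); // "up": one blip does not flip the state
```

With a window of five, a single network blip leaves the site reported as up; it takes three failures out of the last five checks to report it as down.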

subsidel commented 9 years ago

It's currently at 1/4 of a second (250ms). Are you saying it should be at 5 seconds (5000ms)?

One possible solution could be to add a server config variable that applies to all network requests made by the server. Alternatively, as you suggest, monitor the past few responses and increase or decrease the timeout based on those and on how the network seems to be behaving in general.
Assumption: if everything is timing out, the local network is more likely to be the issue.
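A rough sketch combining both suggestions, assuming a hypothetical `serverConfig.requestTimeoutMs` setting shared by every outbound request. The doubling/shrinking rule, the 250ms to 5000ms bounds, and the `AdaptiveTimeout` class are all illustrative assumptions, not existing product-monitor code:

```javascript
// Hypothetical server config: one base timeout shared by every outbound
// request the server makes, as suggested in the thread.
const serverConfig = {
  requestTimeoutMs: 250, // starting point (the value discussed above)
  minTimeoutMs: 250,
  maxTimeoutMs: 5000,
};

// Adjusts the timeout from recent outcomes across *all* monitored hosts.
class AdaptiveTimeout {
  constructor(config, windowSize = 10) {
    this.config = config;
    this.timeoutMs = config.requestTimeoutMs;
    this.windowSize = windowSize;
    this.recent = []; // true = timed out, false = answered in time
  }

  record(timedOut) {
    this.recent.push(timedOut);
    if (this.recent.length > this.windowSize) this.recent.shift();

    // Back off (double) after a timeout, recover slowly (shrink by 10%)
    // after a success, always staying within the configured bounds.
    const next = timedOut ? this.timeoutMs * 2 : this.timeoutMs * 0.9;
    this.timeoutMs = Math.round(
      Math.min(this.config.maxTimeoutMs, Math.max(this.config.minTimeoutMs, next))
    );
  }

  // If every recent request timed out, regardless of host, the local
  // network is a more likely culprit than the remote sites.
  localNetworkSuspect() {
    return this.recent.length === this.windowSize && this.recent.every(Boolean);
  }
}
```

Doubling on failure and shrinking slowly on success is a common back-off shape, but the factors here are arbitrary and would need tuning against real traffic.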

johnbeech commented 9 years ago

I think I should return a "certainty" measure; in general, more data would help grade between "Definitely Good / OK", "Definitely Bad / Not OK", and "Not Sure / Not OK".
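One way the three grades could be computed from a pass/fail history, as a sketch only: `gradeStatus`, the `minSamples` threshold, and the certainty formula are all assumptions rather than anything the project defines:

```javascript
// Hypothetical grading: turn recent check results (true = OK) into one of
// the three proposed grades, plus a rough certainty figure from 0 to 1.
function gradeStatus(results, minSamples = 3) {
  const total = results.length;
  const okCount = results.filter(Boolean).length;

  // Too little data: we cannot be sure either way.
  if (total < minSamples) {
    return { grade: "Not Sure / Not OK", certainty: 0 };
  }
  if (okCount === total) {
    return { grade: "Definitely Good / OK", certainty: 1 };
  }
  if (okCount === 0) {
    return { grade: "Definitely Bad / Not OK", certainty: 1 };
  }
  // Mixed results: certainty reflects how lopsided the evidence is.
  const certainty = Math.abs(okCount - (total - okCount)) / total;
  return { grade: "Not Sure / Not OK", certainty };
}
```

For example, three successes and one failure, `gradeStatus([true, true, false, true])`, give "Not Sure / Not OK" with a certainty of 0.5.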

johnbeech commented 9 years ago

Sorry, I didn't check; it is 250ms as you said. I think I tried 5000ms as the first attempt, but it still failed. The issue is responding quickly to the client, rather than necessarily the number of connections held open. I don't know how many outbound connections Express can hold open before it begins to block new incoming requests, but the client browser visibly locks up, which is undesirable.
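Since the concern is how quickly the browser gets an answer, one option is to cap how long the server waits for each check and fall back to an "unknown" status. This sketch uses `Promise.race`; `withTimeout` and `fakeCheck` are hypothetical stand-ins, not the actual statusOf code:

```javascript
// Hypothetical wrapper: resolve within `ms` no matter how slow the remote
// host is, so the statusOf response to the browser is never held up.
function withTimeout(promise, ms, fallback) {
  let timer;
  const timeout = new Promise((resolve) => {
    timer = setTimeout(() => resolve(fallback), ms);
  });
  // Whichever settles first wins; always clear the timer afterwards.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Stand-in for a real outbound check (assumption: resolves "up" after delayMs).
function fakeCheck(delayMs) {
  return new Promise((resolve) => setTimeout(() => resolve("up"), delayMs));
}

withTimeout(fakeCheck(2000), 900, "unknown").then((status) => {
  console.log(status); // "unknown": the client is answered after ~900ms
});
```

This only bounds the response time; the underlying request keeps running until it finishes, so a real check would also need to cancel its request (for example via the HTTP client's own timeout or abort support).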

subsidel commented 9 years ago

Perhaps a better fix is to have fewer same-domain requests all happening at once?
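A sketch of limiting simultaneous same-domain requests with a small per-host queue. `HostLimiter` and `maxPerHost` are hypothetical names; Node's `http.Agent` `maxSockets` option offers a similar per-host cap at the socket level:

```javascript
// Hypothetical limiter: at most `maxPerHost` checks run at once against the
// same hostname; extra checks wait in a per-host queue.
class HostLimiter {
  constructor(maxPerHost = 2) {
    this.maxPerHost = maxPerHost;
    this.active = new Map(); // hostname -> number of running checks
    this.queues = new Map(); // hostname -> waiting start functions
  }

  run(url, task) {
    const host = new URL(url).hostname;
    return new Promise((resolve, reject) => {
      const start = () => {
        this.active.set(host, (this.active.get(host) || 0) + 1);
        Promise.resolve()
          .then(task)
          .then(resolve, reject)
          .finally(() => this.release(host)); // free the slot either way
      };
      if ((this.active.get(host) || 0) < this.maxPerHost) {
        start();
      } else {
        if (!this.queues.has(host)) this.queues.set(host, []);
        this.queues.get(host).push(start);
      }
    });
  }

  // Called when a check finishes: decrement and start the next queued one.
  release(host) {
    this.active.set(host, this.active.get(host) - 1);
    const queue = this.queues.get(host);
    if (queue && queue.length > 0) queue.shift()();
  }
}
```

Checks against different hosts are not throttled by each other, so a single slow domain only delays its own queue.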

Markavian commented 9 years ago

Increased the timeout from 250ms to 900ms. Please let me know whether that improves or worsens things, taking into account the new icons, colours, and messages.

Markavian commented 9 years ago

Closing this now, but I understand improvements can be made to the statusOf endpoint to make it more resilient to network blips.