The server can hang for different reasons.
Once, we noticed it only 3 hours after the incident (Redis failed).
Today we noticed that the server was down only when I couldn't get any data while writing a blog post.
So we have no way of knowing whether the server is running OK or is DOWN until somebody tries to get some data and fails.
Acceptance criteria
[x] there is a monitoring system that checks whether the server and all related daemons are running well
Specstore
Rawstore
web-server
... what else?
[x] monitoring system notifies administrator if something crashes
[ ] monitoring system tries to restore failed daemons, docker instances, or whatever else has failed
Partially FIXED. We have the datahub-health repo, which monitors the services and reports the results every 24 hours, though it does not try to recover anything: https://travis-ci.org/datahq/datahub-health
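For illustration, the first two criteria (check every service, notify when one fails) could look roughly like the sketch below. This is not the code in datahub-health. The endpoint URLs are placeholders, and the names `SERVICES`, `http_probe`, `check_services` and `report` are made up for this example.

```python
import urllib.request
import urllib.error

# Hypothetical endpoints: the real service URLs are not given in this issue.
SERVICES = {
    "specstore": "https://example.invalid/specstore/health",
    "rawstore": "https://example.invalid/rawstore/health",
    "web-server": "https://example.invalid/",
}


def http_probe(url, timeout=10):
    """Return True if the URL answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError, ValueError):
        # Covers HTTP errors (4xx/5xx), DNS failures, refused connections, timeouts.
        return False


def check_services(services, probe=http_probe):
    """Probe every service and return the sorted names of those that failed."""
    return sorted(name for name, url in services.items() if not probe(url))


def report(failed):
    """Build the one-line message that would be sent to the administrator."""
    if failed:
        return "DOWN: " + ", ".join(failed)
    return "all services OK"
```

Running `report(check_services(SERVICES))` from a scheduled job more often than every 24 hours would shorten the time before a failure is noticed. The `probe` argument is injectable so the check can be tested without network access, and a recovery step could later be hooked in for each name in the failed list.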
Tasks
Tests:
data push
HUGE data in several threads
Analysis