AcckiyGerman closed this issue 6 years ago
We got rid of the Redis service entirely and integrated the data factory into the flowmanager.
@zelima do we have some kind of monitoring for the other parts of the system? How would we know that, say, we're out of memory?
@AcckiyGerman we used to have Datadog, but it was terminated when the trial period ran out.
@AcckiyGerman tip: tick the appropriate checkboxes in the issue description when closing, or add a comment if they are not ticked.
@zelima but it was you who closed the issue ;) And, by the way, the acceptance criteria are not met.
@AcckiyGerman correct, that tip was for myself :) I've updated the description accordingly.
@zelima I'd like to reopen this issue (and rename it to "Monitor and Restore datahub.io server Modules (web, pipeline, specstore, etc)"),
because as far as I can see, we still don't know whether the server is running OK or not until somebody tries to load a page and it fails.
FIXED: This issue was about Redis problems. The issue about general server problems is open here: https://github.com/datahq/pm/issues/122
After pushing huge amounts of data, or pushing several datasets simultaneously, the Redis server could hang, making all data processing impossible. Last time, we only noticed it 3 hours after the incident (see https://github.com/datahq/datahub-qa/issues/79).
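One way to notice a hung Redis sooner than three hours after the fact is an external liveness probe run on a schedule. Below is a minimal sketch in Python, assuming the server listens on the default port 6379 without authentication; `redis_responds` is a hypothetical helper, not part of any existing datahub.io service. It sends a raw `PING` and treats a missing or non-`PONG` reply within the timeout as a failure, so a server that still accepts TCP connections but has stopped answering is flagged too.

```python
import socket


def redis_responds(host: str = "localhost", port: int = 6379, timeout: float = 2.0) -> bool:
    """Return True if a Redis server answers PING with PONG within `timeout` seconds.

    Hypothetical helper: a server that accepts the connection but never replies
    is reported as unhealthy, because the socket read times out.
    """
    try:
        # create_connection applies `timeout` to connect() and to later recv() calls.
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # RESP-encoded PING command: an array containing one bulk string.
            sock.sendall(b"*1\r\n$4\r\nPING\r\n")
            reply = sock.recv(64)
    except OSError:  # refused, reset, or timed out (socket.timeout is an OSError)
        return False
    # A healthy server answers with the simple string "+PONG\r\n".
    return reply.startswith(b"+PONG")
```

Called every minute from cron or a small loop, a `False` result could trigger an alert to the administrator. If the server has a password configured, it will likely answer `-NOAUTH ...` and this probe reports failure, so an `AUTH` step would have to be added first.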
Acceptance criteria
- there is a monitoring system that checks whether the server and all related daemons are running well
- the monitoring system notifies the administrator if something crashes
- (optional) the monitoring system restores failed daemons, Docker instances, or whatever else has failed

Tasks
- choose the proper monitoring system
- implement monitoring

Tests:
- data push
- HUGE data in several threads

Analysis
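The acceptance criteria describe a check, notify, and (optionally) restore loop. Below is a minimal sketch of one pass of that loop in Python, assuming each service can be probed by a callable that returns `True` when healthy; `run_checks`, the service names, and the `notify`/`restart` hooks are hypothetical and not part of the datahub.io codebase.

```python
from typing import Callable, Dict, Optional

Check = Callable[[], bool]


def run_checks(
    checks: Dict[str, Check],
    notify: Callable[[str], None],
    restart: Optional[Callable[[str], None]] = None,
) -> Dict[str, bool]:
    """Run each health check once; notify on failure and optionally try a restart.

    Hypothetical helper. Returns a mapping of service name -> healthy after this pass.
    """

    def safe(check: Check) -> bool:
        # A check that raises counts as a failed service rather than crashing the monitor.
        try:
            return bool(check())
        except Exception:
            return False

    status: Dict[str, bool] = {}
    for name, check in checks.items():
        healthy = safe(check)
        if not healthy:
            notify(f"{name} is down")
            if restart is not None:
                # Optional criterion: try to restore the service, then re-check it.
                restart(name)
                healthy = safe(check)
                notify(f"{name} {'recovered' if healthy else 'still down'} after restart")
        status[name] = healthy
    return status
```

In practice the checks could be HTTP requests to the web frontend or a Redis-style `PING`, `notify` could send an e-mail or chat message, and `restart` could shell out to `docker restart <container>`; running the pass every minute from cron or a systemd timer keeps the monitor itself stateless.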