datahubio / datahub-v2-pm

Project management (issues only)

Monitor and Restore server after Redis fails #117

Closed AcckiyGerman closed 6 years ago

AcckiyGerman commented 6 years ago

After pushing huge amounts of data, or pushing several datasets simultaneously, the Redis server can hang, which makes all data processing impossible. Last time we only noticed 3 hours after the incident. (See https://github.com/datahq/datahub-qa/issues/79)
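To make the "noticed 3 hours later" problem concrete, here is a minimal sketch of a liveness probe that could run on a schedule. It assumes Redis is reachable over plain TCP without authentication; the function name `redis_is_responsive`, the default host/port, and the timeout are illustrative, not part of the datahub codebase. It sends the inline `PING` command and expects the standard `+PONG` reply, so a hung server shows up as a timeout rather than as silence.

```python
import socket


def redis_is_responsive(host="localhost", port=6379, timeout=2.0):
    """Return True if the server answers PING with +PONG within `timeout` seconds.

    Assumption: no AUTH is required. With a password set, Redis replies with
    an error instead of +PONG and this probe reports False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.settimeout(timeout)
            conn.sendall(b"PING\r\n")  # inline command form of the Redis protocol
            reply = conn.recv(64)
    except OSError:
        # Refused connection, reset, or timeout (a hung server never replies).
        return False
    return reply.startswith(b"+PONG")


# Example use from a cron job or a small loop (hypothetical alerting step):
#     if not redis_is_responsive():
#         print("Redis is not answering PING; alert or restart here")
```

A probe like this only detects the hang. What to do next (alert, restart the service) is a separate decision and is left out on purpose.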

Acceptance criteria

Tasks

Tests:

Analysis

zelima commented 6 years ago

We got rid of the Redis service entirely and integrated the data factory into the flowmanager

AcckiyGerman commented 6 years ago

@zelima do we have any kind of monitoring for the other parts of the system? How do we know that, say, we're out of memory?
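For the out-of-memory question, a rough sketch of a check that needs no external service, assuming the server runs Linux and exposes `/proc/meminfo` with a `MemAvailable` field (present on reasonably recent kernels). The function names and the 10% threshold are illustrative assumptions, not anything we run today.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field: value_in_kB} dict."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def memory_is_low(text, min_available_fraction=0.10):
    """Return True if MemAvailable / MemTotal is below the given fraction."""
    info = parse_meminfo(text)
    return info["MemAvailable"] / info["MemTotal"] < min_available_fraction


# Example use on a Linux host:
#     with open("/proc/meminfo") as f:
#         if memory_is_low(f.read()):
#             print("Less than 10% of memory is available")
```

Parsing is kept separate from reading the file so the threshold logic can be tested with sample text.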

zelima commented 6 years ago

@AcckiyGerman we used to have Datadog, but it was terminated when the trial period ended

zelima commented 6 years ago

@AcckiyGerman tip: tick the appropriate checkboxes in the issue description when closing, or add a comment if they are not ticked

AcckiyGerman commented 6 years ago

@zelima but it was you who closed the issue ;) And, by the way, the acceptance criteria have not been met.

zelima commented 6 years ago

@AcckiyGerman correct - tip for myself :) Updated the description accordingly

AcckiyGerman commented 6 years ago

@zelima I'd like to reopen this issue (and rename it to "Monitor and Restore datahub.io server Modules (web, pipeline, specstore, etc)").

Because, as I see it, we still don't know whether the server is running OK or not until somebody tries to load a page and it fails.
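Roughly what I have in mind, as a sketch only: poll one URL per module and report the ones that do not answer with a 2xx status. The module names come from the proposed title; the idea that each module exposes an HTTP URL we can poll, and the helper names `check_endpoint` and `failing_modules`, are assumptions for illustration.

```python
import http.client
import urllib.error
import urllib.request


def check_endpoint(url, timeout=5.0):
    """Return True if a GET on `url` completes with a 2xx status in time."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # HTTP errors (4xx/5xx), refused connections, timeouts, bad URLs.
        return False


def failing_modules(endpoints, timeout=5.0):
    """Given {module_name: url}, return the names whose check failed."""
    return [name for name, url in endpoints.items()
            if not check_endpoint(url, timeout)]


# Example use (the URLs are placeholders, not real datahub.io endpoints):
#     down = failing_modules({
#         "web": "https://example.org/",
#         "specstore": "https://example.org/specstore/",
#     })
#     if down:
#         print("Modules not responding:", ", ".join(down))
```

Run on a schedule, this would tell us about a dead module before a user hits a failing page. Restoring the module is a separate step.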

AcckiyGerman commented 6 years ago

FIXED: This issue was about Redis problems. The issue about general server problems is open here: https://github.com/datahq/pm/issues/122