SCPR / grand-central

A content syndicator as a micro-service.

Figure out why tasks have begun to fail #13

Closed Ravenstine closed 7 years ago

Ravenstine commented 7 years ago

AC:

Ravenstine commented 7 years ago

Some notes about what's happened so far:

It seems more likely that something went wrong at the host level than at the container/code level. If the database storage volume had filled up, CouchDB wouldn't have been able to keep working normally. Because I can't SSH into the host, I can't check whether any system logs show what happened. I'm going to close this ticket if I can't find a cause by the end of Friday, and then look into moving from ECS to Rancher.

Ravenstine commented 7 years ago

Haven't nailed anything down, although SSH is back for some mysterious reason. For the time being, I've added a CloudWatch alarm to the EC2 instance so we can be alerted when the CPU usage falls below a set threshold.
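
For reference, an alarm like the one described above could be defined roughly as follows. This is only a sketch, not the alarm that was actually created: the instance ID, threshold, period, region, and the helper names `build_low_cpu_alarm` and `create_alarm` are all placeholders, and it assumes boto3 is installed with AWS credentials configured.

```python
def build_low_cpu_alarm(instance_id, threshold_percent=5.0, sns_topic_arn=None):
    """Build keyword arguments for CloudWatch's put_metric_alarm.

    The alarm fires when average CPU utilization on the instance stays
    below threshold_percent, which may indicate that tasks have stopped.
    All default values here are illustrative assumptions.
    """
    params = {
        "AlarmName": f"low-cpu-{instance_id}",
        "AlarmDescription": "CPU usage fell below the expected floor",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,             # seconds per datapoint (assumed)
        "EvaluationPeriods": 2,    # consecutive low datapoints before alarming (assumed)
        "Threshold": float(threshold_percent),
        "ComparisonOperator": "LessThanThreshold",
        # Treat missing datapoints as breaching so an unreachable host also alarms.
        "TreatMissingData": "breaching",
    }
    if sns_topic_arn:
        # Notify this SNS topic when the alarm enters the ALARM state.
        params["AlarmActions"] = [sns_topic_arn]
    return params


def create_alarm(params, region_name="us-west-2"):
    """Send the alarm definition to CloudWatch (region is a placeholder)."""
    import boto3  # imported lazily so the builder works without boto3 installed

    cloudwatch = boto3.client("cloudwatch", region_name=region_name)
    cloudwatch.put_metric_alarm(**params)


if __name__ == "__main__":
    # Hypothetical instance ID; prints the definition without calling AWS.
    print(build_low_cpu_alarm("i-0123456789abcdef0"))
```

Setting `TreatMissingData` to `breaching` is a design choice for this situation: if the host goes silent entirely (as it appeared to when SSH was down), the alarm would still fire rather than sit in an insufficient-data state.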