MatthiasScholz / cos

Basic Cluster Orchestration Setup
GNU Lesser General Public License v3.0

In nw-separation setup nomad fails to download the images #6

Closed: ThomasObenaus closed this issue 6 years ago

ThomasObenaus commented 6 years ago

04/14/18 08:58:48 UTC Restarting Task restarting in 30.185113191s
04/14/18 08:58:48 UTC Driver Failure failed to initialize task "ping_service_task" for alloc "b9f34abf-f20c-27ff-6341-5c1040f9476f": Failed to pull 307557990628.dkr.ecr.us-east-1.amazonaws.com/service/ping-service:0.0.7: API error (500): {"message":"Get https://307557990628.dkr.ecr.us-east-1.amazonaws.com/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)"}

ThomasObenaus commented 6 years ago

Root cause:

We want to restrict the internet access of the nomad-masters (leader). That's why they are placed inside a subnet that can only reach AWS services. This restriction is implemented by adding routes only for the AWS service IP ranges, as specified at: https://docs.aws.amazon.com/general/latest/gr/aws-ip-ranges.html
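A minimal sketch of how the allowed ranges could be picked out of the AWS file, assuming its JSON layout (a `prefixes` list whose entries carry `ip_prefix`, `region` and `service`). The sample data below is made up for illustration, and `select_prefixes` is a hypothetical helper, not code from this repo.

```python
import ipaddress
import json

# Illustrative excerpt in the shape of ip-ranges.json; the values are made up.
SAMPLE = """
{
  "syncToken": "0",
  "createDate": "2018-04-14-00-00-00",
  "prefixes": [
    {"ip_prefix": "52.94.0.0/22", "region": "us-east-1", "service": "AMAZON"},
    {"ip_prefix": "52.216.0.0/15", "region": "us-east-1", "service": "S3"},
    {"ip_prefix": "54.231.0.0/17", "region": "us-east-1", "service": "S3"},
    {"ip_prefix": "13.32.0.0/15", "region": "GLOBAL", "service": "CLOUDFRONT"},
    {"ip_prefix": "35.180.0.0/16", "region": "eu-west-3", "service": "AMAZON"}
  ]
}
"""


def select_prefixes(doc, region, services):
    """Return the IPv4 networks listed for the given region and services."""
    return [
        ipaddress.ip_network(entry["ip_prefix"])
        for entry in doc["prefixes"]
        if entry["region"] == region and entry["service"] in services
    ]


doc = json.loads(SAMPLE)
allowed = select_prefixes(doc, "us-east-1", {"AMAZON", "S3"})
for net in allowed:
    print(net)  # each network would become one route entry in the restricted subnet
```

Every selected network costs one route entry, which is why the number of entries grows quickly once several services and regions are involved.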

Problem 1 - access to ECR needs many of the IP ranges specified at https://docs.aws.amazon.com/general/latest/gr/aws-ip-ranges.html

This results in more than 50 route entries in a single route table, but the limit per route table is 50. A limit increase can be requested, but it is not recommended because of the potential performance impact.

Problem 2 - binaries/images from non-ECR sources.

The fabio binary is downloaded directly from GitHub, but there is no route that allows egress access to GitHub.

ThomasObenaus commented 6 years ago

To solve problem 1:

Short term solution

Widen the CIDR blocks in the route table to /8, so that far fewer route entries are needed.
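A rough sketch of what widening to /8 amounts to, assuming IPv4 prefixes only. `widen` is a hypothetical helper and the prefixes are illustrative, not taken from the live AWS file. Besides the route count, it prints how much more address space the /8 routes open up compared with the original prefixes, which is the price of this short-term fix: the wide routes also cover non-AWS addresses.

```python
import ipaddress


def widen(prefixes, new_prefix=8):
    """Replace each prefix by its enclosing /new_prefix network and drop duplicates."""
    widened = set()
    for p in prefixes:
        net = ipaddress.ip_network(p)
        if net.prefixlen > new_prefix:
            net = net.supernet(new_prefix=new_prefix)
        widened.add(net)
    return sorted(widened)


# Illustrative IPv4 prefixes, not taken from the live ip-ranges.json.
prefixes = ["52.94.0.0/22", "52.216.0.0/15", "54.231.0.0/17", "52.119.128.0/17"]
routes = widen(prefixes)
exact = sum(ipaddress.ip_network(p).num_addresses for p in prefixes)
opened = sum(n.num_addresses for n in routes)

print([str(n) for n in routes])  # ['52.0.0.0/8', '54.0.0.0/8']
print(f"{len(prefixes)} prefixes -> {len(routes)} routes, "
      f"{opened // exact}x the address space of the original prefixes")
```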

Long term solution

Create a script that aggregates the AWS CIDRs to /16 blocks and merges them accordingly.
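One possible shape for such a script, sketched under the assumption that only IPv4 prefixes are routed. `aggregate`, `ROUTE_LIMIT` and the sample prefixes are hypothetical. Prefixes longer than /16 are truncated to their /16 supernet, then the standard library's `ipaddress.collapse_addresses` merges duplicates, contained networks and adjacent siblings into the fewest CIDRs.

```python
import ipaddress

ROUTE_LIMIT = 50  # routes-per-route-table limit quoted in this issue


def aggregate(prefixes, max_prefix=16):
    """Truncate prefixes longer than /max_prefix to their supernet, then merge
    duplicates, contained networks and adjacent siblings (IPv4 only)."""
    nets = []
    for p in prefixes:
        net = ipaddress.ip_network(p)
        if net.prefixlen > max_prefix:
            net = net.supernet(new_prefix=max_prefix)
        nets.append(net)
    return list(ipaddress.collapse_addresses(nets))


# Illustrative prefixes, not taken from the live ip-ranges.json.
prefixes = [
    "52.94.0.0/22", "52.94.4.0/24",    # both fall into 52.94.0.0/16
    "52.216.0.0/15", "52.217.0.0/16",  # the /16 is contained in the /15
    "54.230.0.0/16", "54.231.0.0/16",  # adjacent siblings merge into a /15
]
merged = aggregate(prefixes)
print([str(n) for n in merged])  # ['52.94.0.0/16', '52.216.0.0/15', '54.230.0.0/15']
print(f"{len(merged)} routes, fits limit: {len(merged) <= ROUTE_LIMIT}")
```

Compared with a blanket /8, capping at /16 keeps the opened address space much closer to the real AWS ranges while still cutting the number of route entries.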

ThomasObenaus commented 6 years ago

To solve problem 2:

Short term solution

Add a route for "192.30.253.0/24" to allow access to GitHub.
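Route tables pick the most specific (longest-prefix) matching route, so a single /24 is enough only as long as the GitHub endpoints actually resolve into it; GitHub's addresses can change over time. A small sketch to check an address against a route table, assuming a hypothetical table; the targets and addresses are illustrative only.

```python
import ipaddress

# Hypothetical route table for the restricted subnet: destination -> target.
ROUTES = {
    "10.0.0.0/16": "local",
    "52.0.0.0/8": "nat-gateway",
    "192.30.253.0/24": "nat-gateway",
}


def lookup(routes, address):
    """Longest-prefix match: return (network, target), or None if no route matches."""
    ip = ipaddress.ip_address(address)
    best = None
    for cidr, target in routes.items():
        net = ipaddress.ip_network(cidr)
        if ip in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, target)
    return best


print(lookup(ROUTES, "192.30.253.113"))  # (IPv4Network('192.30.253.0/24'), 'nat-gateway')
print(lookup(ROUTES, "140.82.112.3"))    # None: no route, so the download would time out
```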

Long term solution

Download binaries only from internal locations (e.g. Artifactory) and grant access to those locations.

ThomasObenaus commented 6 years ago

Problem 1 solved using the short-term solution. Created ticket https://github.com/MatthiasScholz/cos/issues/7 for the long-term solution.

ThomasObenaus commented 6 years ago

Problem 2 solved using the short-term solution. Created ticket https://github.com/MatthiasScholz/cos/issues/8 for the long-term solution.

--> Close bug