ITISFoundation / osparc-ops-environments

osparc operations
MIT License

Dockerhub rate limits issue when using Private EC2s #716

Open sanderegg opened 1 month ago

sanderegg commented 1 month ago

Since AWS-master and Tip.science are using private EC2 machines, the following issue is arising:

Current possible options:

some references:

### Tasks
- [ ] master configuration
- [ ] staging configuration
- [ ] prod configuration

YuryHrytsuk commented 1 month ago

Literature

YuryHrytsuk commented 1 month ago

Docker log in per request vs Private registry as pull through cache to Dockerhub

Docker log in per request
- needs to be done per request (every time)
- requires logging in on every machine

Private registry as pull through cache to Dockerhub
+ once set up, works automatically out of the box with no overhead
- OPS need to guarantee private registry uptime
- OPS need to properly load-balance and scale the private registry
- OPS need to maintain disk space of the private registries
- once the registry is down, no image can be pulled via the registry
- daemon.json needs to be configured on every machine
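For reference, the daemon.json change listed among the drawbacks of the pull-through cache could look roughly like the sketch below. It relies on the Docker daemon's `registry-mirrors` setting; the mirror URL and the existing config content are placeholders, not values from our deployments.

```python
import json

# Hypothetical address of the pull-through cache; not taken from this issue.
MIRROR_URL = "https://registry-mirror.example.internal:5000"


def with_registry_mirror(daemon_config: dict, mirror_url: str) -> dict:
    """Return a copy of a daemon.json config that lists mirror_url under "registry-mirrors"."""
    updated = dict(daemon_config)
    mirrors = list(updated.get("registry-mirrors", []))
    if mirror_url not in mirrors:  # keep the operation idempotent
        mirrors.append(mirror_url)
    updated["registry-mirrors"] = mirrors
    return updated


# Stand-in for whatever is currently in /etc/docker/daemon.json on a machine.
existing = {"log-driver": "json-file"}
print(json.dumps(with_registry_mirror(existing, MIRROR_URL), indent=2))
```

The result would have to be written to every machine, and the Docker daemon reloaded or restarted to pick it up, which is the per-machine overhead mentioned in the list.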

Decision

Based on this comparison, I have decided to go with the Docker log in per request approach. I would rather have something that works automatically out of the box, but the cost is too high.

@mrnicegyu11, @sanderegg, let me know your thoughts on this matter.

YuryHrytsuk commented 1 month ago

Docker swarm and authenticated image pulling. Questions

Does it matter which user "runs" docker container [linux] process? (unrelated to docker swarm)

Do I need to issue docker login on every docker swarm node?

Do I need to issue docker login every time I ssh into machine?

Do I need to issue docker login on the machine, or does docker swarm already have the login information (in case a command was run before with --with-registry-auth)?

Solution. The way to go

  1. Make sure docker is authenticated for the user that runs the commands mentioned in the step below
  2. Execute all docker service ... and docker stack ... commands with the --with-registry-auth option
  3. Have a test that monitors the remaining image pulls for the anonymous user and reports when the number drops below 100 (the maximum)
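A minimal sketch of the check in step 3, assuming the remaining quota is read from Docker Hub's `ratelimit-remaining` response header, whose value has the form `76;w=21600` (count, then window in seconds) as far as I understand the Docker Hub documentation. How the header is fetched is left out here; the function names are made up for illustration.

```python
from typing import Optional, Tuple


def parse_ratelimit(header_value: str) -> Tuple[int, Optional[int]]:
    """Split a value such as "76;w=21600" into (count, window_seconds)."""
    count_part, *params = [part.strip() for part in header_value.split(";")]
    window = None
    for param in params:
        key, _, value = param.partition("=")
        if key == "w" and value.isdigit():
            window = int(value)
    return int(count_part), window


def should_report(remaining_header: str, threshold: int = 100) -> bool:
    """True when the remaining anonymous pulls dropped below the threshold."""
    remaining, _window = parse_ratelimit(remaining_header)
    return remaining < threshold


print(parse_ratelimit("76;w=21600"))
print(should_report("76;w=21600"))
```

With the threshold at 100 (the anonymous maximum), any anonymous pull from the machine's IP makes the check report, which is the point: it would reveal that something is still pulling without authentication.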

Docker Swarm Image Pulling experiments

Swarm Cluster:

Case 1

Setup:

Outcome:

Case 2

Setup:

Outcome:

Case 3

Setup:

Outcome:

Case 4

Setup:

Outcome:

Case 4

Setup:

Outcome:

Case 5. All machines are logged in. No --with-registry-auth option

Setup:

Outcome:

Case 6. All machines are [docker] logged out. With --with-registry-auth option. The image is present on all machines

Setup:

Outcome:

sanderegg commented 1 month ago


I would like to discuss these +/-. If the private registry is down that means anyway:

YuryHrytsuk commented 1 month ago

> I would like to discuss these +/-. If the private registry is down that means anyway:
>
> * nothing can run, as all the services are only available there (s4l, isolve, etc.)
> * on the OPS side, if the registry is down, then oSparc is unable to run anything anyway, so I am not sure this changes much
> * on the cost side, it is probably an improvement, especially if the registry is set up with S3 VPC and scaled on the regions we are using
> * the number of pulls toward Dockerhub is down to a minimum

@mrnicegyu11, @sanderegg and I had a discussion and decided to try the pull-through registry approach. I will do a POC to check whether, and how, it actually works.

YuryHrytsuk commented 1 month ago

Docker registry pull through cache

Useful links:

Insights:

Conclusions

I am in favor of the docker login + --with-registry-auth approach at this point in time. Once all our machines are private, we can reconsider the pull-through cache.
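To make the docker login + --with-registry-auth approach concrete, here is a rough sketch of how deployment tooling could assemble the commands so the flag is never forgotten. The helper names, compose file and stack name are placeholders; the commands are only printed, not executed, and a prior `docker login` by the same user is assumed.

```python
import shlex
from typing import List


def stack_deploy_command(compose_file: str, stack_name: str) -> List[str]:
    """docker stack deploy that forwards the local registry credentials to the swarm nodes."""
    return [
        "docker", "stack", "deploy",
        "--with-registry-auth",
        "--compose-file", compose_file,
        stack_name,
    ]


def service_update_command(service_name: str) -> List[str]:
    """docker service update that forwards the local registry credentials as well."""
    return ["docker", "service", "update", "--with-registry-auth", service_name]


# Placeholder names, for illustration only.
print(shlex.join(stack_deploy_command("docker-compose.yml", "example-stack")))
print(shlex.join(service_update_command("example-stack_web")))
```

Passing the returned lists to `subprocess.run` would execute them; building them in one place is just one way to guarantee every `docker service ...` and `docker stack ...` call carries the option.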

UPD:

YuryHrytsuk commented 4 weeks ago

Docker registry as pull through cache experiments

Context:

Outcomes:

YuryHrytsuk commented 3 weeks ago

Blocked for AWS Deployments until OPS nodes are private https://github.com/ITISFoundation/osparc-ops-environments/issues/574