azavea / pfb-network-connectivity

PFB Bicycle Network Connectivity
Other
40 stars 10 forks source link

Fix docker login command in 'cipublish' #823

Closed KlaasH closed 3 years ago

KlaasH commented 3 years ago

When I ran cipublish in the VM in the process of doing a production deploy, it failed with

unknown shorthand flag: 'e' in -e See 'docker login --help'.

That's coming from the docker login command produced by aws ecr get-login, which authenticates with ECR to be able to push containers.

Seemingly the AWS CLI is producing an argument that Docker doesn't recognize. Given that Docker is fairly up to date (v18) and the AWS CLI is pretty old (it's using the default for the Ansible role, 1.11.14), it seems like the problem is likely on the AWS CLI side. But when I upgrade it and run aws ecr get-login again in the container, it still has the -e none option.

So it might make sense to upgrade the AWS CLI to version 2. I don't know if there are commands that would be affected by breaking changes between the major versions. My guess would be no, but I didn't go looking.

UPDATE: I started working on this in conjunction with issue #827, but bailed because it wasn't turning out to be a "while I'm here..." situation. There's no Python package for the AWS CLI v2. The only instructions they give for installing it are "download the zip, unzip in, run the install script". I found a 3rd-party Ansible role which seems decent, except it doesn't let you pin the version, just always installs latest. So that's not ideal. But we could make our own role by cribbing the steps from that but changing the URL to pull a variable-specified version.

It's also possible that it's not worth the trouble to upgrade to v2 right now and we should just look for the path of least resistance to getting v1 working again, like possibly munging the output of aws ecr get-login to remove the deprecated flag.

colekettler commented 3 years ago

@KlaasH We've been looking at this a little more closely in azavea/operations#601. I think I see what the root of your original problem was: the -e flag, which is included by default, was indeed removed after Docker 17.06. We've addressed this in other projects like OAR by specifying the --no-include-email flag and continuing to use the v1 CLI.

We unfortunately don't have any immediate plans or capacity to update to v2 of the AWS CLI on our Jenkins instances. That would require a Python version bump which we'd prefer to tie to an Ubuntu version upgrade, adding extra load onto the effort. An upgrade to a v1 CLI version that supports the get-login-password command would also require that upgrade.

I don't want to disrupt your development or deployment, but would reverting the v2 upgrade and using the --no-include-email flag would be a viable option? We recognize the need to eventually migrate to the v2 CLI, but in the interests of regularity and boring configs, sticking with the old version may be the path of overall least resistance for now.

KlaasH commented 3 years ago

Alas that I didn't see the --no-include-email flag before. Yeah, we can revert the CLI upgrade and use that option instead.

colekettler commented 3 years ago

Fabulous! We really appreciate it, thanks so much for helping out here.