fecgov / fecfile-web-app


Investigate failing e2e tests with Docker engine version 26, API version 1.45 #1941

Closed: lbeaufort closed this issue 1 day ago.

lbeaufort commented 3 months ago

Business Reason

As a developer, I want to be able to run end-to-end tests with the latest Docker engine versions so that I can keep the underlying technology up to date.

QA Notes

See successful E2E tests: https://app.circleci.com/pipelines/github/fecgov/fecfile-web-app/6199/workflows/aecaf364-6ea0-41d7-aee0-163f26a0ae5c/jobs/21335

(Screenshots: image-20240808-131844.png, image-20240808-131953.png; see images in Jira.)

DEV Notes

On May 2, the e2e tests passed for the release/sprint-41 branch. When I re-ran them today (5/15), they failed for the same commit: https://app.circleci.com/pipelines/github/fecgov/fecfile-web-app?branch=release%2Fsprint-41. One difference between the failing and passing tests:

In the "Setup a remote Docker engine" step, the passing run reported:

Created container accessible with:
  DOCKER_HOST=unix:///var/run/docker.sock
  DOCKER_MACHINE_NAME=localhost
Server Engine Details:
  Version:          24.0.9
  API version:      1.43 (minimum version 1.12)

The failing run reported:

Created container accessible with:
  DOCKER_HOST=unix:///var/run/docker.sock
  DOCKER_MACHINE_NAME=localhost
Server Engine Details:
  Version:          26.0.2
  API version:      1.45 (minimum version 1.24)

Pinning the Docker engine version to 24 in the CircleCI config made the e2e tests pass again. My best guess so far is that the Docker API introduced a breaking change: https://docs.docker.com/engine/api/version-history/#v145-api-changes. Hotfix PRs are in progress.
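Because the only visible difference between the two runs is the engine and API version reported in this step, a small check could flag when a run lands on an engine at API 1.45 or later. A minimal Python sketch, assuming the log text has exactly the layout shown above; `parse_engine_details`, `EngineDetails`, and `uses_api_145_or_later` are hypothetical helpers, not part of fecfile-web-app or CircleCI.

```python
import re
from typing import NamedTuple


class EngineDetails(NamedTuple):
    version: str                 # engine version string, e.g. "26.0.2"
    api_version: tuple[int, int]  # (major, minor), e.g. (1, 45)


def parse_engine_details(log_text: str) -> EngineDetails:
    """Extract the engine version and API version from the step's log output.

    Assumption: lines shaped like '  Version:          26.0.2' and
    '  API version:      1.45 (minimum version 1.24)', as in the logs above.
    """
    version = re.search(r"^\s*Version:\s+(\S+)", log_text, re.MULTILINE)
    api = re.search(r"^\s*API version:\s+(\d+)\.(\d+)", log_text, re.MULTILINE)
    if version is None or api is None:
        raise ValueError("Server Engine Details not found in log text")
    return EngineDetails(version.group(1), (int(api.group(1)), int(api.group(2))))


def uses_api_145_or_later(details: EngineDetails) -> bool:
    # Compare as integer tuples so that, e.g., API 1.5 sorts below 1.45.
    return details.api_version >= (1, 45)
```

Fed the failing run's block above (26.0.2, API 1.45), the check returns True; for the passing run's block (24.0.9, API 1.43) it returns False.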

Also, I'd like to understand how we're using docker.io/jwilder/dockerize and whether that's a factor.

Originally posted by @lbeaufort in https://github.com/fecgov/fecfile-web-app/issues/1928#issuecomment-2113503624

NOTE: To speed up debugging, try limiting the number of e2e tests to one. Also, check for the latest release of Docker 26.

QA Notes

Passes QA if the merge into develop runs successfully in the nightly run.


FECFILE-390

AureliaKhorsand commented 2 months ago

Blocked by #1925

sasha-dresden commented 2 months ago

I'm putting this back into the Sprint Backlog. I'm not getting anywhere with it. I'm going to work on some other things and come back to this, but if someone else wants to take a crack at it, feel free.

What I've discovered/tried:

I'm starting to suspect it has something to do with Cypress and how it interacts with Docker in this setup, so there may be some configuration there we can change.

Elaine-Krauss-TCG commented 2 months ago

Also, I'd like to understand how we're using docker.io/jwilder/dockerize and if that's a factor.

After some digging and experimentation: we use dockerize for its -wait and -timeout options, which ensure that the API Docker container has fully started before e2e testing begins. You can find the documentation for dockerize here.
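Conceptually, dockerize's `-wait` option (which accepts targets such as `tcp://host:port`) together with `-timeout` polls an address until it accepts connections or the timeout expires. The Python sketch below reproduces that idea for illustration only; the project's actual dockerize invocation (target and timeout value) isn't shown in this thread, and `wait_for_tcp` is a hypothetical name.

```python
import socket
import time


def wait_for_tcp(host: str, port: int, timeout: float = 60.0, interval: float = 0.5) -> bool:
    """Poll host:port until a TCP connection succeeds or `timeout` seconds elapse.

    Returns True as soon as the port accepts a connection, False on timeout.
    (Illustrative stand-in for `dockerize -wait tcp://host:port -timeout ...`.)
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection resolves `host` and tries each address it gets back.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused, unreachable, or timed out: treat the service as not up yet.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Using `time.monotonic()` for the deadline keeps the wait unaffected by wall-clock adjustments on the CI machine.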

exalate-issue-sync[bot] commented 4 weeks ago

Elaine Krauss commented: I also have to put this back into the Sprint Backlog, since I’ve been unable to crack this either.

Some things that I’ve learned:

If there’s one area that would be a good opportunity for further investigation, it’s gaining access to the fecfile-web-api Docker container’s NGINX and network logs. My original note, since retracted: "The error that Cypress is showing suggests that the request is being rejected, and that should show in the logs somewhere." Edit: I double-checked this, and Cypress gives the same error even when it makes a request to a nonsensical port (e.g., localhost:1001). Still, this does seem to be a networking problem, and the Docker containers' logs might give us further insight.

(Screenshots: image-20240724-151038.png, image-20240724-154204.png; see images in Jira.)
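Since Cypress shows the same error for both a real and a nonsensical port, the error alone doesn't reveal which step fails. One hedged way to narrow it down is to probe the IPv4 and IPv6 loopback addresses separately on the API's port from inside the test container. A minimal Python sketch; `probe` is a hypothetical helper, and running it inside the container is an assumption about how one might debug this.

```python
import socket


def probe(address: str, port: int, timeout: float = 1.0) -> str:
    """Try one TCP connection to a literal IP address and classify the outcome.

    Returns "open", "refused", "timeout", or "error: ..." (e.g. when the
    address family itself is unavailable in the container).
    """
    family = socket.AF_INET6 if ":" in address else socket.AF_INET
    try:
        # Socket creation is inside the try: AF_INET6 may be unsupported.
        with socket.socket(family, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            sock.connect((address, port))
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        return f"error: {exc.strerror or exc}"
    return "open"
```

Calling it once with `127.0.0.1` and once with `::1` on the API's port (not stated in this thread) would show whether only one address family answers.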

exalate-issue-sync[bot] commented 2 weeks ago

David Heitzer commented: Mostly passing tests: https://app.circleci.com/pipelines/github/fecgov/fecfile-web-app/6150/workflows/1cb3f9e0-7483-4f59-bb5e-8e9077f35a66/jobs/21189

exalate-issue-sync[bot] commented 2 weeks ago

David Heitzer commented: It looks like this change in Docker version 26 (https://github.com/moby/moby/pull/47062) is causing IPv6 to be enabled in our containers and the /etc/hosts file to be updated accordingly. According to the Docker 26.0 release notes (https://docs.docker.com/engine/release-notes/26.0/#bug-fixes-and-enhancements), this can be disabled if the host stack doesn't support it (although that is supposed to happen automatically).

I suspect that something in CircleCI's `setup_remote_docker` configuration doesn't support this. I opened a CircleCI support ticket, since this is breaking container-to-container networking using `--network container:fecfile-api`. Setting the `net.ipv6.conf.all.disable_ipv6=1` sysctl disables IPv6 and gets our tests running again.

CircleCI Support confirmation: "Thank you for contacting CircleCI Support. Your ticket reference ID is: 153108"
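To check the /etc/hosts change in a given container, one could list every address a name such as `localhost` maps to and read the `net.ipv6.conf.all.disable_ipv6` sysctl from /proc. A minimal Python sketch; `addresses_for` and `ipv6_disabled` are hypothetical helpers, and the parser covers only the basic `address name [aliases...]` format with `#` comments. One plausible failure mode, an assumption not confirmed in this thread, is a client resolving `localhost` to `::1` while the service it targets listens only on IPv4.

```python
from pathlib import Path
from typing import Optional


def addresses_for(hostname: str, hosts_text: str) -> list[str]:
    """Return every address mapped to `hostname` in /etc/hosts-style text, in file order.

    Assumption: standard "address name [aliases...]" lines, with '#' comments.
    """
    found = []
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, split on whitespace
        if len(fields) >= 2 and hostname in fields[1:]:
            found.append(fields[0])
    return found


def ipv6_disabled(sysctl_path: str = "/proc/sys/net/ipv6/conf/all/disable_ipv6") -> Optional[bool]:
    """Read the net.ipv6.conf.all.disable_ipv6 sysctl for the current network namespace.

    Returns True if it is set to 1, False otherwise, or None if the file is
    absent (e.g. not Linux, or IPv6 not available in this kernel).
    """
    path = Path(sysctl_path)
    if not path.exists():
        return None
    return path.read_text().strip() == "1"
```

For example, `addresses_for("localhost", Path("/etc/hosts").read_text())` returning both `127.0.0.1` and `::1` would indicate that the container's hosts file includes an IPv6 loopback entry.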

exalate-issue-sync[bot] commented 1 week ago

Todd Lees commented: Passes CR; moving to QA.

exalate-issue-sync[bot] commented 1 week ago

Shelly Wise commented: QA review verified e2e test running and passing successfully.

(Screenshot: image-20240808-151739.png; see image in Jira.)

QA Review Completed. Moved to Stage Ready.

exalate-issue-sync[bot] commented 1 day ago

Sprint accepted by Paul Clark during sprint review on 8/20/2024.