divviup / janus

Experimental implementation of the Distributed Aggregation Protocol (DAP) specification.
Mozilla Public License 2.0

`test_util::kubernetes::tests::create_clusters` occasionally hangs #407

Closed: branlwyd closed this issue 2 years ago

branlwyd commented 2 years ago

This occurs on my workstation with a normal `cargo test` invocation and no special flags. It appears to be flaky: terminating the tests and trying again often succeeds.

The only console output is:

test test_util::kubernetes::tests::create_clusters has been running for over 60 seconds

divergentdave commented 2 years ago

Have you bumped up your inotify limits on this machine? I think I recall seeing weird things in the Docker logs for Kind before doing so, when I tried to set up a second or third Kind cluster.
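
For anyone wanting to check the current limits before changing anything, here is a minimal Rust sketch. It assumes a Linux host where the limits are exposed under `/proc/sys/fs/inotify`; the helper names `parse_limit` and `read_limit` are made up for illustration and are not part of janus. It only reports the values and does not judge whether they are high enough for several Kind clusters.

```rust
use std::fs;

/// Parse the contents of an inotify sysctl file (a decimal integer plus newline).
fn parse_limit(contents: &str) -> Option<u64> {
    contents.trim().parse().ok()
}

/// Read one inotify limit from /proc/sys/fs/inotify.
/// Returns None if the file is missing or unreadable (e.g. on a non-Linux host).
fn read_limit(name: &str) -> Option<u64> {
    let path = format!("/proc/sys/fs/inotify/{name}");
    fs::read_to_string(path).ok().and_then(|s| parse_limit(&s))
}

fn main() {
    // These two limits are the ones usually meant by "inotify limits".
    for name in ["max_user_watches", "max_user_instances"] {
        match read_limit(name) {
            Some(value) => println!("fs.inotify.{name} = {value}"),
            None => println!("fs.inotify.{name}: not readable on this system"),
        }
    }
}
```

If the reported values look low, they can be raised with `sysctl`; what counts as "enough" depends on how many clusters are running.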

branlwyd commented 2 years ago

I saw this again this morning, directly after doing a `docker system prune`. I think at least one cause of the hang is just downloading the kindest/node container image again, which is required(?) to run a Kind cluster and is 911 MB on my workstation. In this case, I just waited for a while and the test eventually finished after a few minutes.

I'm going to pay attention to whether I see hangs when that image is still in my local Docker. If this issue only occurs for me when it's not, I'm going to chalk it up to my Internet connection. If I see hangs when the image is already locally downloaded, I'm going to try bumping inotify.
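
The check described above (is the image already in local Docker?) could be scripted roughly as follows. This is a sketch under assumptions: it shells out to the `docker` CLI using `docker image ls --format "{{.Repository}}"`, and the helper names `listing_contains` and `local_repositories` are hypothetical, not anything from janus. It matches on the repository name only and ignores tags.

```rust
use std::process::Command;

/// Return true if any line of the repository listing equals `repo`.
fn listing_contains(listing: &str, repo: &str) -> bool {
    listing.lines().any(|line| line.trim() == repo)
}

/// Ask the local Docker daemon which repositories it has images for.
/// Returns None if the `docker` CLI is missing or the command fails.
fn local_repositories() -> Option<String> {
    let output = Command::new("docker")
        .args(["image", "ls", "--format", "{{.Repository}}"])
        .output()
        .ok()?;
    if !output.status.success() {
        return None;
    }
    String::from_utf8(output.stdout).ok()
}

fn main() {
    match local_repositories() {
        Some(listing) if listing_contains(&listing, "kindest/node") => {
            println!("kindest/node is cached locally; a hang is probably not the download")
        }
        Some(_) => println!("kindest/node is not cached; expect a large download first"),
        None => println!("could not query Docker"),
    }
}
```

Running this right before `cargo test` would make it easier to tell a slow image download apart from a real hang.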

tgeoghegan commented 2 years ago

The kindest/node image is needed to run the tests, so it's plausible that's what was causing your test to hang.

This issue tracks a specific problem with a long-running test, but relatedly, we should consider if we need or want EphemeralCluster in janus. The only reason it exists is so that we can run a test or two that works with Kubernetes secrets in janus_cli. Otherwise, we ought to only have to deal with Kind over in janus-ops. Maybe there's a more lightweight solution to mocking out the Kubernetes API that we can use for testing?

branlwyd commented 2 years ago

Closing this; AFAICT, I was simply too impatient waiting for the kindest/node image to download.