Open def- opened 1 year ago
Just before that line:
zippy-materialized-1 | cluster-u6-replica-8: 2023-07-13T13:10:30.712636Z WARN mz_cluster::communication: failed to initialize network: Resource temporarily unavailable (os error 11) process=3
os error 11 is EAGAIN, which we're apparently not handling.
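One possible way to handle EAGAIN would be to retry the failing step a bounded number of times before giving up. The sketch below is not the actual mz_cluster or Timely code: `retry_transient` and `is_transient` are hypothetical helpers, and it assumes the failing operation surfaces as a `std::io::Error` (on Unix, Rust's standard library reports EAGAIN as `ErrorKind::WouldBlock`).

```rust
use std::io;
use std::thread;
use std::time::Duration;

/// Returns true for errors that are usually transient. On Unix, EAGAIN is
/// reported by the standard library as `ErrorKind::WouldBlock`.
fn is_transient(err: &io::Error) -> bool {
    err.kind() == io::ErrorKind::WouldBlock
}

/// Hypothetical helper: calls `init` at least once and at most `max_attempts`
/// times, retrying only while the error looks transient.
fn retry_transient<T>(
    max_attempts: u32,
    base_delay: Duration,
    mut init: impl FnMut() -> io::Result<T>,
) -> io::Result<T> {
    let mut attempt = 1;
    loop {
        match init() {
            Ok(value) => return Ok(value),
            Err(err) if is_transient(&err) && attempt < max_attempts => {
                // Linear backoff: wait a little longer after each failed attempt.
                thread::sleep(base_delay * attempt);
                attempt += 1;
            }
            // Non-transient error, or attempts exhausted: give the error back
            // to the caller instead of panicking.
            Err(err) => return Err(err),
        }
    }
}
```

Whether retrying is appropriate here depends on what actually returned EAGAIN during network initialization, which the log line alone does not show.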
This then causes a panic within Timely, which takes down the whole process. The orchestrator restarted the process, so it seems we ended up in a good state.
Closing because the root cause seems to be an OS issue, and we're handling it OK.
If the preferred way to handle EAGAIN is to exit the process, it should not happen with a panic that is then reflected in the CI and Sentry, but with an orderly non-panic exit. So I am re-opening the ticket until the panic can be silenced properly.
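As a rough illustration of an orderly non-panic exit, the startup result could be mapped to an exit code and logged, rather than unwrapped. This is only a sketch under assumptions: `startup_exit_code` is a hypothetical function, the exit codes are my choice (75 is EX_TEMPFAIL in the BSD sysexits convention), and the real initialization path in mz_cluster may not return a plain `io::Result<()>`.

```rust
use std::io;

/// Exit code for a transient failure (EX_TEMPFAIL from sysexits.h).
/// The specific value is an assumption made for this sketch.
const EXIT_TEMPFAIL: u8 = 75;
/// Exit code for any other startup failure.
const EXIT_FAILURE: u8 = 1;

/// Hypothetical helper: turns the outcome of network initialization into an
/// exit code and logs the error, so no panic is raised.
fn startup_exit_code(result: &io::Result<()>) -> u8 {
    match result {
        Ok(()) => 0,
        // EAGAIN shows up as `WouldBlock` on Unix; treat it as "try again later".
        Err(err) if err.kind() == io::ErrorKind::WouldBlock => {
            eprintln!("failed to initialize network, exiting for restart: {}", err);
            EXIT_TEMPFAIL
        }
        Err(err) => {
            eprintln!("failed to initialize network: {}", err);
            EXIT_FAILURE
        }
    }
}
```

A `main` that returns `std::process::ExitCode::from(startup_exit_code(&result))` would still let the orchestrator restart the process, but without a panic message for the CI or Sentry to pick up.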
Idea: If this uses unmanaged replicas, convert to using managed replicas.
Adding ci-regexp: panicked at 'failed to send MergeQueue: "SendError(..)"' here since https://github.com/MaterializeInc/materialize/issues/22027 is marked as a duplicate of this issue.
This occurred in https://buildkite.com/materialize/tests/builds/66628#018b5d1d-0e35-494c-ac20-fd28a1f81c79.
This happened in the normal CI as well: https://buildkite.com/materialize/tests/builds/70500#018c3b75-1685-4bb8-b5a6-ec532814014c
This was observed in the release-qualification build:
zippy-storaged-1 | thread 'thread 'timely:work-0timely:work-1' panicked at ' panicked at /cargo/git/checkouts/timely-dataflow-70b80d81d6cabd62/89bcb73/communication/src/allocator/process.rs/cargo/git/checkouts/timely-dataflow-70b80d81d6cabd62/89bcb73/communication/src/allocator/process.rs::4339::4033:
zippy-storaged-1 | Failed to recv buzzer: RecvError:
zippy-storaged-1 |
zippy-storaged-1 | Failed to send buzzer: "SendError(..)"
zippy-storaged-1 | stack backtrace:
What version of Materialize are you using?
b83440d41865e38a6352bbb7db30b4786f264028
What is the issue?
Seen in https://buildkite.com/materialize/nightlies/builds/2820#01894f2b-22d0-4414-acec-d121a0aab403
I think this is unrelated to my CRDB upgrade change, in which it occurred: https://github.com/MaterializeInc/materialize/pull/20507. I retriggered the run to make sure, but I'm expecting this to be a flake: https://buildkite.com/materialize/nightlies/builds/2824