Open def- opened 1 month ago
Seen again in CI.
I'll try to repro locally.
Edit: can be reproduced, but took a while:
while true; do bin/mzcompose --find replica-isolation down && bin/mzcompose --find replica-isolation run default || break; done
Now trying with the new Mz restart logic reverted.
Edit2: Still happens, so unrelated to that. Now trying with https://github.com/MaterializeInc/materialize/pull/28380 and the commit just before it merged.
Edit3: Reproduced on #28380 and never on the state before it, but will keep it running for a few more hours.
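For anyone else trying to reproduce this, here is a rough sketch of the same retry loop that also reports how many runs it took before the first failure. The function name `repeat_until_failure` and the run cap are hypothetical additions for illustration; only the two `bin/mzcompose` invocations come from the loop above.

```shell
#!/usr/bin/env bash
# Sketch: re-run a command until it fails, and report the run number.
# repeat_until_failure and its max_runs cap are illustrative assumptions,
# not part of the Materialize tooling.
repeat_until_failure() {
    local max_runs="$1"
    shift
    local run=0
    while [ "$run" -lt "$max_runs" ]; do
        run=$((run + 1))
        # "$@" is the command under test; stop at its first non-zero exit.
        if ! "$@"; then
            echo "failed on run $run"
            return 1
        fi
    done
    echo "no failure in $run runs"
    return 0
}

# Example invocation (commented out; needs a Materialize checkout):
# repeat_until_failure 500 bash -c \
#   'bin/mzcompose --find replica-isolation down && bin/mzcompose --find replica-isolation run default'
```

Like the original loop, this stops as soon as either the `down` or the `run default` step fails; the cap just keeps an unattended run from going forever.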
Before #28380 I got this failure instead:
replica-isolation-materialized-1 | environmentd: 2024-08-15T13:36:34.862548Z INFO mz_compute_client::controller::replica: error connecting to replica, retrying in 1s: transport error: dns error: failed to lookup address information: Temporary failure in name resolution: dns error: failed to lookup address information: Temporary failure in name resolution: failed to lookup address information: Temporary failure in name resolution replica=User(2)
Apparently clusterd had a crash of sorts:
replica-isolation-clusterd_1_2-1 | 2024-08-15T13:34:56.769911Z WARN mz_timely_util::panic: halting process: timely communication error: reading data: Connection reset by peer (os error 104)
So something was weird there too, but never this panic.
services.log
That looks like https://github.com/MaterializeInc/materialize/issues/28046, one of the two issues that #28380 was intended to fix.
What version of Materialize are you using?
eddd6923419b (Pull Request #28996)
What is the issue?
Seen in Replica isolation
ci-regexp: someone claimed to be us