def- opened this issue 2 weeks ago
The log about finalization should be unrelated to the hung query. I also think it's a thing that already occurred before; it's just that I added logging for it in https://github.com/MaterializeInc/materialize/pull/27057.
The failure might be related to another recent change of mine: https://github.com/MaterializeInc/materialize/pull/27496. Though that could also be a red herring.
Do we know what the query returns? From that log snippet it looks like we do our regular retries but never see what we want, as opposed to the query just not returning, right?
I think the query returned a result once, but it was the wrong result. Then the second try hangs.
Why do we expect that much logging? It's spamming services.log in CI.
The only other relevant logging I see is:
testdrive-materialized-1 | cluster-u1-replica-u1-gen-0: 2024-06-24T11:30:38.769686Z INFO mz_persist_client::internal::machine: snapshot sceed0578-efae-456b-8a30-d4e0dfbbd885 as of [1719228636768] not yet available for v15 upper [1719228636001]
testdrive-materialized-1 | cluster-u34-replica-u36-gen-0: 2024-06-24T11:38:03.047972Z WARN mz_persist_client::internal::machine: heartbeat task for reader (r2e3de075-faa6-438a-a9f2-df60a78e6721) of shard (s69269886-7262-4537-a4ed-2b55eb7a4b66) exiting due to expired lease while read handle is live
testdrive-materialized-1 | cluster-u34-replica-u36-gen-0: 2024-06-24T11:38:03.048097Z WARN mz_persist_client::internal::machine: heartbeat task for reader (r2e3de075-faa6-438a-a9f2-df60a78e6721) of shard (s69269886-7262-4537-a4ed-2b55eb7a4b66) exiting due to expired lease while read handle is live
testdrive-materialized-1 | cluster-u34-replica-u36-gen-0: 2024-06-24T11:38:03.073256Z WARN mz_persist_client::internal::machine: heartbeat task for reader (r2548ce70-3e72-4b97-a48d-bac9e81aa61e) of shard (s017ca4a9-0bb4-4ac4-b981-2b62459066e1) exiting due to expired lease while read handle is live
testdrive-materialized-1 | cluster-u34-replica-u36-gen-0: 2024-06-24T11:38:03.073380Z WARN mz_persist_client::internal::machine: heartbeat task for reader (r2548ce70-3e72-4b97-a48d-bac9e81aa61e) of shard (s017ca4a9-0bb4-4ac4-b981-2b62459066e1) exiting due to expired lease while read handle is live
Another occurrence in a different testdrive test: https://buildkite.com/materialize/test/builds/84505#01904b67-1af9-454b-8118-b14f953ea5aa Edit: again: https://buildkite.com/materialize/test/builds/84526#01904cb6-156c-4e63-9255-65a9a4ee00c1
Again the only relevant logging other than background finalization errors:
testdrive-materialized-1 | cluster-u1-replica-u1-gen-0: 2024-06-25T00:11:13.370574Z INFO mz_persist_client::internal::machine: snapshot sdc9c8941-19c0-483d-9188-d823dff454ac as of [1719274271519] not yet available for v37 upper [1719274271001]
This seems to be a recent regression.
I'm marking this as a release blocker since it's occurring often in CI and seems to be a recent regression. I'll try to get a reproducer locally and can then hopefully figure out more:
while true; do BUILDKITE_PARALLEL_JOB="6" BUILDKITE_PARALLEL_JOB_COUNT="8" bin/mzcompose --find testdrive run default || break; done
Edit: I couldn't reproduce this yet. One idea for what could be happening is that Mz is just overloaded, but that shouldn't be the case because we clear most resources between individual td files.
Got it reproduced locally with the same command; Mz is now stuck:
> CREATE SOURCE up_with_update
IN CLUSTER lg_cluster
FROM LOAD GENERATOR KEY VALUE (
KEYS 16,
PARTITIONS 4,
SNAPSHOT ROUNDS 3,
SEED 123,
VALUE SIZE 10,
BATCH SIZE 2,
TICK INTERVAL '1s'
)
ENVELOPE UPSERT
rows match; continuing at ts 1719303066.6064289
> SELECT partition, count(*) FROM up_with_update GROUP BY partition
rows match; continuing at ts 1719303067.212674
> SELECT
encode(value, 'base64') != 'jVfAtykQHFlCMA=='
FROM up_with_update
WHERE
key = 14
rows didn't match; sleeping to see if dataflow catches up 50ms 75ms 113ms 169ms 253ms 380ms 570ms 854ms 1s
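Side note: the sleep intervals testdrive prints in that retry log (50ms 75ms 113ms ... 854ms) are consistent with a plain x1.5 exponential backoff. A minimal sketch of that cadence, purely for illustration (the function name and parameters are hypothetical, not testdrive's actual implementation):

```python
import math
from itertools import islice

def backoff_intervals(initial_ms=50.0, factor=1.5):
    """Yield sleep intervals (in ms) growing by `factor` each retry,
    rounded half-up like the values printed in the testdrive log."""
    interval = initial_ms
    while True:
        yield math.floor(interval + 0.5)  # round half up: 112.5 -> 113
        interval *= factor

print([f"{ms}ms" for ms in islice(backoff_intervals(), 8)])
# -> ['50ms', '75ms', '113ms', '169ms', '253ms', '380ms', '570ms', '854ms']
```

The eight values reproduce the logged sequence exactly, which is why the log reads like ordinary backoff rather than anything anomalous on the testdrive side.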
The clusterd processes are stuck using full CPU:
materialize=> select 1;
?column?
----------
1
(1 row)
materialize=> explain SELECT
encode(value, 'base64') != 'jVfAtykQHFlCMA=='
FROM up_with_update
WHERE
key = 14;
Optimized Plan
----------------------------------------------------------
Explained Query: +
Project (#3) +
Filter (14 = bigint_to_numeric(#0)) +
Map (("jVfAtykQHFlCMA==" != encode(#2, "base64")))+
ReadStorage materialize.public.up_with_update +
+
Source materialize.public.up_with_update +
filter=((14 = bigint_to_numeric(#0))) +
+
Target cluster: mz_system +
(1 row)
materialize=> select * from mz_internal.mz_source_statuses;
id | name | type | last_status_change_at | status | error | details
-------+-------------------------+----------------+----------------------------+---------+-------+---------
u4159 | up_with_update_progress | progress | | running | |
u4160 | up_with_update | load-generator | 2024-06-25 08:11:06.957+00 | running | |
(2 rows)
Since this always fails with load generators, maybe this shouldn't be a release blocker?
Full gdb backtraces: https://gist.github.com/def-/a3ebfe3d46a31c945bf27c5598a66a15
Does it repro when just running these commands? Or you need to run testdrive in a loop for a bit?
Needed the loop for an hour or more. I couldn't find an easier way to repro it. (Tried while true; do bin/mzcompose --find testdrive run default load-generator-key-value.td || break; done unsuccessfully for a while.)
I'm un-assigning for now because this looks like an issue with the load generator source, right?
It seems so, reassigning to Storage team.
FWIW, I don't think this should be a release blocker if it only applies to the key-value load gen source. That source is behind a feature flag that is not currently enabled in production: https://app.launchdarkly.com/projects/default/flags/enable_load_generator_key_value/targeting?env=production&env=staging&env=development&env=loadtest&env=ci-cd&selected-env=production
I re-added the release blocker, as we don't have evidence yet that this is key-value load gen specific. I inspected the stacktraces captured by @def-, and you can see that the only threads that are not parked or in a syscall are four timely threads that all have this stack:
Thread 74 (LWP 49592 "timely:work-3"):
#0 0x0000601daf49e8ce in mz_storage::upsert::upsert_inner::{closure#3}::{async_block#0}::{closure#3}<timely::dataflow::scopes::child::Child<timely::dataflow::scopes::child::Child<timely::dataflow::scopes::child::Child<timely::worker::Worker<timely_communication::allocator::generic::Generic>, ()>, mz_repr::timestamp::Timestamp>, (mz_repr::timestamp::Timestamp, mz_storage_operators::persist_source::Subtime)>, mz_storage_types::sources::MzOffset, mz_storage::upsert::upsert::{closure_env#0}<timely::dataflow::scopes::child::Child<timely::dataflow::scopes::child::Child<timely::dataflow::scopes::child::Child<timely::worker::Worker<timely_communication::allocator::generic::Generic>, ()>, mz_repr::timestamp::Timestamp>, (mz_repr::timestamp::Timestamp, mz_storage_operators::persist_source::Subtime)>, mz_storage_types::sources::MzOffset>, mz_storage::upsert::upsert::{closure#0}::{async_block_env#0}<timely::dataflow::scopes::child::Child<timely::dataflow::scopes::child::Child<timely::dataflow::scopes::child::Child<timely::worker::Worker<timely_communication::allocator::generic::Generic>, ()>, mz_repr::timestamp::Timestamp>, (mz_repr::timestamp::Timestamp, mz_storage_operators::persist_source::Subtime)>, mz_storage_types::sources::MzOffset>, mz_storage::upsert::rocksdb::RocksDB<core::option::Option<mz_storage_types::sources::MzOffset>>> (cx=0x73e9f7ff09c0) at /cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.32.0/src/macros/select.rs:510
#1 tokio::future::poll_fn::{impl#1}::poll<mz_storage::upsert::upsert_inner::{closure#3}::{async_block#0}::__tokio_select_util::Out<core::option::Option<mz_timely_util::builder_async::Event<(mz_repr::timestamp::Timestamp, mz_storage_operators::persist_source::Subtime), timely::dataflow::operators::capability::Capability<(mz_repr::timestamp::Timestamp, mz_storage_operators::persist_source::Subtime)>, alloc::vec::Vec<((mz_storage::upsert::UpsertKey, core::result::Result<mz_repr::row::Row, mz_storage_types::errors::UpsertError>), (mz_repr::timestamp::Timestamp, mz_storage_operators::persist_source::Subtime), i64), alloc::alloc::Global>>>, core::option::Option<mz_timely_util::builder_async::Event<(mz_repr::timestamp::Timestamp, mz_storage_operators::persist_source::Subtime), timely::dataflow::operators::capability::Capability<(mz_repr::timestamp::Timestamp, mz_storage_operators::persist_source::Subtime)>, alloc::vec::Vec<((mz_storage::upsert::UpsertKey, core::option::Option<core::result::Result<mz_repr::row::Row, mz_storage_types::errors::UpsertError>>, mz_storage_types::sources::MzOffset), (mz_repr::timestamp::Timestamp, mz_storage_operators::persist_source::Subtime), i64), alloc::alloc::Global>>>>, mz_storage::upsert::upsert_inner::{closure#3}::{async_block#0}::{closure_env#3}<timely::dataflow::scopes::child::Child<timely::dataflow::scopes::child::Child<timely::dataflow::scopes::child::Child<timely::worker::Worker<timely_communication::allocator::generic::Generic>, ()>, mz_repr::timestamp::Timestamp>, (mz_repr::timestamp::Timestamp, mz_storage_operators::persist_source::Subtime)>, mz_storage_types::sources::MzOffset, mz_storage::upsert::upsert::{closure_env#0}<timely::dataflow::scopes::child::Child<timely::dataflow::scopes::child::Child<timely::dataflow::scopes::child::Child<timely::worker::Worker<timely_communication::allocator::generic::Generic>, ()>, mz_repr::timestamp::Timestamp>, (mz_repr::timestamp::Timestamp, mz_storage_operators::persist_source::Subtime)>, 
mz_storage_types::sources::MzOffset>, mz_storage::upsert::upsert::{closure#0}::{async_block_env#0}<timely::dataflow::scopes::child::Child<timely::dataflow::scopes::child::Child<timely::dataflow::scopes::child::Child<timely::worker::Worker<timely_communication::allocator::generic::Generic>, ()>, mz_repr::timestamp::Timestamp>, (mz_repr::timestamp::Timestamp, mz_storage_operators::persist_source::Subtime)>, mz_storage_types::sources::MzOffset>, mz_storage::upsert::rocksdb::RocksDB<core::option::Option<mz_storage_types::sources::MzOffset>>>> (cx=0x73e9f7ff09c0, self=...) at /cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.32.0/src/future/poll_fn.rs:58
#2 mz_storage::upsert::upsert_inner::{closure#3}::{async_block#0}<timely::dataflow::scopes::child::Child<timely::dataflow::scopes::child::Child<timely::dataflow::scopes::child::Child<timely::worker::Worker<timely_communication::allocator::generic::Generic>, ()>, mz_repr::timestamp::Timestamp>, (mz_repr::timestamp::Timestamp, mz_storage_operators::persist_source::Subtime)>, mz_storage_types::sources::MzOffset, mz_storage::upsert::upsert::{closure_env#0}<timely::dataflow::scopes::child::Child<timely::dataflow::scopes::child::Child<timely::dataflow::scopes::child::Child<timely::worker::Worker<timely_communication::allocator::generic::Generic>, ()>, mz_repr::timestamp::Timestamp>, (mz_repr::timestamp::Timestamp, mz_storage_operators::persist_source::Subtime)>, mz_storage_types::sources::MzOffset>, mz_storage::upsert::upsert::{closure#0}::{async_block_env#0}<timely::dataflow::scopes::child::Child<timely::dataflow::scopes::child::Child<timely::dataflow::scopes::child::Child<timely::worker::Worker<timely_communication::allocator::generic::Generic>, ()>, mz_repr::timestamp::Timestamp>, (mz_repr::timestamp::Timestamp, mz_storage_operators::persist_source::Subtime)>, mz_storage_types::sources::MzOffset>, mz_storage::upsert::rocksdb::RocksDB<core::option::Option<mz_storage_types::sources::MzOffset>>> () at src/storage/src/upsert.rs:707
#3 0x0000601dae68be02 in core::future::future::{impl#1}::poll<alloc::boxed::Box<mz_storage::upsert::upsert_inner::{closure#3}::{async_block_env#0}<timely::dataflow::scopes::child::Child<timely::dataflow::scopes::child::Child<timely::dataflow::scopes::child::Child<timely::worker::Worker<timely_communication::allocator::generic::Generic>, ()>, mz_repr::timestamp::Timestamp>, (mz_repr::timestamp::Timestamp, mz_storage_operators::persist_source::Subtime)>, mz_storage_types::sources::MzOffset, mz_storage::upsert::upsert::{closure_env#0}<timely::dataflow::scopes::child::Child<timely::dataflow::scopes::child::Child<timely::dataflow::scopes::child::Child<timely::worker::Worker<timely_communication::allocator::generic::Generic>, ()>, mz_repr::timestamp::Timestamp>, (mz_repr::timestamp::Timestamp, mz_storage_operators::persist_source::Subtime)>, mz_storage_types::sources::MzOffset>, mz_storage::upsert::upsert::{closure#0}::{async_block_env#0}<timely::dataflow::scopes::child::Child<timely::dataflow::scopes::child::Child<timely::dataflow::scopes::child::Child<timely::worker::Worker<timely_communication::allocator::generic::Generic>, ()>, mz_repr::timestamp::Timestamp>, (mz_repr::timestamp::Timestamp, mz_storage_operators::persist_source::Subtime)>, mz_storage_types::sources::MzOffset>, mz_storage::upsert::rocksdb::RocksDB<core::option::Option<mz_storage_types::sources::MzOffset>>>, alloc::alloc::Global>> (self=..., cx=0x73e9f7ff09c0) at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/core/src/future/future.rs:123
The stack trace points to this line https://github.com/MaterializeInc/materialize/blob/448764fc5afe15e07ab4aeda41691a5ec3a0f855/src/storage/src/upsert.rs#L707 which suggests that there are inputs with which the upsert operator can get into this situation.
Another curious thing visible in Dennis' screenshot is that there are four timely workers, which in theory are single-threaded, consuming the entirety of 16 cores. This must mean that whatever we're busy working on is work scheduled on a threadpool. rocksdb is enabled in that test, so it could be that the rocksdb threadpool is stuck somewhere. I don't see the rocksdb threads being busy in the backtraces, though.
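For what it's worth, the "parked or in a syscall" triage above can be scripted against a gdb "thread apply all bt" dump. A rough sketch; the sample text and the idle-frame keyword list are illustrative assumptions, not an exhaustive classification:

```python
import re

# Abbreviated sample of a gdb "thread apply all bt" dump (illustrative).
SAMPLE = """\
Thread 74 (LWP 49592 "timely:work-3"):
#0  0x0000601daf49e8ce in mz_storage::upsert::upsert_inner::{closure#3} ()
Thread 75 (LWP 49593 "tokio:work-0"):
#0  0x00007f1a2b3c4d5e in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6
Thread 76 (LWP 49594 "timely:work-0"):
#0  0x00007f1a2b3c4e6f in syscall () from /lib/x86_64-linux-gnu/libc.so.6
"""

# Innermost frames that indicate a thread is blocked/idle rather than working.
IDLE = ("epoll_wait", "futex", "syscall", "park", "pthread_cond")

def busy_threads(bt_text):
    """Return names of threads whose innermost frame (#0) is not an
    idle/blocking call -- a rough filter for 'actually doing work'."""
    busy = []
    thread = None
    for line in bt_text.splitlines():
        m = re.match(r'Thread \d+ \(LWP \d+ "([^"]+)"\)', line)
        if m:
            thread = m.group(1)
        elif line.startswith("#0") and thread:
            if not any(marker in line for marker in IDLE):
                busy.append(thread)
            thread = None
    return busy

print(busy_threads(SAMPLE))  # -> ['timely:work-3']
```

On the real gist this kind of filter is what leaves only the four timely workers stuck in the upsert operator's select loop.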
Hmm, this reminds me: I did enable the storage_rocksdb_use_merge_operator flag in production last week, but it's still disabled by default in the code, so I think these CI runs would still be running with the flag turned off, unless there is some LD -> CI configuration I'm not aware of.
I see rocksdb_use_native_merge_operator=false in the logs, which I assume means it's not enabled, right?
correct, not enabled, so likely unrelated then
The reproducer I had before didn't work a second time. I got into the stuck state again like this:
ALTER SYSTEM SET storage_statistics_collection_interval = 1000;
ALTER SYSTEM SET storage_statistics_interval = 2000;
ALTER SYSTEM SET enable_load_generator_key_value = true;
DROP CLUSTER lg_cluster2 CASCADE;
CREATE CLUSTER lg_cluster2 SIZE '4-4';
CREATE SOURCE up_no_update
IN CLUSTER lg_cluster2
FROM LOAD GENERATOR KEY VALUE (
KEYS 16,
PARTITIONS 4,
SNAPSHOT ROUNDS 3,
SEED 123,
VALUE SIZE 10,
BATCH SIZE 2
)
ENVELOPE UPSERT;
CREATE SOURCE up_quick
IN CLUSTER lg_cluster2
FROM LOAD GENERATOR KEY VALUE (
KEYS 16,
PARTITIONS 4,
SNAPSHOT ROUNDS 3,
TRANSACTIONAL SNAPSHOT false,
SEED 123,
VALUE SIZE 10,
BATCH SIZE 2
)
INCLUDE KEY AS whatever
ENVELOPE UPSERT;
SELECT partition, count(*) FROM up_no_update GROUP BY partition;
SELECT MAX(key) FROM up_no_update;
SELECT partition, count(*) FROM up_quick GROUP BY partition;
SELECT MAX(whatever) FROM up_quick;
SELECT
s.name,
u.offset_known,
u.offset_committed,
u.snapshot_records_known,
u.snapshot_records_staged,
u.messages_received,
u.records_indexed
FROM mz_sources s
JOIN mz_internal.mz_source_statistics u ON s.id = u.id
WHERE s.name IN ('up_no_update', 'up_quick');
SELECT
encode(value, 'base64')
FROM up_no_update
WHERE
key = 14;
SELECT
encode(value, 'base64')
FROM up_quick
WHERE
whatever = 14;
ALTER CLUSTER lg_cluster2 SET (REPLICATION FACTOR 0);
ALTER CLUSTER lg_cluster2 SET (REPLICATION FACTOR 1);
SELECT
encode(value, 'base64') = '${pre-rehydration}'
FROM up_no_update
WHERE
key = 14
SELECT
encode(value, 'base64') = '${pre-rehydration-quick}'
FROM up_quick
WHERE
whatever = 14
SELECT
s.name,
u.offset_known,
u.offset_committed,
u.snapshot_records_known,
u.snapshot_records_staged,
u.messages_received,
u.records_indexed
FROM mz_sources s
JOIN mz_internal.mz_source_statistics u ON s.id = u.id
WHERE s.name IN ('up_no_update', 'up_quick');
DROP SOURCE up_no_update;
DROP SOURCE up_quick;
CREATE SOURCE up_with_update2
IN CLUSTER lg_cluster2
FROM LOAD GENERATOR KEY VALUE (
KEYS 128,
PARTITIONS 4,
SNAPSHOT ROUNDS 2,
SEED 123,
VALUE SIZE 10,
BATCH SIZE 4,
TICK INTERVAL '1s'
)
ENVELOPE UPSERT;
SELECT partition, count(*) FROM up_with_update2 GROUP BY partition;
SELECT
s.name,
u.offset_known > 3,
u.offset_committed = u.offset_known,
u.snapshot_records_known,
u.snapshot_records_staged,
u.messages_received > 48,
u.records_indexed
FROM mz_sources s
JOIN mz_internal.mz_source_statistics u ON s.id = u.id
WHERE s.name IN ('up_with_update2');
SELECT
encode(value, 'base64') != '${pre-rehydration}'
FROM up_with_update2
WHERE
key = 14;
SELECT
encode(value, 'base64')
FROM up_with_update2
WHERE
key = 14;
SELECT
encode(value, 'base64') != '${pre-rehydration-with-update}'
FROM up_with_update2
WHERE
key = 14;
And then running this in an endless loop:
SELECT encode(value, 'base64') != 'jVfAtykQHFlCMA==' FROM up_with_update2 WHERE key = 14;
What version of Materialize are you using?
448764fc5afe
What is the issue?
Seen in https://buildkite.com/materialize/test/builds/84424#019049e5-b374-427b-976f-023030ecc7cc
The test stays stuck for 30 min and never recovers until it times out. The query is stuck in Mz for 30 min too:
There were large changes to the relevant code in https://github.com/MaterializeInc/materialize/pull/27057, so I'm not sure if this PR is related or not.